  1. Dec 2022
    1. Author Response

      eLife Assessment:

      This manuscript builds on the still-unresolved concept of 'original antigenic sin' and shows the existence of a 24-year periodicity in the immune response against influenza A(H3N2). This valuable work suggests a long-term periodicity of individual antibody responses to influenza A(H3N2) within a city. However, to substantiate their argument, the authors would need to provide additional supporting data.

      Thank you for your comments. We have performed additional analyses and included those results in the revision to support our findings.

      Specifically, we included a sensitivity analysis that predicts phases using models with 35- and 6-year periodicities. These provided poorer predictions than the 24-year periodicity used in our main results (Figure 4—figure supplement 1).

      We also generated an antigenic map showing the locations of our tested strains, and compared the pairwise antigenic distances of A(H3N2) strains (including our tested strains). These results (Figure 1—figure supplement 3) suggest that the strains we tested spanned the circulation of A(H3N2) since its emergence and covered the antigenic space of the virus well.

      Reviewer #1 (Public Review):

      The authors suggest that there is a long-term periodicity of individual antibody responses to influenza A (H3N2). This interesting periodicity does appear to be present. Although the authors assume that the periodicity is driven by pre-existing antibody responses, they could provide more supporting data and discuss other possibilities.

      Thank you for your comments and please find our point-to-point responses below.

      1) The authors could investigate whether the periodicity reflects the epidemic/invasion record of A(H3N2) within Guangzhou or surrounding cities, e.g., by referring to the yearly numbers of influenza-infected people.

      Thank you for your comments. We aimed to investigate the periodicity in individual-level antibody responses, so we made several efforts to minimize the impact of population-level A(H3N2) activity in our analyses. In particular, we removed the average activity at the population level (i.e., strain-specific intercepts) to minimize the impact of higher circulation of a given strain on the periodicity.
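      As a rough illustration of removing strain-specific intercepts, the Python sketch below log-transforms HI titers and subtracts, for each strain, the mean across individuals, leaving each person's departure from the population average. This is a simplified stand-in, not the authors' model. The function name `individual_residuals` and the use of a plain per-strain mean are assumptions; the study may estimate these intercepts within a regression that includes other terms.

```python
import numpy as np

def individual_residuals(titers):
    """Each individual's departure from the population average, per strain.

    titers: array of shape (n_individuals, n_strains) with HI titers (e.g. 10, 20, 40).
    Titers are log2-transformed, then the per-strain mean across individuals
    (a simple stand-in for a strain-specific intercept) is subtracted, so a
    widely circulating strain does not dominate the individual time series.
    """
    log_titers = np.log2(np.asarray(titers, dtype=float))
    strain_means = log_titers.mean(axis=0, keepdims=True)
    return log_titers - strain_means

# Hypothetical titers: 2 individuals x 2 strains.
print(individual_residuals([[10, 20], [40, 20]]))
```

      Each column of the result averages to zero, so only between-person differences remain for the periodicity analysis.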

      In our simulations, we tested models that incorporated only population-level activity without cross-reactions (Figure 3B, I), and these did not recover the observed periodicity. In models including both population-level activity and cross-reactions, we found that less predictable population-level activity (i.e., less regular annual epidemics) increased the variation in individual-level long-term periodicity (Figure 3G-H). We also found that measured periodicities did not vary substantially between baseline and follow-up (~3-4 years later). These results suggest that local epidemics may have only a limited impact on the observed periodicity in individuals' antibody responses, whereas cross-reactions between previously encountered and currently circulating strains may be the main drivers.

      To address this comment, we added a paragraph to the Discussion (lines 336-342):

      “In this study, we did not explore the interactions between individual-level antibody responses and population-level A(H3N2) activity (e.g., epidemic sizes). We minimized the impact of population-level activity by performing the Fourier analysis on individual departures from the population average and by validating the results with data from the Vietnam cohort. Simulation results further suggested that population-level virus activity alone was not able to recover the observed periodicity, though epidemics with less regularity seemed to increase the variability in individual-level periodicity in the presence of broad cross-reactions (Figure 3G-H).”
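      A minimal sketch of a Fourier analysis on one individual's residual series is below, assuming residuals are indexed by strain isolation year. The reviewers describe a splined Fourier transform. Here linear interpolation (`np.interp`) onto an annual grid is swapped in for the spline, and the function name `dominant_period` is hypothetical. Because the FFT only resolves periods of the form span/k, this sketch is coarse over a roughly 45-year window.

```python
import numpy as np

def dominant_period(years, residuals, step=1.0):
    """Return the period (in years) carrying the most power in one residual series.

    Residuals are linearly interpolated onto an evenly spaced grid of isolation
    years (a stand-in for the spline step) before taking the FFT.
    """
    years = np.asarray(years, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    order = np.argsort(years)
    years, residuals = years[order], residuals[order]
    grid = np.arange(years[0], years[-1] + step / 2, step)
    series = np.interp(grid, years, residuals)
    series = series - series.mean()               # remove the zero-frequency offset
    power = np.abs(np.fft.rfft(series)) ** 2
    freqs = np.fft.rfftfreq(series.size, d=step)  # cycles per year
    k = 1 + np.argmax(power[1:])                  # skip the zero-frequency bin
    return 1.0 / freqs[k]

# Synthetic residuals with a built-in 24-year cycle (hypothetical data).
years = np.arange(1968, 2016)
resid = np.sin(2 * np.pi * (years - 1968) / 24)
print(dominant_period(years, resid))
```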

      2) The authors can consider whether the participants are recently/previously vaccinated and/or infected with flu. The remaining antibodies may reflect a long memory but may show a recent activation.

      Thank you for your comments. We agree with the reviewer that the observed seroconversion to circulating strains may reflect responses to recent re-exposures. Given the low influenza vaccine coverage in our cohort (1.3%, 10 out of 777) and in China in general (<5% [3, 4]), we believe that our observed periodicity and seroconversion patterns were unlikely to be caused by recent influenza vaccinations.

      We think that the observed seroconversions to circulating strains between baseline and follow-up were likely driven by pervasive exposure to A(H3N2) (or by reinfection in those who became infected). Using the same data set, we previously reported that 98% and 74% of participants experienced 2- and 4-fold rises, respectively, in titers against any of the 21 tested A(H3N2) strains [5].

      As the reviewer and previous studies have suggested, antibody responses could reflect long-term memory that is activated after recent exposures [1, 6]. We generated our hypothesis based on this feature, aiming to characterize the periodicity that may arise from the interactions between long-term memory and newly generated antibodies.

      We incorporated the reinfection mechanism in our simulations, with and without subsequent cross-reactions with previously encountered distant strains (Figure 3I). Results indicate that reinfection alone cannot recover the observed long-term periodicity (Figure 3A), whereas reinfection plus the resulting cross-reactions can (Figure 3D). Therefore, we believe that repeated exposures or reinfections do not confound our reported periodicity. Rather, they may drive the continuous formation of life-course antibody profiles and the observed periodicity. Of particular note is the consistency of the periodic behaviour measured at baseline and at follow-up (~3-4 years later).

      To address this comment, we reported the vaccination status of our participants when introducing the data (lines 127-129) and in the Discussion (lines 280-282 and 313-315):

      “Only 0.6% (n = 5) of participants self-reported influenza vaccinations between the two visits, therefore, the observed changes in HI titers between the two visits were likely due to natural exposures.”

      “Due to the low influenza vaccine coverage in our participants and in China in general, the observed seroconversions likely reflected antibody responses after natural exposures during the study period.”

      “Particularly, our simulation results suggested that models including repeated exposures or population-level A(H3N2) activity alone did not recover the long-term periodicity (Figure 3).”

      3) The strains inducing high HI titers may have similar mutations and may be reactive to the same antibodies. What are the mutation frequencies among 21 A(H3N2) strains?

      Thank you for your comments. We selected the 21 tested strains to cover both the span of A(H3N2) circulation since 1968 and its antigenic diversity. We prioritized strains that were included in vaccine formulations and those tested to create the antigenic map by Fonville et al. [1].

      We reproduced the antigenic map (up to strains isolated in 2010) by Fonville et al. [1] and compared the antigenic locations of our tested A(H3N2) strains (Figure 1—figure supplement 3). The 21 strains (or the antigenic clusters they belong to, for strains not used in the map) largely tracked the antigenic evolution of A(H3N2) since its emergence in 1968, which proceeds at a reported rate of 0.778 antigenic units per year [1, 2].

      We further calculated the pairwise antigenic distances between strains tested in the antigenic map. These distances were highly correlated with the time intervals between the isolation of the two strains. The figure also suggests that our tested strains cover the time spans and antigenic distances shown in the original antigenic map. In addition, our observed periodicity was identified in individual time series of residuals, from which responses shared across individuals (whether due to the virus or to assay measurements) had been removed (Figure 1). Therefore, we believe that specific mutations are likely to have limited impact on our findings.
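      The distance-versus-interval comparison can be sketched as follows. This assumes strains are placed on a 2-D antigenic map, as in antigenic cartography, where one antigenic unit corresponds to a two-fold dilution in HI titer, and that distance is Euclidean. The coordinates below are hypothetical, drifting steadily at the reported 0.778 units per year, and `distance_vs_interval` is an illustrative name, not the authors' code.

```python
import numpy as np
from itertools import combinations

def distance_vs_interval(coords, years):
    """Pairwise Euclidean antigenic distances, isolation-year gaps, and their Pearson r."""
    coords = np.asarray(coords, dtype=float)
    dists, gaps = [], []
    for i, j in combinations(range(len(years)), 2):
        dists.append(np.linalg.norm(coords[i] - coords[j]))
        gaps.append(abs(years[i] - years[j]))
    dists, gaps = np.array(dists), np.array(gaps, dtype=float)
    r = np.corrcoef(dists, gaps)[0, 1]
    return dists, gaps, r

# Hypothetical map coordinates: four strains drifting along one axis at 0.778 units/year.
years = [1968, 1975, 1990, 2010]
coords = [[0.778 * (y - 1968), 0.0] for y in years]
dists, gaps, r = distance_vs_interval(coords, years)
print(r)
```

      With perfectly steady drift, the correlation is exactly 1. Real maps scatter around the trend, which is why a high but imperfect correlation is the expected result.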

      To address this comment, we included the reproduced antigenic map showing the locations of the tested strains and their pairwise antigenic distances in Figure 1—figure supplement 3, and referred to it in the main text (line 127).

      Reviewer #2 (Public Review):

      This is a well-thought-out, clearly presented article. It builds upon the platform of 'original antigenic sin' (OAS), a notion first developed from studying individuals infected with influenza. According to OAS, the initial infection will set the dominant immune response targets (antigens) that immune cells will recognize, such that infection with a related strain will cause a strong response focused mainly against the initially infecting strain, which then goes on to protect against the newly infecting strain. This study builds on this idea, showing that as strains become increasingly antigenically distant (as inferred from the time between strain appearances), cross-protection can drop to a point where it needs to be invigorated with a potentially new response. The potential biological mechanisms behind this aren't discussed, but a model is built that conveys the potential 'relative risk' of an individual over the course of their life, based essentially on when they were born.

      Thank you for your comments. We expanded our Introduction to describe more of the biological mechanisms, especially those related to original antigenic sin.

      “Antibodies mounted against a specific influenza virus decay (in either absolute magnitude or antigenic relevance) after exposure until re-exposure to, or infection with, an antigenically similar virus occurs. At that point, back-boosting of antibodies acquired from previous infections (e.g., activation of memory B cells) can occur, as can updating of antigen-specific antibodies to the newly encountered infection (e.g., activation of naïve B cells).” (lines 80-84)

      “Original antigenic sin (OAS) is a widely accepted concept describing the hierarchical and persistent memory of antibodies from the primary exposure to a pathogen in childhood. Recent studies suggested that non-neutralizing antibodies acquired from previous exposures can be boosted and may blunt the immune responses to new influenza infections.” (lines 92-97)

      The basic premise was to measure serum haemagglutination-inhibition (HI) titers against 21 influenza A (H3N2) strains (related strains causing disease at various times over a period of some 40 years) in a diverse set of ≈800 participants of various ages, at two time points spaced 2 yr apart. The authors then calculated the HI titer for the 21 strains for each individual. From this, each participant's age, their age at the time of a strain's emergence, and when a strain emerged were used to assess whether there was periodicity in immune responses, by performing a splined Fourier transform for each individual and then examining the composite pattern of HI titers across time. The authors propose that on average there is a 24-year periodicity in immune responses to influenza strains. After the initial infection, cross-reactivity declines over around 24 years to the point where it may be less meaningful for protection, which suggests that activation of a 'new' immune response might be required to control the more distant strain involved in the response at that time. The periodicity was longer than would be predicted if age were not a factor in the HI titer patterns across time. Further, variability in the periodicity was shown to involve broad cross-reactivity between strains and narrow cross-reactivity among more highly related (closer in time) strains, individual HI titer, and periodic population fluctuations. In the literature, viral strains are estimated to mutate to the point of losing 50% cross-reactivity with a T1/2 of approximately 2.5 yr. This would make the inferred lifespan plausible but perhaps surprisingly long, implying that immune feedback parameters influence periodicity. The authors also use an independent cohort of approximately 150 individuals from a separate, published study to validate some findings revealed in the primary data set.
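      To put the quoted 2.5-year half-life in perspective, the arithmetic below assumes cross-reactivity decays as a simple exponential with that half-life. That decay model is an assumption for illustration; only the half-life itself comes from the text. Under it, 24 years is 9.6 half-lives, leaving roughly 0.13% of the original cross-reactivity. This is why a 24-year cycle looks long relative to the drift rate alone.

```python
def remaining_cross_reactivity(years_elapsed, half_life=2.5):
    """Fraction of cross-reactivity left after `years_elapsed`, assuming simple exponential decay."""
    return 0.5 ** (years_elapsed / half_life)

for t in (2.5, 5, 10, 24):
    print(t, round(remaining_cross_reactivity(t), 4))
```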

      Thank you for your comments, and sorry for the confusion. We agree with the reviewer that onward cross-protection should last for less than the 24-year periodicity identified in the retrospective antibody responses. To clarify, we identified the long-term periodicity by retrospectively investigating individual antibody profiles. These profiles result from multiple previous exposures and from the immunity and cross-reactions arising from them. The long-term periodicity is therefore a retrospective characterization and should not be directly interpreted as the duration of onward protection.

      As shown in Figure 4A, the 24-year periodicity consists of phases in which individuals' titers are higher (phases I and II) and lower (phases III and IV) than the population average. As such, the duration of onward protection may be shorter than the entire period. Assuming that protection decreases with lower titers, onward protection is expected to decline during phase II, taking 1-6 years to drop from its furthest point above the population average back to the average. This is consistent with findings of homotypic cross-protection against PCR-confirmed infection lasting up to about five seasons (lines 291-293). Whether such protection is driven by declining cross-reactions still needs further investigation.

      To address this comment, we rephrased our Discussion to make the interpretation less confusing (lines 285-287):

      “Of note, the long-term periodicity is a retrospective characterization of individual antibody profiles that arose from multiple exposures and cross-protection, which should not be directly interpreted as the duration of onward protection conferred by the existing antibodies.”

      Strengths: Overall, the study is well executed and the patterns that are visually apparent in Figure 1A (the 'raw' data) are built on to inform a model of the potential breadth of cross-reactivity in a given individual at any given time after birth, integrated with the influenza strains to which they are most likely to have been first exposed. It is a complex thing to make sense of data involving many individuals who could be infected or vaccinated at any and variable points in time over the course of their life, but the authors derive a model that probabilistically accounts for possible infection events, so controls for this nicely, or at least to a degree that is practicable.

      Thank you for your supportive comments. To clarify, we identified the long-term periodicity using the residuals of individual HI titers after removing the population-level activity that is visually apparent in Figure 1A. By doing this, we aimed to minimize the impact of population-level A(H3N2) activity and laboratory measurements on individual antibody responses (Figure 1C; detailed methods in lines 396-412).

      Questions related to the main limitation: The level of math in this paper makes it hard for a basic biologist to critique the approach, but the argued points are intriguing. Foremost, in the final part of the paper the authors move from building a model to testing its potential to predict HI titers for strains from the final quarter of the study period, placing individuals into one of four phases: I) early increasing to high titer response, II) waning response phase where they are returning back to the average population-level response against a strain, III) sub-par response against a strain and then reinitiation of HI titers in phase IV. Pleasingly, this shows a good correlation between individuals' ages and their predicted phase. However, while the fit predicts phase well in Fig 4C and 4D, it looks to perform less adequately in Fig 4B.

      1) Why is this?

      Thank you for your comments, and sorry for the confusion. In Figure 4B, we aimed to characterize and predict the position, rather than the amplitude, in the individual time series of residuals. We therefore fitted the model using only harmonic terms (i.e., sine and cosine functions; Equation 12 on page 26) [7], although other factors not included in the model may also affect the observations. The model predictions inform the position and velocity of the harmonic oscillators rather than the amplitude or extent of the wave. Therefore, the predictions do not exactly fit the observations.
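      A generic fixed-period harmonic fit is sketched below. It may help show why such a model pins down position (phase) better than amplitude: residuals are regressed on one cosine and one sine at the chosen period, and the two coefficients are converted to an amplitude and a phase offset. Equation 12 of the manuscript is not reproduced here, so the intercept-free form, the function name `fit_harmonic`, and the default 24-year period are assumptions.

```python
import numpy as np

def fit_harmonic(years, residuals, period=24.0):
    """Least-squares fit of residuals ~ a*cos(w t) + b*sin(w t) at a fixed period.

    Returns (amplitude, phi) such that the fitted curve equals amplitude * cos(w t - phi).
    """
    t = np.asarray(years, dtype=float)
    w = 2 * np.pi / period
    X = np.column_stack([np.cos(w * t), np.sin(w * t)])
    (a, b), *_ = np.linalg.lstsq(X, np.asarray(residuals, dtype=float), rcond=None)
    return np.hypot(a, b), np.arctan2(b, a)

# Synthetic residuals on a 24-year cycle with amplitude 1.5 and phase offset 0.7 rad.
w = 2 * np.pi / 24.0
years = np.arange(1968, 2013, 2)
resid = 1.5 * np.cos(w * years - 0.7)
print(fit_harmonic(years, resid))
```

      Because the design matrix contains only the two harmonic terms, anything non-periodic in the data ends up in the residual error. The fitted phase is therefore usually more informative than the fitted amplitude.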

      To address this comment, we expanded the corresponding Methods to clarify this (lines 661-663):

      “Of note, we fitted the model aiming to estimate the position of the harmonic oscillators and did not account for other non-harmonic factors; therefore, the model may not fully capture the variation in the data.”

      2) Another point for consideration is that the time between samplings (2010-2012) is comparatively short, given a 24-yr predicted periodicity. What would happen to the predictions if the periodicity were 35-yr or 6-yr? Would the model fail to call individuals accurately in these cases?

      Thank you for your comments. As suggested, we repeated our predictions in Figure 4F-G assuming a 35-year and a 6-year periodicity, respectively. Model predictions with either a 35-year or a 6-year periodicity did not outperform those assuming a 24-year periodicity (Figure 4—figure supplement 1). For instance, the observed proportions of seroconversion to circulating strains in each cohort had correlation coefficients of 0.49 (p-value = 0.05), 0.63 (p-value = 0.02) and -0.12 (p-value = 0.69) with the predicted proportions in phase IV when assuming a 35-, 24- and 6-year periodicity, respectively.
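      One simple way to compare candidate periods is sketched below under stated assumptions. A fixed-period harmonic is fitted to earlier strains for each candidate period, and the mean squared error on later, held-out strains is compared. This held-out error criterion is swapped in for illustration and is not the correlation-with-seroconversion analysis reported above. The data are synthetic, and `holdout_error` and the 2003 cutoff are hypothetical.

```python
import numpy as np

def holdout_error(years, residuals, period, cutoff):
    """Fit a fixed-period harmonic to strains isolated before `cutoff`, then return
    the mean squared prediction error on strains isolated at or after `cutoff`."""
    t = np.asarray(years, dtype=float)
    y = np.asarray(residuals, dtype=float)
    w = 2 * np.pi / period
    X = np.column_stack([np.cos(w * t), np.sin(w * t)])
    train = t < cutoff
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    pred = X[~train] @ coef
    return float(np.mean((y[~train] - pred) ** 2))

# Synthetic residuals that truly follow a 24-year cycle (hypothetical data).
years = np.arange(1968, 2013, 2)
residuals = np.cos(2 * np.pi * (years - 1968) / 24)
for period in (35, 24, 6):
    print(period, holdout_error(years, residuals, period, cutoff=2003))
```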

      We also wish to clarify that we investigated the predictive potential of the long-term periodicity from two perspectives. In addition to using the periodicity to predict seroconversions between baseline and follow-up, we predicted the phase of each individual in 2012 using only HI titers against strains isolated before 2002. Our results suggested that these 10-year-ahead predictions correlated well with observations (Figure 4C).

      To address this comment, we also included the results of analyses using alternative 35- and 6-year periodicities as Figure 4—figure supplement 1 and reported them in the main text (lines 262-264).

      3) Similarly, if the samples were taken further apart, would the model still be effective at predicting phase?

      Thank you for your comments. To clarify, we collected two cross-sectional serum samples, and we identified the long-term periodicity and predicted phases separately using sera from each visit. For instance, our sensitivity analysis using sera collected at follow-up (Figure 1—figure supplement 1) revealed a long-term periodicity similar to that identified using baseline sera (Figure 1). This held despite pervasive exposures during this period (the time separating samples varied from 3 to 4 years). In addition, the Vietnam data include sera collected over six consecutive years, and these data showed a similar long-term periodicity (Figure 2—figure supplement 5).

      For the phase prediction, we used residuals of HI titers against 14 historical strains isolated between 1968 and 2002 and predicted the phase for the strain isolated in 2012. This prediction was derived purely from the periodic pattern of the time series, without information on strains isolated in the 10 years before 2012. The prediction was therefore 10 years ahead, and it was well correlated with observations from the complete time series. This further supports the idea that there may be an intrinsic cycle in individual antibody responses and that this cycle is fairly stationary and predictable.
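      A ten-year-ahead phase call of this kind can be sketched as follows. A fixed-period harmonic is fitted to strains isolated up to 2002, extrapolated to 2012, and labeled with one of the four phases. The labeling rule assumes phases are defined by the sign of the fitted curve and of its slope: I is above average and rising, II is above average and waning, III is below average and falling, and IV is below average and recovering. That rule is our reading of the phase descriptions, not the manuscript's definition. The names `fit_harmonic` and `predict_phase` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_harmonic(years, residuals, period=24.0):
    """Least-squares fit so that residuals ~ amplitude * cos(w t - phi)."""
    t = np.asarray(years, dtype=float)
    w = 2 * np.pi / period
    X = np.column_stack([np.cos(w * t), np.sin(w * t)])
    (a, b), *_ = np.linalg.lstsq(X, np.asarray(residuals, dtype=float), rcond=None)
    return np.hypot(a, b), np.arctan2(b, a)

def predict_phase(amplitude, phi, year, period=24.0):
    """Label the fitted cycle at `year` as phase I-IV (hypothetical sign-based rule)."""
    w = 2 * np.pi / period
    value = amplitude * np.cos(w * year - phi)        # above (+) or below (-) average
    slope = -amplitude * w * np.sin(w * year - phi)   # rising (+) or falling (-)
    if value >= 0:
        return "I" if slope > 0 else "II"   # above average: rising, then waning
    return "IV" if slope > 0 else "III"     # below average: falling, then recovering

# Fit on strains isolated up to 2002, then call the phase for 2012.
w = 2 * np.pi / 24.0
train_years = np.arange(1968, 2003, 2)
true_phi = w * 2012 - 5 * np.pi / 4      # synthetic: 2012 falls below average and rising
residuals = np.cos(w * train_years - true_phi)
amp, phi = fit_harmonic(train_years, residuals)
print(predict_phase(amp, phi, 2012))     # prints IV
```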

    1. Author Response

      Reviewer #1 (Public Review):

      While the circuits underlying the computation of directional motion information in the fly brain are very well described, much less is known about the neurons serving the detection of objects. In a previous publication from the same lab, it has been shown that flies perform body saccades to track a moving object during flight. In the current paper, Frighetto and Frye provide evidence that T3 cells, a population of neurons within the optic lobes, are involved in this task. First, they performed 2-photon Calcium imaging from T3 cells to show that these cells respond to moving bars, which they later use in behavioural experiments. They then silenced T3 cells using genetic tools and tested the behavior of these flies in response to a rotating bar using two different setups. In one, the flies are fixed and bilateral changes in wing stroke amplitude are used as a measure for turning, in the other, flies are magnetically tethered such that they can rotate around the vertical body axis. Silencing T3 cells leads to the abolishment of the steering response induced by object position using a bar that is defined by its motion relative to the surround, but leaves the response to object motion intact. In the magnetically tethered flies, it reduces the number of saccades and thus leads to an impairment of bar-tracking behavior. In another set of experiments they optogenetically activated the whole population of T3 neurons (which supposedly impairs their normal function), which leads to an increase in the number of saccades after the activation (when the light stimulus used to activate the cells is turned off). Silencing the neurons necessary for detection of local motion, T4 and T5 cells, in contrast reduces responses elicited by object motion rather than position, but also has an impact on object tracking saccades. The authors provide a simple model, where speed-dependent signals from multiple T3 cells are integrated and trigger a saccade, when a threshold is reached.

      The data generally support the conclusion that T3 cells play a role in detecting bar position and in controlling saccades in response to rotating bars. However, there are some inconsistencies in the data that are not sufficiently explored and discussed.

      1) In a previous paper from the lab (Keleş et al., 2020), it was shown that T3 cells respond preferentially to small objects, whereas here they robustly respond to elongated bars and even large-field gratings. This discrepancy is not discussed.

      The most likely explanation is that Keleş et al. (2020) used stimuli of half contrast (or lower) to probe contrast-polarity effects, whereas our stimuli here match the behavioral experiments by using maximum-contrast broadband stimuli. Keleş et al. (2020) also presented visual stimuli over the full display, >200 degrees in azimuth, whereas here we present stimuli unilaterally over <100 degrees; perhaps contralateral stimulation had some effect. Finally, the two studies used different Gal4 drivers. Here we use a split-Gal4 line that is highly specific for T3, whereas Keleş et al. (2020) used a conventional Gal4 driver that is less clean than the split. We shall discuss these discrepancies in revision.

      2) In a previous paper, the authors showed that integrated positional error rather than bar position is used to elicit bar-tracking saccades, and that saccade amplitude is relatively stereotyped. However, here they show that T3 cells respond much more strongly to a slowly moving stimulus (18°/s) than to the fast-moving stimuli used for the behavioral experiments (> 90°/s). This response property plays an important role in the model they propose. My general concern here is that the findings might not be generalizable to slower moving bars, where more precise, position-dependent responses could play a larger role, and that these fast-moving bar stimuli represent an extreme situation, where the flies cannot accurately track bar position any more.

      We agree that flies will not accurately track purely positional cues at higher bar speeds, since responses to positional signals are inherently sluggish. In free flight, flies execute orientation saccades when a stationary post subtends ~30 degrees (the bar width used here), at which point the leading edge of the post is moving at ~250°/s (van Breugel and Dickinson 2012). Thus, higher bar speeds are the norm for flies. We chose our behavioral stimulus speed (90°/s) to robustly trigger tracking saccades and to allow comparison with previously published behavioral data sets. A bar velocity of 18°/s is far below the range that robustly triggers orientation saccades. We imaged at 90°/s and 180°/s to show that T3 responses to behaviorally relevant bar speeds could reasonably act as inputs to an integrate-and-fire behavioral controller. These points shall be clarified in revision.
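      The integrate-and-fire idea can be sketched as a generic leaky integrate-and-fire unit. It accumulates a summed T3-like input and "fires" a saccade, then resets, whenever a threshold is crossed. The time constant, threshold, time step, and the input drive are all hypothetical placeholders, not parameters from the manuscript, and how bar speed maps onto drive is deliberately left out.

```python
import numpy as np

def integrate_and_fire(drive, dt=0.01, tau=0.2, threshold=1.0):
    """Leaky integrate-and-fire unit; returns the times (s) at which it triggers a 'saccade'.

    drive: summed T3-like input at each time step (hypothetical units).
    """
    v = 0.0
    events = []
    for i, x in enumerate(drive):
        v += dt * (-v / tau + x)      # Euler step of dv/dt = -v/tau + input
        if v >= threshold:
            events.append(round(i * dt, 10))
            v = 0.0                   # reset after each saccade
    return events

weak = integrate_and_fire(np.full(100, 4.0))     # steady state 0.8: never reaches threshold
strong = integrate_and_fire(np.full(100, 10.0))  # steady state 2.0: fires repeatedly
print(len(weak), len(strong))
```

      In this sketch, stronger or more sustained input shortens the time to threshold and so increases saccade rate. Input whose steady state stays below threshold never triggers a saccade, however long it lasts.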

      3) The claim that T3 cells are tuned to stimulus velocity is not supported by the data in my view. For the bar stimuli, the authors only tested speeds of 18°/s and above 90°/s, but nothing in between. For the grating motion there seems to be an influence of temporal frequency for the same stimulus velocity (see e.g. Fig.1_1), but this is not quantified.

      We shall add a full spatiotemporal response profile in revision. One note: we presented T3 responses to different grating speeds in the Supplemental material because our goal was merely to indicate speed sensitivity in T3, rather than to present a comprehensive speed-tuning curve. T3 is distinct from T4 and T5 in that it is not directionally selective, is full-wave rectified for contrast, and shows similar responses to bars of differing temporal frequencies moving at the same speed. These properties are also likely accompanied by broad spatial-frequency sensitivity (which would bestow speed tuning), but in revision we shall either demonstrate this or remove the claim.
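      For reference, the standard relation behind the speed versus temporal-frequency question is that a grating with spatial period λ (degrees) drifting at angular speed v (°/s) produces a temporal frequency f_t = v/λ at each point. Holding v fixed while changing λ therefore changes f_t. A speed-tuned cell should respond similarly across λ at a fixed v, whereas a temporal-frequency-tuned cell should respond similarly at a fixed f_t. The spatial periods in the example are illustrative, not the values used in the experiments.

```latex
f_t = \frac{v}{\lambda}
\qquad \text{e.g. } v = 90^\circ/\mathrm{s}:\quad
\lambda = 30^\circ \Rightarrow f_t = 3\ \mathrm{Hz},\qquad
\lambda = 45^\circ \Rightarrow f_t = 2\ \mathrm{Hz}
```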

      4) The results from the optogenetic activation experiments are hard to interpret, as it is unclear how a prolonged activation of all T3 cells would affect the downstream circuitry. It is not clear that this experiment is equivalent to a "loss-of-function perturbation" of T3 cells as the authors claim in the text.

      We are making an assumption, which we shall clarify in revision, that downstream circuitry requires a spatiotemporal progression of columnar activity, as would be generated by the projection of a discrete bar-type-object moving across the eye, and that activation of all columnar inputs together, as would occur with CsChrimson stimulation, would disrupt this discrimination. Although it is a supposition, we feel that it is parsimonious. We compared the effect of CsChrimson stimulation under two different LED intensities but found no effect on bar tracking behavior.

      Reviewer #2 (Public Review):

      In their manuscript titled "Feature detecting columnar neurons mediate object tracking saccades in Drosophila", Frighetto & Frye study the effect manipulating T3 neurons has on tethered flight saccades. The authors first characterize the responses of T3 neurons to simple visual stimuli, and then manipulate T3 cells (with both Kir2.1 and CsChrimson) and study the effects on the fly's tethered flight behavior, focusing on different types of sharp turns (saccades). Finally, the authors suggest an integrate-and-fire model to explain how an array of T3-like neurons can produce some of the recorded behavior.

      The authors study the elementary, yet challenging, computation of object discrimination. They hone in on a cell type that most likely plays an important role in the circuit. However, the authors do not sufficiently clarify the framework in which they conceptualize T3's role in object discrimination, neither when discussing it in the introduction/discussion nor when explaining experimental results. The authors present the work in comparison to T4/T5 cells. However, T4/T5 cells have been shown to be both local motion detectors and the main cell types to compute motion in the fly's eye. Downstream neurons integrate over these local units to detect different patterns of global and local motion (Authors should cite Krapp 1996 Nature). Are the authors suggesting that T3 neurons perform a similar function only as local object detectors? That is a bold claim that will need to be supported with more experimental results and reconciled with previous results. We already know of other Lobula Columnar neurons (LCs) that respond to different sizes, some even smaller than the optimal T3 stimulus (e.g. Klapoetke 2022 Neuron) and we know of LCs that respond to small objects that do not receive major inputs from T3 cells (e.g. Hindmarsh 2021 Nature).

      We are attempting to posit a simple and parsimonious framework for T3 action. Are T3 neurons “local object detectors”? T3 is clearly not “selective” for local objects, since we show that it responds to elongated bars and wide-field gratings (at least when projected over the ipsilateral visual hemisphere). T3 is, however, “sensitive” to objects: vertical bars yielded a mean response peak of ~1 ΔF/F, whereas a small square object elicited a peak of ~4 ΔF/F (Keleş et al., 2020). This amplitude differential likely indicates surround inhibition, but does not preclude a downstream integrating neuron from pooling columnar inputs to assemble a spatial receptive field for either an elongated bar or a small object. Individual T4/T5 neurons show roughly double the response amplitude to a small object compared with a long vertical bar (Keleş et al., 2020), which is consistent with other reports, but one would not classify T4/T5 as “small object detectors” as they play a fundamental role in detecting wide-field motion stimuli. We intend to posit that (i) columnar T3 neurons are small-field (local) detectors of the features contained within stimuli that flies readily track, (ii) the integration of these local signals could support the integrated error computations that flies make to track bars, and (iii) this explains why T3 blockade compromises bar tracking saccades. We do not mean to claim that T3 neurons are the first, last, or only inputs to object detection circuitry in deeper neuropils. We shall endeavor to clarify these issues in revision.
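      The argument that surround inhibition in local units need not prevent a downstream neuron from preferring bars can be illustrated with a toy model. Each columnar unit responds to its own pixel minus a weighted mean of its eight neighbours, rectified. Such a unit responds more strongly to a single-column "small object" than to a point on a full-height bar. A hypothetical pooling unit that sums a vertical strip of these units, however, responds far more to the bar. The grid size, inhibition weight `k`, and pooling rule are assumptions, not a model of real T3 receptive fields.

```python
import numpy as np

def local_responses(image, k=2.0):
    """Toy columnar detectors: centre minus k * mean of the 8 neighbours, rectified."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    padded = np.pad(img, 1)                      # zero padding outside the visual field
    surround = np.zeros_like(img)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr or dc:                         # skip the centre itself
                surround += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    return np.maximum(img - k * surround / 8.0, 0.0)

rows, cols = 20, 11
bar = np.zeros((rows, cols)); bar[:, 5] = 1.0      # full-height vertical bar
dot = np.zeros((rows, cols)); dot[10, 5] = 1.0     # single-column small object

bar_local, dot_local = local_responses(bar), local_responses(dot)
print(dot_local.max(), bar_local.max())            # local units prefer the small object
print(dot_local[:, 5].sum(), bar_local[:, 5].sum())  # a vertical pooling unit prefers the bar
```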

      These differences between T4/T5 cells and T3s also make interpreting the experimental manipulations more challenging. When hyperpolarizing T4/T5 or 'blinding' them with CsChrimson activation, the visual motion circuit is severely disrupted. However, the same cannot be said about inactivating/blinding T3 neurons and the object detection circuit (if it is indeed a single circuit). The authors are justified in deducing a connection between blocking T3 neurons and a reduction in bar tracking, but generalizing the results to object detection requires more experiments and clarifications.

      We consider “bar tracking” to be one form of object detection, but not the only form. A bar is an “object” (albeit a tall object) in the sense that it is optically disparate from the visual surround. Thus, inactivating/blinding T3 indeed severely disrupts the detection of bar-type-objects. We shall clarify the language to remove any confusion between “object” and “bar”. We do not mean to generalize T3 function to all object vision in the same way that T4/T5 function is generalized to all motion vision, and this shall be clarified in revision.

      When framing the manuscript in the object detection framework, previous results regarding the definition of an object should also be addressed. Maimon Curr. Biol. 2008 and work from their own lab (Mongeau, 2019) have already shown that tethered flies respond differently to bars and small objects (fixating on the former while anti-fixating on the latter). Previous work has also shown that T3 neurons respond strongly to small objects and suppress responses to long bars (Tanaka Curr. Biol. 2020). Since all the behavioral experiments in the current manuscript and all the visual stimuli are full arena-length bars, it is impossible to tell whether the T3 results generalize to small objects and even how to reconcile the stronger response to small objects with the role ascribed to T3 cells in generating behavioral responses to long bars.

      This amplitude differential between small-object and elongated-bar responses by T3 likely indicates surround inhibition, but it does not preclude a downstream integrating neuron from pooling columnar inputs to assemble a spatial receptive field for either an elongated bar or a small object. Consider that T4/T5 neurons show roughly twice the response amplitude to a small object as to a long vertical bar (Keleş et al., 2020, and consistent with other reports), yet one would not classify T4/T5 as “object detectors”, because their small-field columnar signals are integrated by downstream wide-field neurons that assemble spatial filters for specific patterns of optic flow generated during flight maneuvers (Krapp et al., 1996 Nature). One downstream integrator of T3 inputs, LC11, is more selective for small objects than T3. We shall clarify these points in revision.

      Finally, the authors propose a model for a hypothetical neuron downstream of T3 that would integrate over several T3s and generate saccades. However, given the current knowledge level in the fly vision field, the model should either be grounded more in actual circuit connectivity or produce testable predictions that would guide further research.

      We are currently working on the putative downstream partners of T3, and testing for the integration of T3 signals. Preliminary data show that silencing a specific LC class postsynaptic to T3 recapitulates the effects of silencing T3 on saccadic bar pursuit. In the revised version of the manuscript we will provide additional discussion.

      The authors should decide whether they would like to address these concerns with more specific experiments that would shed light on the role T3 has to play under different conditions and different definitions of a visual object, or whether they would prefer to limit the scope of their claims.

      We shall endeavor to do both!

      Reviewer #3 (Public Review):

      In free flight, flies largely change their course direction through rapid body turns termed saccades. Given how important these turns are in determining their overall behavior and navigation, it is important to understand the neural circuits that drive the timing of triggering these saccades, as well as their amplitude. In this paper the authors leverage the powerful genetic tools available in the fruit fly, Drosophila, to address this question by performing physiology experiments as well as behavioral experiments with inactivation and activation perturbations.

      The authors make three primary conclusions based on their experiments: (1) the feature detecting visual pathway (T3) is responsible for triggering saccades in response to moving objects, but not widefield motion, (2) the pathway primarily responsible for wide field motion encoding (T4/T5) is responsible for triggering saccades in response to widefield motion, and (3) the T4/T5 pathway is responsible for controlling the amplitude of both object and widefield motion triggered saccades.

      The authors go on to show that using calcium imaging data of T3 activity it is possible to predict under what conditions flies will initiate a saccade when presented with objects moving at different speeds, resulting in a parsimonious model for how saccades are triggered.

      Together, the imaging, behavior, and modeling provide compelling evidence for claims 1 and 2, however, the evidence and modeling for point 3 - the amplitude of the saccades - is lacking. The statistical analysis does not go into sufficient detail in comparing across different cases, and in particular, there is little mention of the effect sizes, which appear to be quite small (this is primarily in reference to 3F and 4E). The data suggest that both the T3 and T4/T5 pathways contribute to saccade amplitude, instead of T4/T5 being the only or primary drivers.

      We agree that the evidence suggests that both T3 and T4/T5 pathways contribute to saccade amplitude for bar tracking behavior, and shall clarify this conclusion in revision. However, we also note that the effect of silencing T4/T5 is more prominent (e.g., on peak angular velocity) and more consistent across visual conditions. We will dig deeper into the data to substantiate this point. The effect sizes might be small because the silencing approach (i.e., inward rectifying Kir2.1 channels) maintains a hyperpolarized state but does not completely block neuron function; consider that the wide-field optomotor responses of T4/T5>Kir2.1 flies are reduced but not eradicated (Fig. 3A_1).

    1. Author Response

      Reviewer #1 (Public Review):

      Li et al. have designed a study that examines specific mechanisms for how different DNA sequence variants in the common cancer gene p53 (also known as TP53) influence the sensitivity of tumors to a variety of common cancer treatments. Specifically, they examine a handful of p53 variants with respect to glioblastoma and its response to platinum-based chemotherapy and to radiation therapy. The authors begin by mentioning that looking at DNA variants in cancer is useful but also incomplete: methylation, PTMs, and non-DNA sequence variants can also be critical. They then mention that they have created a model showing that nearly all cancers with p53 mutations have loss-of-function variants and that many cancers with "normal" wildtype p53 in fact have variants causing LOF. These p53 LOF tumors lead to worse patient outcomes, but the authors here show that these tumors appear to be more susceptible to radiation and platinum-based chemotherapy, which they say they have validated in glioblastoma xenografts. This potentially opens up a new avenue for precision medicine for many different sources of cancer that share common p53 LOF variants. The authors have taken a modern approach towards cancer diagnosis and shown how this can improve targeted treatments across a large array of cancer types. They have provided a reasonably convincing proof of concept of this approach for n = 35 PDXs in one cancer type. By and large, the approach and results are reasonable, although many of the exact results concerning the genes and pathways identified that covary with the various treatments and p53 variants are unclear. For instance, the feature selection seems to be somewhat ad hoc, e.g. the method used to determine p53 LOF from p53 WT in the TCGA data was not the same method used for determining p53 LOF from p53 WT in the PDX data.

      Thanks for the positive comments. In our study, we used the same method for feature selection (i.e., p53 target identification) and for calculating the composite expression score (CES) in different cancer types, as described in Materials and Methods. However, the methods used to identify LOF of WT TP53 in the TCGA and PDX data are different. For the TCGA LUNG, BRCA, COAD, and ESCA cohorts, we used SVM models built from the same cancer type to predict TP53 status. For PDX samples derived from glioblastoma patients, we used an unsupervised clustering approach. This is because:

      1) To train an SVM model, we need a large number of “normal” samples (to represent p53 normal status) and “tumor samples with TP53 truncating mutation” (to represent p53 LOF status). In this PDX cohort (n = 35), we have no “normal” samples and only one p53-truncating mutation (Fig. 4f, Table S6). Technically, it is impossible to build an SVM model from this PDX cohort.

      2) The TCGA GBM cohort also has very limited “normal” samples (n = 5) which prevents us from training an SVM model for glioblastoma prediction.

      3) The TCGA pan-cancer SVM model is not a good choice, since GBM was not included in the pan-cancer cohort due to its limited training sample size. Although the pan-cancer model achieved a high AUROC, its performance varied significantly across cancer types. This is most likely due to the imbalanced sample sizes, since the pan-cancer model is biased toward the cancer types with larger sample sizes (e.g., lung and breast).

      4) Even if we were able to build a new SVM model from the TCGA pan-cancer data with GBM samples included, applying this SVM model to predict non-TCGA samples would still be very challenging because of batch effects.

      Therefore, we first used unsupervised clustering as an alternative to the SVM model to classify samples, and then manually annotated the PDX clusters as “p53-pN” and “p53-pLOF” according to the composite expression score.

      We agree with the reviewer that the underlying pathways/mechanisms that can potentially explain the different treatment effects and p53 non-mutational LoF are still unclear and warrant further investigation.

      The TCGA AUROCs were incredibly good - over 99% - versus more like 75% for the actual proof of concept. While any significant p-value is fine for basic research, it would be nice to know how this could be improved and bring the results in Figure 4 from ~75% to the >99% that would be necessary for use as a medical diagnostic or for treatment selection for precision medicine.

      Thanks for your suggestion. Precision cancer medicines that target TP53 mutations are currently being evaluated in clinical trials. Developing a robust model to predict p53 functional status for medical diagnosis or treatment selection is the primary goal of our study. However, there is still a long way to go to bring the model trained from external data into medical practice. To minimize the biological, clinical and technological heterogeneities and bias, the best approach is to train an SVM model from the same cancer type in the same institute; this requires:

      1) The sample sizes of both normal samples and tumors harboring TP53 truncating mutations should be sufficient to train the SVM model. Taking the TCGA lung cancer dataset (n_tumor = 1003) as an example, we built an SVM model with excellent performance from 108 normal samples and 254 tumor samples with TP53 truncating mutations. A much larger sample size is needed if the TP53 truncating mutation frequency is low.

      2) Matched data including whole-exome or whole-genome sequencing (to determine TP53 mutation status), RNA-seq (for gene expression), and treatment response.

      If one plans to use public data such as TCGA to train the model, the major challenge is integrating data from different sources (i.e., removing batch effects arising from different patient cohorts, tumor sample storage and processing, library preparation, sequencing, and bioinformatics analyses).

      However, there are significant questions regarding the specific findings uncovered: do the gene pathways identified through bioinformatic analysis fit in with the many highly-studied mechanistic roles of p53? Do the cohort selections - which vary by an order of magnitude in sample size, and come from different locations and different tissues - make statistical sense for cross-validation?

      According to our analysis, p53 targets shared by four selected cancer types are significantly enriched in “cell cycle control” and “DNA damage response” pathways, which are the canonical functions of p53 (PMID: 9039259, PMID: 36183376).

      For the four TCGA cancer cohorts selected in our study, cross-validation was performed independently for each cancer type. For the pan-cancer cohort, we agree with the reviewer that the samples come from different locations and different tissues, and that the pan-cancer SVM model could be biased by a few cancer types with larger numbers of samples. Building a pan-cancer SVM model is a compromise strategy when no single cancer type has enough samples to train its own SVM model, and more rigorous evaluations (on independent datasets) are needed. This is why we placed the pan-cancer results in the supplementary materials. We have revised the manuscript to make this point clear (Page 9).

    1. Author Response

      Reviewer #1 (Public Review)

      [...] One potential issue is that the high myelination signal is associated with the compartment in V2 (pale stripes) which was not functionally defined itself but by the absence of specific functional activations. No difference was reported between those stripes that were defined functionally. Other explanations for the differential pattern of qMRI signals (e.g., the ROI distribution for presumed pale stripes is not even but more foveal, or ROIs with low activations due to some other factor show higher myelin-related signals) cannot be excluded based on the analysis presented.

      Indeed, it would have been advantageous to directly functionally delineate pale stripes in V2. Since we were not able to achieve this by fMRI, we needed an indirect method to infer pale stripe contributions in the analysis. We also added a statement in the discussion section to emphasize this more (p. 9, lines 286–288).

      Furthermore, differences in myelination between thin and thick stripes were not tested, since we did not have a concrete hypothesis about them. Despite the conflicting findings of stronger myelination in dark or pale CO stripes in the literature, no histological study has reported myelination differences between dark CO thin and thick stripes. Therefore, our primary interest and hypothesis lay in comparing the myelination of thin/thick stripes with that of pale stripes using MRI.

      Thank you very much for this comment about potential other sources of differential qMRI parameter patterns. Indeed, based on the original analysis we could not exclude that the absence of functional activation around the foveal representation may have biased our analysis. We therefore added a supporting analysis, in which we excluded the region around the foveal representation from the analysis. The excluded cortical region was kept consistent between participants by excluding the same eccentricity range in all maps. We added more details in the results section of the revised manuscript (p. 8, lines 189–202). In Figure 5-Supplement 1 and Figure 5-Supplement 3, results from this supporting analysis are shown which reproduced the primary findings from the main analysis, particularly the relatively higher myelination of pale stripes.

      ROI definitions solely based on fMRI activation amplitude have additional limitations. However, we find it unlikely that a small fMRI effect size and low contrast-to-noise ratio (i.e. stochastic cause of low statistical parameter values/”activation”) has impacted the results, since Figure 3 shows that we could achieve a high degree of reproducibility for each participant.

      We would note that the fact that we found consistent differences across MPM and MP2RAGE sessions makes some potential artifacts driving the differences unlikely. We also find it unlikely that systematic cerebral blood volume differences between stripes would have driven the results. A higher local blood volume would lead to increased BOLD responses but also to a higher R1 value due to the deoxy-hemoglobin induced relaxation, which is opposite to the observation of higher activity in the thick/thin stripes but lower R1 values.

      Further studies using other functional metrics (e.g. VASO, ASL etc.) may help us to even more clearly demonstrate specificity but were out of the scope of this already rather extensive study. Although we have added extensive further analyses in the revised manuscript such as controlling for foveal effects or registration performance, we did not see a possibility to fully exclude a systematic bias that might potentially be caused by unknown factors.

      Another theoretical and practical issue is the question of "ground truth" for the non-invasive qMRI measures, as the authors - as their starting point - roundly dismiss direct histological tissue studies as conflicting, rather than take a critical look at the merit of the conflicting study results and provide a best hypothesis. If so, they need to explain better how they calibrate their non-invasive MR measurements of myelin.

      We agree and have now further elaborated on the limits of specificity of the R1 and R2* signals as cortical myelin markers (p. 2, lines 68–88; p. 6, line 163; p. 8, line 216; p. 9, lines 257–260). However, we still think that it is important for the reader to appreciate the conflicting results of histological studies using myelin staining methods, which add to the study’s background.

      We did not intend to give the impression that MRI provides the missing ground-truth to adjudicate histological controversies, but that it provides an alternative and additional view on the open questions. We changed the introduction to better reflect the aspect that the study offers a unique view by providing myelination proxies and functional measures in the same individual, which allows for direct comparison and investigation of structure-function relationships (see p. 2, lines 68–70; p. 3, lines 93–95), which is not accessible to any other approach. Nevertheless, we would like to note that R1 has been well established as a myelin marker under particular conditions (Kirilina et al., 2020; Mancini et al., 2020; Lazari and Lipp, 2021). It has also been widely used for cortical myelin mapping across a variety of populations, systems and field strengths. We added this statement to the introduction (see p. 2, lines 82-85). We note that we excluded volunteers with pathologies or neurological disorders from the study and their mean age was about 28 years. Thus, we had conditions comparable to previous (validation) studies.

      Because of the contradictory findings of histological studies, we could not further finesse the hypothesis beyond our previous a priori hypothesis that we expected differences in the myelin sensitive MRI metrics between the thin/thick versus pale stripes. To improve the contextual understanding, we added a paragraph in the discussion section covering in more depth how the MRI results relate to known histological findings (see pp. 8–9, lines 216–240).

      While this paper makes an important contribution to the question of the association of specific myelination patterns defining the columnar architecture in V2, it is not entirely clear whether the authors can fully resolve it with the data presented.

      Indeed, we agree that non-invasive aggregate measures, such as the R1 metrics, offer limited specificity, which precludes a fully conclusive inference about cortical myelination. We have further emphasized this on several occasions in the text (see p. 2, lines 68–88; p. 6, line 163; p. 8, line 216; p. 9, lines 257–260). Since the correspondence between cortical myelin levels and R1 (and other metrics) is an active area of research, we expect that the understanding, sensitivity and specificity of R1 with respect to cortical myelination will further improve. We note that the use of qMRI is a substantial advance over the weighted MRI typically used, which suffers from a lack of specificity due to instrumental idiosyncrasies and varying measurement conditions.

      Reviewer #2 (Public Review)

      [...] Unfortunately, this particular study seems to fall into an unhappy middle ground in terms of the conclusions that can be drawn: the relaxometry measures lack the specificity to be considered "ground truth", while the authors claim that the literature lacks consensus regarding the structures that are being studied. The authors propose that their results resolve whether or not stripes differ in their patterns of myelination, but R1 lacks the specificity to do this. While myelin is a primary driver of relaxation times in cortex, relaxometry cannot be considered to be specific to myelin. It is possible that the small observed changes in R1 are driven by myelin, but they could also reflect other tissue constituents, particularly given the small observed effect sizes. If the literature was clear on the pattern of myelination across stripes, this study could confirm that R1 measurements are sensitive to and consistent with this pattern. But the authors present the work as resolving the question of how myelination differs between stripes, which over-reaches what is possible with this method. As it stands, the measured differences in R1 between functionally-defined cortical regions are interesting, but require further validation (e.g., using invasive myelin staining).

      We agree that we have inadvertently overstated the specificity of R1 on several occasions in the text. We therefore toned down the statements concerning the correspondence between R1 and myelin throughout the manuscript (e.g., see p. 2, lines 68–88; p. 6, line 163; p. 8, line 216; p. 9, lines 257–260).

      We also removed the phrase that gave the impression that MRI can conclusively resolve the conflicting results found in histological studies. In the Introduction, we changed the corresponding paragraph by emphasizing the alternative view, which can be obtained from MRI by the possibility to investigate structure-function relationships in the living human brain, which would not be possible by invasive myelin staining (see p. 2, lines 68–70; p. 3, lines 93–95).

      We acknowledge that, perhaps aside from electron microscopy, all common markers have shortcomings that limit their specificity. For example, classic histology is not quantitative and has produced conflicting results. There is even the fundamental issue that the composition of myelin (e.g., its lipid composition; González de San Román et al., 2018) varies significantly across the brain and within brain areas. Thus, we regard the different invasive and non-invasive measures as complementary. R1 adds to this arsenal of measures and can be acquired non-invasively. It has been shown to be a reliable myelin marker under certain circumstances. It follows the known myeloarchitectural patterns of the human brain, which we also verified for the data of the present study (see Figure 4 and Appendix 2). It is sensitive to traumatic changes (Freund et al., 2019), development (Whitaker et al., 2016; Carey et al., 2018; Natu et al., 2019) and plasticity (Lazari et al., 2022). Since we studied healthy volunteers with no known pathologies, sampled randomly from the population, we believe that the previous results generally apply and suggest sufficient specificity of the R1 marker. Of course, we cannot fully exclude bias due to unknown factors that have not yet been investigated or discovered by validation studies. However, in that case we expect that the systematic differences between stripe types would remain an important result, most likely pointing to another interesting biological difference between stripes.

      While more research is needed to clarify the precise role of R1 for cortical myelin, we think that the meaningful determination of quantitative MR parameters within one cortical area is still of interest to the neuroscientific community.

      Moreover, the results make clear that R1 differences are not sufficiently strong to provide an independent measure of this structure (e.g., for segmentation of stripes). As such, one would still require fMRI to localise stripes, making it unclear what role R1 measures would play in future studies.

      Indeed, the small effect sizes observed in the present study still require functional localization with fMRI. We expected small effect sizes for R1 and R2* because of the known small inter-areal and intra-cortical differences in MRI myelin markers. Therefore, this study was designed as a proof of concept, investigating whether intra-areal R1 differences at the spatial scale of columnar structures can be detected using non-invasive MRI. Our study shows that these differences can be detected, but currently not at the single-voxel level. We anticipate that with further improvements in sequence development and scanner hardware, high-resolution R1 estimates with sufficient SNR can be acquired, making fMRI redundant for this kind of investigation. Please see the reply to the next comment concerning the impact of using R1 in future studies.

      The Introduction concludes with the statement that "Whereas recent studies have explored cortical myelination ... using non-quantitative, weighted MR images... we showed for the first time myelination differences using MRI on a quantitative basis". As written, this sentence implies that others have demonstrated that simpler non-quantitative imaging can achieve the same aims as qMRI. Simply showing that a given method is able to achieve an aim would not be sufficient: the authors should demonstrate that this constitutes an important advance.

      Thank you for this comment. It goes to the heart of the concerns raised about the specificity and sensitivity of MRI-based myelin metrics. Here we elaborate on the main advantage of using qMRI in our current study and why it is more specific than weighted MR imaging. However, we emphasize that a thorough comparison between qMRI and weighted MRI is highly complex and beyond the scope of our paper; we refer to our recent review of qMRI for further details (Weiskopf et al., 2021). The signal in weighted MRI, even when the contrast is optimized for the tissue of interest, additionally depends on inhomogeneities in both the RF transmit and receive (bias) fields. Other methods, such as using a ratio image (T1w/T2w), can cancel out the receive field bias entirely (provided the subject does not move between scans) but not the transmit field bias. This hampers the direct analysis and interpretation of signal differences between distant regions of the brain. For high-resolution imaging applications, high magnetic fields such as 7 T are beneficial or even mandatory because of signal-to-noise ratio (SNR) penalties. With increasing field strength, these inhomogeneities become relevant even within small regions such as V2. In these cases, qMRI is advantageous, since it provides metrics that are free from these technical biases, significantly improving specificity. As high-field MRI has the potential to non-invasively study the structure and function of the human brain at the spatial scale of cortical layers and cortical columns, we believe that the results of our current study, which demonstrate that qMRI can robustly detect small differences at the level of columnar systems, are relevant for future studies in neuroscience.
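      To make the ratio-image argument concrete, the following is a minimal worked equation under a simplifying signal model that is our illustrative assumption, not a model taken from the manuscript: each weighted image is the product of a spatially varying receive sensitivity R^-(r) and a contrast term f_w that lumps together proton density, the relaxation times, and the local transmit field B_1^+(r). The symbols R^- and f_w are introduced here for illustration only.

```latex
S_{w}(\mathbf{r}) = R^{-}(\mathbf{r})\, f_{w}\!\left(T_1(\mathbf{r}),\, T_2(\mathbf{r}),\, B_1^{+}(\mathbf{r})\right),
\qquad w \in \{\mathrm{T1w}, \mathrm{T2w}\}

\frac{S_{\mathrm{T1w}}(\mathbf{r})}{S_{\mathrm{T2w}}(\mathbf{r})}
  = \frac{R^{-}(\mathbf{r})\, f_{\mathrm{T1w}}\!\left(T_1, T_2, B_1^{+}\right)}
         {R^{-}(\mathbf{r})\, f_{\mathrm{T2w}}\!\left(T_1, T_2, B_1^{+}\right)}
  = \frac{f_{\mathrm{T1w}}\!\left(T_1, T_2, B_1^{+}\right)}
         {f_{\mathrm{T2w}}\!\left(T_1, T_2, B_1^{+}\right)}
```

      The receive factor cancels only if it is identical in both images, which is why subject motion between scans breaks the cancellation. Because the T1w and T2w contrasts generally depend on the local flip angle in different ways, the B_1^+ dependence does not divide out, leaving the residual transmit bias described above.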

      We emphasized these considerations in the revised manuscript (see p. 9, lines 273–285).

      The study includes a very small number of participants (n=4). The advantage of non-invasive in-vivo measurements, despite the fact that they are indirect measures, should be that one can study a reasonable number of subjects. So this low n seems to undermine that point. I rarely suggest additional data collection, but I do feel that a few more subjects would shore up the study's impact.

      The present study followed a deep-phenotyping approach. That is, we focused on acquiring highly reliable datasets in individuals. We did not intend to capture population variance, which is often the goal of group studies, since low-level, basic features such as the stripes in V2 are expected to be present in all healthy individuals. Thus, we prioritized test-retest fMRI sessions and an alternative MP2RAGE acquisition over a larger number of individuals. This resulted in 6–7 scanning sessions on different days for each individual, summing up to 26 long scanning sessions in total. We also note that our sample size is not smaller than in other studies addressing a similar research question. For example, another fMRI study investigating V2 stripes in humans used the same sample size of n=4 (Dumoulin et al., 2017).

      The paper overstates what can be concluded in a number of places. For example, the paper suggests that R1 and R2 are highly specific to myelin in a number of places. For example, on p7 the text reads: "We tested whether different stripe types are differentially myelinated by comparing R1 and R2..." Relaxation times lack the specificity to definitively attribute these changes purely to myelin. Similarly, on p11: "Our study showed that pale stripes which exhibit lower oxidative metabolic activity according to staining with CO are stronger myelinated than surrounding gray matter in V2." This implies that the study directly links CO staining to myelination. In addition to using non-specific estimates of myelination, the study does not actually measure CO.

      We agree that we did not clearly point out the limitations of R1 myelin mapping. Therefore, we toned down the statements about the connection between cortical myelin and R1. The mentioned statements in the reviewer’s comment were changed accordingly (see p. 6, line 163; p. 11, lines 353–354). We also included a small paragraph to clarify the used terminology (color-selective thin stripes, disparity-selective thick stripes) in the manuscript (see p. 4, lines 110–114) to avoid the inadvertent conflation of CO staining and actually measured brain activity.

      I'm confused by the analysis in Figure 5. I can appreciate why the authors are keen to present a "tripartite" analysis (thick, thin, and pale stripes). But I find the gray curves confusing. As I understand it, the gray curves as generated include both the stripe of interest (red or blue plots) and the pale stripes. Why not just generate a three-way classification? Generating these plots in effect has already required hard classification of thin and thick stripes, so it is odd to create the gray plots, which mix two types of stripes. Alternatively, could you explicitly model the partial volume for a given cortical location (e.g., under the assumption that partial volume of thick and thin stripes is indicated by the z-score) for the corresponding functional contrast? One could then estimate the relaxation times as a simple weighted sum of stripe-wise R1 or R2.

      Figure on weighted average of stripe-wise R1 and R2. (a) shows the weighted sum of R1 (de-meaned and de-curved) over all V2 voxels. z-scores from the color-selective thin stripe and disparity-selective thick stripe experiments were used as weights in the left and middle groups of bars, respectively. An intermediate threshold of z = 1.96 was used, i.e., final weights were defined as weights = (z - 1.96), and negative weights were set to 0. For pale stripes (right group of bars), we used the maximum z-score from the thin and thick stripe measurements. We then set all weights with z ≥ 1.96 to 0 and used the inverse as final weights, i.e., weights = -1 * (max(z) - 1.96). (b) shows the same analysis for R2. Error bars indicate 1 standard error of the mean.
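      The weighting scheme described in the figure legend can be summarized in a short NumPy sketch. It is an illustration under our own assumptions, not the analysis code used for the figure: the function names and the toy values are hypothetical, "negative weights set to 0" is read as clipping z - 1.96 at zero, and the weighted sum is normalized by the total weight so that it is a weighted average, as the figure title suggests. Note that the pale-stripe weights reduce to max(1.96 - max(z_thin, z_thick), 0).

```python
import numpy as np

Z_THR = 1.96  # threshold stated in the figure legend


def stripe_weights(z):
    """Thin- or thick-stripe weights: z - 1.96, with negative weights set to 0."""
    return np.clip(np.asarray(z, dtype=float) - Z_THR, 0.0, None)


def pale_weights(z_thin, z_thick):
    """Pale-stripe weights: -(max(z) - 1.96), set to 0 wherever max(z) >= 1.96."""
    z_max = np.maximum(np.asarray(z_thin, dtype=float), np.asarray(z_thick, dtype=float))
    return np.clip(Z_THR - z_max, 0.0, None)


def weighted_average(values, weights):
    """Weighted sum of per-voxel values normalized by total weight (our assumption)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total == 0:
        raise ValueError("all weights are zero")
    return float(np.sum(weights * values) / total)


# Hypothetical toy data for three V2 voxels (not study data).
z_thin = np.array([3.0, 0.5, -1.0])
z_thick = np.array([0.0, 2.5, -0.5])
r1 = np.array([-0.02, -0.01, 0.03])  # de-meaned R1, arbitrary units

for name, w in [("thin", stripe_weights(z_thin)),
                ("thick", stripe_weights(z_thick)),
                ("pale", pale_weights(z_thin, z_thick))]:
    print(name, round(weighted_average(r1, w), 4))
```

      Clipping at zero means that voxels below threshold simply drop out of the thin and thick averages, while the pale-stripe weights grow as a voxel moves further below threshold in both functional contrasts.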

      (1) Yes, indeed. We agree that modeling the partial volume of each compartment (thin, thick and pale stripes) in each V2 voxel would be the most elegant approach. However, we note that z-scores from the thin and thick stripe experiments may not reflect voxel-wise partial volume, since they are a purely statistical measure and not a partial volume model. Having said this, we think that this general approach can give some additional insights, and we provide results for a similar analysis here. We calculated the weighted sum of R1 and R2 values over all V2 voxels for each stripe compartment (thin, thick and pale stripes) independently (see above figure). For R1, we see the same pattern between stripe types as in the manuscript (Figure 5). Additionally, we show the differences here for each subject, which further demonstrates the reproducibility across subjects in our study. For R2, no clear pattern across subjects emerged, confirming the results in our manuscript. Since this analysis did not add relevant new information, we refrained from adding this figure to the manuscript, in order not to overload it.

      (2) In our current study, we were not primarily interested in investigating differences between thin and thick stripes. While histological analyses found differences (though not consistent ones) between CO dark stripes (more myelinated; Tootell et al., 1983) and CO pale stripes (more myelinated; Krubitzer and Kaas, 1989), no study has reported myelin differences between the two types of CO dark stripes. This does not fully exclude the possibility of such differences, but it suggests that if myelination differences between CO dark thin and thick stripes existed, they would presumably be smaller than the differences between CO dark and CO pale stripes. Thus, they would be even more difficult to demonstrate than the differences hypothesized in this manuscript.

      Therefore, we decided to test two compartments directly against each other instead of modeling all three compartments within a single model. In doing so, we loosely followed the analysis methods described in Li et al. (2019), who compared myelin differences between thin/thick and pale stripes in macaques. We note that our results demonstrate further consistency, since it is not trivial that both thick and thin stripes show lower R1 values than pale stripes: a priori, either comparison could have shown no difference or a difference in the opposite direction.

      (3) To clarify, the plots in Figure 5 show the comparison of R1 (or R2*) between two compartments in V2. The red (blue) curve includes the thin (thick) stripes of interest. The gray curve includes everything in V2 minus contributions from the thick (thin) stripes of interest. Taking the thin stripe comparison as an example (Figure 5a), red contains the thin stripes of interest, while gray contains everything minus the thick stripes. Therefore, assuming a tripartite stripe arrangement, the gray curve contains both thin and pale stripe contributions.

      References

      Carey D, Caprini F, Allen M, Lutti A, Weiskopf N, Rees G, Callaghan MF, Dick F. Quantitative MRI provides markers of intra-, inter-regional, and age-related differences in young adult cortical microstructure. Neuroimage 2018; 182:429–440.

      Dumoulin SO, Harvey BM, Fracasso A, Zuiderbaan W, Luijten PR, Wandell BA, Petridou N. In vivo evidence of functional and anatomical stripe-based subdivisions in human V2 and V3. Sci Rep 2017; 7:733.

      Freund P, Seif M, Weiskopf N, Friston K, Fehlings MG, Thompson AJ, Curt A. MRI in traumatic spinal cord injury: from clinical assessment to neuroimaging biomarkers. Lancet Neurol 2019; 18:1123–1135.

      González de San Román E, Bidmon H-J, Malisic M, Susnea I, Küppers A, Hübbers R, Wree A, Nischwitz V, Amunts K, Huesgen PF. Molecular composition of the human primary visual cortex profiled by multimodal mass spectrometry imaging. Brain Struct Funct 2018; 223:2767–2783.

      Kirilina E, Helbling S, Morawski M, Pine K, Reimann K, Jankuhn S, Dinse J, Deistung A, Reichenbach JR, Trampel R, Geyer S, Müller L, Jakubowski N, Arendt T, Bazin P-L, Weiskopf N. Superficial white matter imaging: Contrast mechanisms and whole-brain in vivo mapping. Sci Adv 2020; 6:eaaz9281.

      Krubitzer LA, Kaas JH. Cortical integration of parallel pathways in the visual system of primates. Brain Res 1989; 478:161–165.

      Lazari A, Lipp I. Can MRI measure myelin? Systematic review, qualitative assessment, and meta-analysis of studies validating microstructural imaging with myelin histology. Neuroimage 2021; 230:117744.

      Lazari A, Salvan P, Cottaar M, Papp D, Rushworth MFS, Johansen-Berg H. Hebbian activity-dependent plasticity in white matter. Cell Rep 2022; 39:110951.

      Li X, Zhu Q, Janssens T, Arsenault JT, Vanduffel W. In Vivo Identification of Thick, Thin, and Pale Stripes of Macaque Area V2 Using Submillimeter Resolution (f)MRI at 3 T. Cereb Cortex 2019; 29:544–560.

      Mancini M, Karakuzu A, Cohen-Adad J, Cercignani M, Nichols TE, Stikov N. An interactive meta-analysis of MRI biomarkers of myelin. Elife 2020; 9:e61523.

      Natu VS, Gomez J, Barnett M, Jeska B, Kirilina E, Jaeger C, Zhen Z, Cox S, Weiner KS, Weiskopf N, Grill-Spector K. Apparent thinning of human visual cortex during childhood is associated with myelination. PNAS 2019; 116:20750–20759.

      Tootell RBH, Silverman MS, De Valois RL, Jacobs GH. Functional Organization of the Second Cortical Visual Area in Primates. Science 1983; 220:737–739.

      Weiskopf N, Edwards LJ, Helms G, Mohammadi S, Kirilina E. Quantitative magnetic resonance imaging of brain anatomy and in vivo histology. Nat Rev Phys 2021; 3:570–588.

      Whitaker KJ, Vértes PE, Romero-Garcia R, Váša F, Moutoussis M, Prabhu G, Weiskopf N, Callaghan MF, Wagstyl K, Rittman T, Tait R, Ooi C, Suckling J, Inkster B, Fonagy P, Dolan RJ, Jones PB, Goodyer IM, NSPN Consortium, Bullmore ET. Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome. PNAS 2016; 113:9105–9110.

    1. Author Response

      Reviewer #1 (Public Review):

      1) In family 2, the variant was detected by routine trio-based WES diagnostics. Sanger confirmation was not performed. IGV images can be added as supplementary material. Furthermore, median coverage was 75× which might not be sufficient for the identification of all heterozygous variants.

      We thank the reviewer for raising this point, which allows us to clarify. At the time this variant was reported (2016), this was our laboratory's thoroughly validated protocol, and that validation showed that a median coverage of 75× with the technology of the time is more than sufficient for robust variant calling. This particular variant had a coverage below 75× (65×), but Sanger confirmation was not necessary, based on our thorough validation of calling robustness, GATK scores and other quality parameters for de novo calling. Sanger confirmation is warranted when coverage falls below 30-35×.

      2) Proband 2 (P2) was born as the second child of non-consanguineous parents of Caucasian descent after an uneventful pregnancy and delivery. The boy was macrosomic at birth. Since there was macrosomia, how would the pregnancy be uneventful? At the last assessment at 10 years of age, obesity associated with hyperphagia was of concern; the weight of the patient should be clarified. P2 was diagnosed with autism spectrum disorder but a normal cognitive profile. The identified NM_001014809.2(CRMP1_v001):c.1280C>T variant is very rare and reported in GnomAD exomes with allele frequency 0.0000041.

      Routine ultrasonography during pregnancy did not raise any concerns; the pregnancy was indeed uneventful. BMI at the last evaluation was 26.1 kg/m². We have included these details in the revised manuscript.

      3) Proband 3 (P3) is the first of three children of a non-consanguineous family of European descent. There is a familial history of obesity on both parental sides, and the father is macrocephalic (head circumference: 60.5 cm). Macrocephaly can be isolated and benign, such as in benign familial macrocephaly. However, P3 presented with moderate intellectual disability and an autism spectrum disorder. Since P3 has a macrocephaly also, the PTEN gene should be further interrogated by detailed WGS data analysis as well as an additional orthogonal method(s) since it has pseudogenes.

      We did not identify any pathogenic variant in the PTEN gene in the genetic analysis.

      Reviewer #2 (Public Review):

      Weaknesses of the article include:

      1) Spelling errors and difficult-to-understand language. The use of "variant" is now preferred over mutation. According to current nomenclature, predicted but not experimentally confirmed protein alterations should be written as p.(Phe351Ser) rather than p.Phe351Ser.

      We apologise for the spelling errors and the difficult-to-understand language in the manuscript. We took the reviewer's comments seriously, corrected the errors and rephrased sentences wherever necessary.

      2) Inconsistent use of in silico pathogenicity predictors and conservation metrics. These should be standardized for each case and should include at least phylop, CADD, and REVEL.

      We have standardized the description of in silico pathogenicity predictors and conservation metrics across all patients.

      3) CRMP1 is under significant constraint against loss-of-function variation in gnomAD - pLI = 0.99, LOEUF 0.28. Genes in the top decile are highly enriched for haploinsufficiency as a disease mechanism. This should be considered in the interpretation of this data and incorporated into the manuscript.

      We thank the reviewer for the comment. As per the reviewer's suggestion, we have included a statement in the 'Subjects and Methods' section of the revised manuscript.

      4) I am not convinced the data supports a dominant-negative interpretation. The variants do not oligomerize as well as wild-type CRMP1, and when co-expressed with wild-type CRMP1 there is an increase in monomeric wild-type CRMP1. While this could support a dominant-negative interpretation, an alternative explanation is these are loss-of-function alleles that cannot oligomerize, and at the stoichiometry of this artificial overexpression system, this leads to increased monomeric wild-type CRMP1. The axonal outgrowth studies are more compelling, but without a loss-of-function control allele, it is difficult to interpret.

      The experiments in Figure 2 should be replicated, quantitated, and their statistical significance confirmed.

      We thank the reviewer for raising concerns about the experiment and the interpretation of the data. We performed size exclusion chromatography experiments and included the data in the revised Figure 2. Unfortunately, we could not reproduce the experiments for Figure 2B. Our current experimental results show that the CRMP1 variants affect homo-oligomerization.

      Reviewer #3 (Public Review):

      1) The major weakness is Figure 2, as it is not performed up to high standards like the rest of the paper. Panel A does not show any loading control and does not confirm. Panel B at 720 kDa band is not convincing. Results should be repeated with size exclusion chromatography and/or another method to determine molecular weight and should be quantified from triplicate experiments. Panel C is also not convincing and should be repeated to more carefully show results, and quantified.

      We thank the reviewer for raising this important concern about the experimental data in Figure 2, and we have addressed these comments in the revised manuscript. We performed size exclusion chromatography, presented the results in the revised manuscript and discussed them on pages 23-24.

      Fig. 2A: This panel shows recombinant wild-type CRMP1 and the variants produced in an E. coli expression system. We repeated the expression several times and obtained similar partially cleaved proteins. Fig. 2A is a Coomassie Brilliant Blue stained gel. The protein size marker and the loading control (BSA) were loaded on the same gel, as shown in the original Fig. 2A.

      Fig.2B: Due to limited protein expression of T313M and P475L mutants, we could not repeat the gel-filtration experiments.

      Fig. 2C, 2D: It is difficult to match the expression levels of the different constructs (CRMP1 wild type, T313M or P475L) in HEK293T cells (input). Therefore, we measured the signal intensity of the myc-IP band and its ratio to the V5 input blot in each condition. Fig. 2D shows this ratio from four independent experiments.
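The quantification described above (IP band intensity relative to input, pooled over independent experiments) might look like the following sketch. The normalization to wild type within each experiment, the mean ± SEM summary, the function names and all intensities are assumptions for illustration; the response does not specify them.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_ratios(experiments, reference="WT"):
    """Per-experiment (IP / input) ratio for each condition, divided by the
    reference condition's ratio in the same experiment (assumed normalization)."""
    ratios = {}
    for experiment in experiments:
        ref_ip, ref_input = experiment[reference]
        ref_ratio = ref_ip / ref_input
        for condition, (ip, input_signal) in experiment.items():
            ratios.setdefault(condition, []).append((ip / input_signal) / ref_ratio)
    return ratios


def mean_sem(values):
    """Mean and standard error of the mean (needs at least two values)."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical band intensities (arbitrary units) from two experiments.
experiments = [
    {"WT": (100.0, 50.0), "T313M": (40.0, 40.0)},
    {"WT": (90.0, 45.0), "T313M": (30.0, 20.0)},
]
ratios = normalized_ratios(experiments)
print(ratios["T313M"], mean_sem(ratios["T313M"]))
```

Dividing by the input signal within each lane is what makes conditions comparable despite unequal expression levels.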

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, Quiniou and colleagues show via orthogonal methods that human thymopoiesis releases a large population of CD8+ T cells harboring α/β paired TCRs that (i) have high generation probabilities, (ii) have a preferential usage of some V and J genes, (iii) are shared between individuals and (iv) can each recognize and be activated by multiple unrelated viral peptides, notably from EBV, CMV and influenza.

      Major strengths of the paper

      Quiniou et al. generated single-cell sequencing datasets of the earliest stages of TCR beta chain gene recombination, and then showed that a subset of these sequences is highly clustered and also has a high generation probability.

      They show that these T cells can bind multiple antigens, both via the use of public antigen-specific datasets and through corroborating experimental TCR expression and binding assays.

      Minor weaknesses

      To what extent is TCR clustering and high Pgen and cross-individual sharing correlated? What is the Pgen of the sequences clustered with the high Pgen cells? Can you comment on the correlation between these three phenomena?

      Indeed, there is a significant positive correlation between the Pgen and the number of connections among the clustered TCRs, as was reported in Fig. 1F of the original manuscript. Furthermore, this correlation holds for both private and public TCRs, as was reported in Figure 2B of the original manuscript.

      To show the link between the three phenomena, we now have added two supplementary figures showing a high positive correlation between Pgen and the number of connections, and between cross-individual sharing and the number of connections, and to a lesser extent between Pgen and cross-individual sharing (Figure 2-figure supplement 4C and D in the manuscript supplementary information).

      However, we would like to emphasize that the mean Pgen of the clustered and dispersed TCRs differs by about 20-fold. This is a large difference for a biological process (and highly statistically significant), but a small one compared with the span of ten log10 units covered by the Pgens of the two populations. In fact, what we observed is not that clustered sequences have a high Pgen, but that they have a higher Pgen than the non-clustered sequences. Yet many CDR3s with high Pgen do not cluster, and vice versa, indicating that a high Pgen is neither the only nor the most important driver of clustering. We have now added these results as Figure 1-figure supplement 3E-F of our revised manuscript.
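Putting the 20-fold difference on the same log10 scale as the Pgen span makes the comparison explicit; this worked check uses only the two figures quoted in the response:

```latex
\log_{10}(20) \approx 1.30\ \text{log}_{10}\ \text{units},
\qquad
\frac{1.30}{10} \approx 0.13
```

That is, the shift in mean Pgen corresponds to roughly 13% of the range of Pgen values.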

      In other words, to what extent is this surprising to see that highly clustered TCRs have higher Pgen and are more shared?

      That for a given CDR3 there is a correlation between having a high Pgen and being public is not surprising as both suggest a positive selection during evolution. What is more surprising is that there are CDR3s forming large clusters that occupy over 20% of the repertoire and that co-cluster between individuals with different HLA, “indicating a convergence of specificities between individuals’ clustered repertoires”. This suggests a surprising selection process that could depend less on HLA than the “classical” selection.

      These points are now better emphasized in the revised manuscript.

      Potential Impact of the paper

      This work highlights an intrinsic property of the adaptive immune response: to generate TCRs with high generation probability that can efficiently bind multiple antigens. This finding therefore has an important impact on drug discovery and vaccine design.

      We thank the reviewer for his appreciation.

      Reviewer #2 (Public Review):

      This study analyses the T cell receptor (TCR) repertoire of double positive human thymocytes, and compares this to mature single positive CD8 cells. The first major finding is that the repertoire post-selection is enriched for groups of TCRs with high generation probabilities, similar sequences, and for TCRs previously annotated for viral specificity. This data is clearly presented and convincing. The extent of analysis of the human thymocyte repertoire is still very limited, and the paper adds significantly to this important question.

      We thank the reviewer for his appreciation.

      The second major finding is much more controversial. The authors first investigate the publicly available databases and show that there is a substantial proportion of TCRs which have been annotated to multiple viral specificities, a fact which is well-known to the specialists in the field, but not previously addressed.

      Indeed, we are not aware of reports disclosing “a substantial proportion of TCRs which have been annotated to multiple viral specificities”. One could wonder why “a fact which is well-known to the specialists in the field” is not mentioned and discussed in published articles. To us, this reveals that the point has been overlooked by immunologists, as illustrated recently by Zhang et al. (2021), who, aiming to identify highly specific T cell clones with a new modelling approach, excluded all clones binding more than one peptide. This makes it important to report it, as we do. Furthermore, we would like to emphasize that we do more than report that some TCRs have “been annotated to multiple viral specificities”. We show, from a manual curation of public databases, that (i) some TCRs have been reported to bind tetramers presenting peptides from unrelated viruses; (ii) such TCRs co-cluster using Levenshtein-distance- or GLIPH2-based clustering; and (iii) some of these TCRs indeed recognize different, unrelated peptides without significant sequence homology upon re-expression in carrier T cells.

      The authors acknowledge that this in silico analysis is mostly based on unpaired alpha/beta sequence data, and that the chain pairing may influence specificity. They, therefore, perform a number of functional assays, demonstrating examples of T cells which respond by interferon gamma production to more than one peptide.

      We thank the reviewer for pointing to the fact that, beyond tetramer binding, we performed cumbersome functional studies to document polyreactivity.

      The paper is mostly very clearly written and presented and provides some fascinating novel perspectives on T cell cross-reactivity.

      We thank the reviewer for his appreciation.

      The findings will surely be of interest to a broad readership - indeed anyone interested in how adaptive immunity works.

      The link between the different sections of the paper is the weakest aspect. The relationship between thymic selection and polyspecificity, and also the real relationship between in silico "cross-reactivity" as evidenced by multiple annotations and the functional polyspecific T cells remains unclear.

      Our flow of reasoning and analysis was as follows. While studying the thymic selection of TCR repertoires, (1) we discovered a massive clustering within these repertoires. As, for thymocytes, this cannot be accounted for by a history of immune responses, it caught our attention and led us to analyze the properties of these TCRs. This led us (2) to discover in these thymic repertoires “TCRs which have been annotated to multiple viral specificities”, which we had not been aware of. We were so intrigued by these observations that we wanted to substantiate them using datasets of paired TCRs. As (3) we could confirm these observations in such datasets, this led us (4) to investigate these TCRs in functional studies. This is the thread linking sections 1 to 4.

      To make this link clearer, we have reworked the titles of the different Results sections so as to emphasize the switch from bulk thymocyte sequencing studies to single peripheral cell sequencing studies.

      The mechanistic molecular details underlying polyspecificity also remain unclear.

      Indeed, we believe that solving the structure of polyreactive TCRs interacting with different peptides will be needed for a molecular understanding of polyreactivity, but that it falls beyond the present work.

      But overall, lots of interesting new data, and some very intriguing hypotheses for the community to follow up on.

      We thank the reviewer for his overall comment.

      Reviewer #3 (Public Review):

      In this manuscript, the authors propose that there is a special, previously unrecognized, high-frequency population of a/b TCRs that are shared between people, have high generation probabilities, and react to many unrelated viral epitopes. Here is the main flow of the results, with comments on the strengths of the conclusions:

      "Thymopoiesis selects a large and diverse set of clustered CDR3s with high generation probabilities" -- this seems correct and has been noted in earlier work by Mora and Walczak and others.

      So far, the Mora and Walczak selection models in humans have been based on PBMCs (our ref. 27 in the revised version), not on sorted thymic DP and SP cells; even their mouse-derived models used total thymic cells (our ref. 27).

      Selection leads to a focusing of the CDR3 length which likely increases the degree of clustering and increases Pgen.

      To address this question, we compared the CDR3 length distribution between DP CD3+ cells and CD8 SP cells from our thymic dataset. We did not observe major changes: the distribution and the mean of CDR3 length remained essentially identical for the two cell populations. We observed only a small shift in the CDR3 length distribution towards shorter sequences after selection. This is now reported in the new Figure 1-figure supplement 3C in the revised manuscript.

      "Clustered CDR3s are enriched for publicness " This also seems correct and again it makes sense: publicness is equivalent to having been independently rearranged (and sequenced) in another individual, which is determined by Pgen, and clustering is also determined to a large extent by Pgen (the factors that contribute to Pgen, shorter CDR3s for example, are largely shared between neighbor TCRs).

      We agree that theory could indeed have predicted this. In any case, to our knowledge, this is the first report of large clusters of newly selected thymocytes' CDR3s that, moreover, co-cluster between individuals with different HLA.

      "Clustered public CDR3s are enriched in viral specificities" -- This claim is not justified by the data, which comes from sequence matching against literature-derived databases. Rather, what is true is that "Clustered public CDR3s are enriched in public viral specificities".

      We changed “CDR3s are enriched in viral specificities” for “clustered public CDR3s are enriched in public viral specificities".

      But this might be a simple consequence of the previous observation, that "clustered CDR3s are enriched for publicness". One would need experimental specificity data on the very same datasets to make a conclusion about viral specificities in general.

      We based our interpretation on experimental data.

      Indeed, we manually curated databases to identify CDR3s that bind specific tetramers/dextramers. For immunologists, this type of “experimental specificity data” is a paradigmatic and as yet unchallenged means of defining specificity.

      We observe that CDR3s from TCRs that bind tetramers/dextramers presenting viral peptides are more frequent among clustered than among dispersed CDR3s. This difference is highly statistically significant, and we now report it as an observation that we leave open to discussion and challenge by our community.

      "Identification of polyspecific TCRs" -- In this section, the authors report that some of the CDR3 clusters contain CDR3 sequences from literature-derived TCRs with multiple specificities. They conclude that these must represent polyspecific TCRs. The problem with this conclusion is that even having the same CDR3beta, let alone similar CDR3beta sequences, does not imply the same specificity. One can see the problem if one imagines a very deeply sequenced dataset, and focuses on a short CDR3 length with high frequency. With sufficient sampling, one will be able to navigate from nearly any single CDR3beta to any other CDR3beta of the same or similar length by jumping between single-mismatch variants. But this doesn't imply that all the TCRs from which these CDR3s were sampled, which likely have many different Vbeta genes and completely different TCRalpha sequences, must all bind the same thing.

      We first point out that we did not analyze “a very deeply sequenced dataset”, but only the 18 000 most abundant sequences per sample; singletons were excluded. In addition, we did not mean to say that all connected TCRs have the same specificities, regardless of their position in the cluster. Clustering algorithms, such as those based on LV distance or GLIPH2, are now commonly used to infer the specificity of clusters, and it is generally accepted that the closer two TCR sequences are, the more they share their specificities.
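Levenshtein-distance clustering of the kind referred to above can be illustrated with a minimal sketch: CDR3 sequences are linked when their edit distance is at most 1 (a threshold assumed here), clusters are the connected components of the resulting graph, and a sequence's degree is its number of connections. This is neither GLIPH2 nor the authors' pipeline, and the sequences are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_cdr3s(cdr3s, max_distance=1):
    """Link CDR3s within max_distance; return connected components and degrees."""
    unique = list(dict.fromkeys(cdr3s))  # drop duplicates, keep input order
    neighbors = {s: set() for s in unique}
    for a, b in combinations(unique, 2):
        if levenshtein(a, b) <= max_distance:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, clusters = set(), []
    for start in unique:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search over the graph
            node = stack.pop()
            component.append(node)
            for nb in neighbors[node] - seen:
                seen.add(nb)
                stack.append(nb)
        clusters.append(sorted(component))
    degree = {s: len(nb) for s, nb in neighbors.items()}
    return clusters, degree


clusters, degree = cluster_cdr3s(
    ["CASSLGQGYEQYF", "CASSLGQGYEQFF", "CASSLGRGYEQYF", "CASRPGTDTQYF"]
)
print(clusters)
print(degree)
```

In this toy example, CASSLGQGYEQFF and CASSLGRGYEQYF end up in the same cluster through CASSLGQGYEQYF even though they are two edits apart, which is why cluster membership alone does not imply that every pair of members shares a specificity. The all-pairs comparison is quadratic and would be slow in pure Python for 18 000 sequences per sample.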

      That said, it is precisely because we acknowledge the limitation of bulk sequencing for inferring specificities that we turned to also analyze single-cell datasets.

      We have made this more apparent with new Results section titles that more clearly indicate the shift from unpaired bulk thymocyte sequencing to paired single peripheral cell sequencing.

      "Binding properties of polyspecific TCRs" -- Here the authors look to validate these results with paired TCR sequences. They analyze a public dataset made available by 10X genomics, featuring single-cell gene expression, TCR sequencing, and dextramer UMI counts for ~150,000 T cells. This is an amazing dataset with lots of interesting features, but, like any large high-throughput dataset, it needs to be analyzed with care.

      We can assure the reviewer that we were always very careful. Actually, we even started by carefully reviewing the 10X proposed methodology, in which we identified major biases. This led us to explore this dataset cautiously and without preconceived ideas.

      The authors claim to see evidence for large-scale cross-reactivity. This comes mainly from a set of dextramers for A03 and A11-restricted peptides. But these dextramers appear to be binding in a uniquely non-specific manner (by comparison with the other dextramers) and non-TCR-dependent manner in this experiment. One can see this, for example, by comparing the consistency of binding within expanded clonotypes: for a specific dextramer like A*02-GIL(Flu), positive binding for one cell in a clonotype greatly increases the likelihood of binding for other cells in the clonotype, suggesting that the binding is mediated by the TCR.

      This is not true for the A03 and A11 dextramers (except for a few expanded clonotypes in an A*11 donor). TCR sequence doesn't appear to be the determining factor for binding to these dextramers; rather it may be expression of KIR genes or other surface proteins that can interact with MHC.

      These are indeed striking binding patterns, remarkably similar for a single CDR3 beta associated with more than 40 different CDR3 alphas (moreover, from two donors). The first reaction of immunologists would indeed be to discard this observation for not fitting current paradigms. We would rather propose an agnostic view of these results.

      These results show that a series of five A03 and A11 dextramers loaded with various peptides bind to cells that express a given CDR3 beta associated with a multitude of CDR3 alphas. If this were MHC-KIR binding, these dextramers should bind to most cells, independently of their TCRs. We have added two supplementary figures (Figure 4-figure supplement 8B-C) showing that this is not the case and revealing very different binding patterns.

      The same reasoning would likely apply to binding to “other surface proteins”.

      We identified a CDR3 from donor 3 that binds preferentially to A03 and A11 dextramers. However, it binds to only 4 of these 5 dextramers. If the binding were non-specific and TCR-independent, binding to the A0301 RIAAWMATY BCL2L1 dextramer should also have been observed. Moreover, we identified this same CDR3beta in two other cells, from donors 1 and 4, where it was associated with a different CDR3alpha. Apart from a single binding event, these TCRs did not bind the A03 and A11 dextramers.

      Moreover, we identified another CDR3 from donor 1 that is associated with strong binding to one A1101 dextramer presenting an EBV peptide when paired with many different CDR3alphas. Binding to the other A03 and A11 dextramers is weaker and seems to depend more on the CDR3alpha.

      If the binding of A03 and A11 dextramers were non-specific and TCR-independent, why would there be such a difference between the binding of the A1101 IVTDFSVIK and A1101 AVFDRSDAK dextramers?

      "Polyspecific T cells are activated in vitro by multiple viral peptides" Here the authors explore polyspecificity experimentally. First they report that polyclonal populations of T cells, sorted for binding to one dextramer, can also produce IFN gamma upon stimulation with a distinct peptide, albeit more weakly than for the cognate peptide.

      This is indeed true for CMV+ sorted cells that respond better to CMV peptides than to EBV ones, but not true for EBV+ sorted cells that also respond better to CMV peptides than to EBV ones.

      But it's not clear that the concentrations of the peptides are appropriate for stringently detecting cross-reactivity.

      We wonder what “stringently” means here. Could it mainly mean defining the conditions that eliminate what does not fit the current paradigm?

      More concretely, the peptide concentration used for these experiments, presented in Fig. 5A-B, was 1 µg/mL, i.e. ~1 µM for a 9-10-aa-long peptide. This is clearly a physiological concentration for viral peptides, routinely used in in vitro recall assays. We can thus rule out that the observed cross-reactivity is simply due to excessive peptide stimulation.
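The ~1 µM figure follows from a unit conversion. The sketch below assumes an average residue mass of about 110 Da plus one water for the termini; this is a rough approximation rather than the exact molecular weight of the peptides used, and the function names are illustrative.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rough average amino-acid residue mass (assumption)
WATER_MASS_DA = 18.0             # added once for the free N- and C-termini


def peptide_mass_da(length: int) -> float:
    """Approximate peptide molecular weight (g/mol) from its length."""
    return length * AVERAGE_RESIDUE_MASS_DA + WATER_MASS_DA


def ug_per_ml_to_um(conc_ug_per_ml: float, mw_g_per_mol: float) -> float:
    """Convert µg/mL to µM.

    1 µg/mL = 1 mg/L; (mg/L) / (g/mol) = mmol/L; multiply by 1000 for µmol/L.
    """
    return conc_ug_per_ml / mw_g_per_mol * 1000.0


for length in (9, 10):
    mw = peptide_mass_da(length)
    print(length, round(mw), round(ug_per_ml_to_um(1.0, mw), 2))
```

Under this approximation, 1 µg/mL corresponds to roughly 0.9-1.0 µM for a 9- to 10-mer.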

      Then the authors actually synthesize and characterize individual TCRs. Here what is seen is consistent with expectation and does not seem to support the idea of substantial fuzzy cross-reactivity: binding to the cognate peptide is 3-4 orders of magnitude stronger than to the alternative peptides.

      We respectfully disagree. First, as shown in Fig. 5C, TCR#35-13 (cognate peptide: HLA-A2-restricted Flu MP 58-66) indeed recognizes the alternative HLA-A2-restricted CMV IE1 184-192 peptide with a log EC50 that is 3-4 units higher; yet the EC50 of this TCR is approx. 10^-6 M, i.e. 1 µM, which remains a physiological concentration. Second, this is not the case for TCR#36-150 (same cognate peptide, HLA-A2-restricted Flu MP 58-66), which actually recognizes the alternative HLA-A2-restricted EBV BMLF1 280-288 peptide with a 4-fold lower EC50.

      The only exception is the GAD 114-122 TCR, where the different peptides appear to be closer in binding strength. But in this case, the authors state that they "analyzed their response to a set of peptides comprising their cognate peptide and peptides with no significant structural commonalities, selected by testing combinatorial peptide libraries". If the competitor peptides came from peptide library screening then the observation of strong binding to alternative peptides does not seem as surprising as a TCR that binds well to a Flu peptide, say, and also a CMV peptide, selected from a smallish set of possibilities.

      As explained above, this TCR does not stand as an exception compared to Flu-reactive TCRs. Moreover, it should be noted that this GAD 114-122 TCR recognizes its cognate peptide in a similar or even lower concentration range compared to the Flu-reactive TCR #36-150. It should also be pointed out that, contrary to the Flu-reactive TCRs, here we did not have any reference dextramer binding data to guide our peptide selection, which is why we resorted to combinatorial peptide libraries. Thus, although different strategies were used, peptide selection was “guided” in both instances.

      It is pretty well established that TCRs are cross-reactive, both for nearby peptides and also for sequence-dissimilar peptides.

      We agree, and we had notably cited the landmark paper by Don Mason estimating that each TCR may respond to over 10^6 different peptides from an estimated repertoire of > 10^10 peptides. Based on Mason's estimate of cross-reactivity, the chance of finding a cross-reactive peptide at random would be around 10^-4.

      Here, we tested only a few peptides from different viruses. If Mason's estimates are correct, then for a given TCR the chance of finding even one cross-reactive peptide among these few peptides would be at most 10^-3, the chance of finding two would be of the order of 10^-6, and the chance of finding three or more would be infinitesimal.
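These orders of magnitude can be checked with a binomial model. The sketch assumes that each tested peptide is recognized independently with probability 10^6 / 10^10 = 10^-4 (Mason's estimate) and that about ten peptides are tested per TCR; both the independence assumption and the value of ten are illustrative, not taken from the experiments.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed directly to keep precision."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


P_CROSS = 1e6 / 1e10  # Mason's estimate: ~10^6 recognized peptides out of ~10^10
N_PEPTIDES = 10       # hypothetical number of peptides tested per TCR

for k in (1, 2, 3):
    print(k, f"{prob_at_least(k, N_PEPTIDES, P_CROSS):.1e}")
```

With ten peptides, this gives roughly 10^-3 for at least one hit, about 4.5 × 10^-7 for at least two and about 10^-10 for at least three, consistent with the orders of magnitude given above.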

      Thus, if the polyreactivity that we described is part of this general cross-reactivity, our results at least highlight a major, previously unreported bias in the selection of these cells.

      The question is whether widespread, functionally relevant (not just dextramer binding at some concentration) poly-reactivity to diverse viral peptides is a defining feature of a large fraction of the TCR repertoire. The paper does not appear to present sufficiently strong evidence to support this claim.

      We agree with the reviewer that more work is needed to “fully” appreciate the role of polyreactive cells!

    1. Author Response

      Reviewer #2 (Public Review):

      This paper reports a novel measure of biological age derived from machine-learning analysis of retinal imaging data with chronological age as the criterion measure. The resulting algorithm is impressive. Not only can the retinal image data accurately predict chronological age in the training data and record changes over short time intervals, but it also proves accurate in independent test data and appears to contain information related to mortality risk. In addition, the authors report a GWAS of the new measure.

      I would like to see a bit more validation data in the UKB - how does EyeAge relate to (a) tests of visual acuity - e.g. does it explain aging-related differences?

      We have extended the supplemental tables and figures (Supplementary table 5 and Figure 3- figure supplement 2) to show additional adjustments to the hazard ratios using visual acuity.

      (b) measures of morbidity and disability - e.g. how is EyeAge Accel associated with at least some of the counts of chronic diseases, self-reported physical limitations, tests of physical performance, measures of fluid intelligence?

      We felt that all-cause mortality is the clearest outcome to test against, as other outcomes were either not available for all participants or would require domain-specific knowledge to incorporate properly, which we felt was out of scope. We have therefore added this limitation to the discussion:

      “This study has several limitations. First, further work will be needed to assess whether eyeAgeAccel is correlated with other important health outcomes and measures.“

      But overall, this is a very strong report of an exciting new biomarker of aging. It was unclear to me whether the algorithm to compute the measure would be publicly available. The authors should clarify.

      Code for both training and evaluation of eyeAge from fundus images is available by minimally modifying open-source software we previously released under the permissive BSD 3-clause license. We have added the following “Code availability” section to the paper:

      “To develop the eyeAge model we used the TensorFlow deep learning framework, available at https://www.tensorflow.org. Code for both training and evaluation of chronological age from fundus images is open-source and freely available as a minor modification (https://gist.github.com/cmclean/a7e01b916f07955b2693112dcd3edb60) of our previously published repository for fundus model training (ref. 57).”

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very interesting paper trying to quantify excess deaths due to the COVID-19 pandemic in the USA. The paper is roughly divided into two main sections. In the first section, the authors estimate age and cause-specific excess mortality. In the second section, using their excess mortality estimates, the authors attempt to disentangle the impact of SARS-CoV-2 infection (direct impact) vs. the impact of NPIs on this excess mortality (indirect impact). I have some concerns, particularly with respect to the second section.

      The model used to estimate excess mortality is quite clear. The authors adjust the baseline model to account for low influenza circulation (and deaths) during the COVID-19 pandemic, to avoid underestimating the number of deaths caused by COVID-19. While this makes sense if the authors are trying to estimate the total number of deaths caused by COVID-19, I'm not sure it needs to be accounted for if the authors want to estimate excess/added deaths. A counterfactual scenario would've included influenza. It also raises the question of whether (conceptually) they should be adjusting for other causes of deaths that may have also decreased during the pandemic. The authors briefly acknowledge this in the discussion ("we can't account for changes in baseline respiratory mortality due to depressed circulation of endemic pathogens other than influenza") but my comment goes beyond respiratory diseases. Analyses of excess mortality from other settings have suggested, for example, decreased deaths due to fewer traffic accidents (not in the US) or due to decreased air pollution, and not accounting for these would also lead to an underestimate of the total deaths caused by COVID-19. I understand that it is not feasible to account for all potential factors, so I wonder if they should focus on reporting excess deaths as compared to a counterfactual with influenza.

      Thanks. We think it is helpful to “single out” influenza as it causes major fluctuations in mortality from multiple causes in regular years and is a useful reference to contrast the pandemic impact. But the reviewer’s point is well taken. We have clarified our assumptions about the meaning of the baseline in this analysis (methods p 5), discussed the depressed circulation of other pathogens in depth, and mentioned air pollution (p 12-13). We have also slightly reworked our comparison between COVID19 and influenza so that excess mortality estimates are comparable and now cover periods of the same duration (Nov 2017-Mar 2018 for flu and Nov 2020-Mar 2021 for COVID19, see Figure S11).

      The second section, trying to estimate direct vs. indirect effects is also very interesting. However, more details are required about the regression model used and, importantly, what the assumptions and limitations of the approach are. Specifically:

      • Please provide a bit more information on the regression used for direct vs. indirect effects. I'd like to see explicit discussion of the assumptions and limitations of the approach but also of the stringency index used. Does this model include an intercept? Was the association between stringency index and excess deaths assumed to be linear? Or were different functional forms considered? It is also not clear how well the model fits the data.

      Thanks for these comments, which helped us improve this section. We have provided more details about the stringency index in the methods (it captures the “sum” of interventions), described the model in the methods and supplement, and discussed limitations in the caveats section, especially regarding the effectiveness of these interventions (p 13). We had tried different linear models with and without intercepts but elected to use models with intercepts so as not to overly constrain the relationship between interventions, COVID19 activity and excess mortality. These models also incorporate lags in the predictors, determined by cross-correlation analysis (as detailed in the supplement). In the revised version, we now use generalized additive models (GAMs), in which the relationships between excess mortality and the predictors need not be linear. This is possible because we were able to add several weeks of data (the regression is now based on 96 pandemic weeks, from March 1, 2020 to January 1, 2022). The models are described in detail in the supplement (p 4-5), and we now specify that they include intercepts. We have also provided additional plots of model fits in the main text and supplement (Figures 4 and S16-19).
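      The supplement describes how the lags were chosen; purely as an illustration, the sketch below assumes the lag for a predictor is the one, within a maximum window, that maximizes the Pearson correlation between the lagged predictor and weekly excess mortality. The function name `best_lag`, the 6-week window and the synthetic series are hypothetical, and the GAM fit itself is not shown.

```python
import numpy as np


def best_lag(predictor, response, max_lag):
    """Return (lag, r): the lag in weeks, 0..max_lag, that maximizes Pearson correlation.

    A lag L pairs predictor[t - L] with response[t], i.e. the predictor leads the response.
    """
    x_all = np.asarray(predictor, dtype=float)
    y_all = np.asarray(response, dtype=float)
    if x_all.shape != y_all.shape:
        raise ValueError("predictor and response must have the same length")
    best, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        x = x_all[: len(x_all) - lag]  # drop the last `lag` predictor weeks
        y = y_all[lag:]                # drop the first `lag` response weeks
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best, best_r = lag, r
    return best, float(best_r)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    covid = rng.normal(size=96)  # synthetic weekly series; 96 weeks as in the study window
    # Synthetic response that follows the predictor three weeks later, plus noise.
    excess = np.concatenate([rng.normal(size=3), covid[:-3]]) + 0.1 * rng.normal(size=96)
    print(best_lag(covid, excess, max_lag=6))
```

      Maximizing the signed correlation (rather than its absolute value) reflects an assumption that the predictor is expected to raise excess mortality; a predictor with an expected protective effect would call for the opposite sign.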

      • Related to the above, please provide more details on how the results of the regressions were translated into the results presented. The main text reports percentages, but the methods only briefly explain how numbers of direct deaths were calculated, and the supplementary tables report coefficients. It is not clear if these estimates of direct and indirect deaths were somehow constrained to add up to the total number of excess deaths, but it doesn't seem like it since point estimates cross 100% in some cases.

      As discussed in response to one of the editor’s questions, estimates are not constrained to 100%. We have provided more details in the supplement on how we estimate the direct impact of the pandemic. Briefly, we calculate expected deaths in the GAM with all predictors set to their observed values, and again with the COVID19 predictor set to zero. The direct impact is the difference between the two predictions, divided by the prediction of the full model.
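      The calculation described here can be sketched as follows. The toy additive predictor stands in for the fitted GAM and its coefficients are invented for illustration; the sketch also assumes that weekly predictions are summed over the period before taking the ratio. It shows one way a share above 100% can arise: when another term (here a hypothetical negative intervention effect) pulls the full prediction below the COVID19 contribution alone.

```python
from typing import Callable, Dict, List, Sequence

Predictors = Dict[str, Sequence[float]]


def direct_share(predict: Callable[[Predictors], Sequence[float]],
                 observed: Predictors, covid_key: str = "covid") -> float:
    """Fraction of predicted excess deaths attributed to the COVID-19 predictor.

    Predict with all predictors at their observed values, then again with the
    COVID-19 predictor set to zero; divide the difference by the full prediction.
    Weekly predictions are summed over the period before dividing (an assumption).
    """
    full = sum(predict(observed))
    counterfactual = dict(observed)  # shallow copy; other predictors unchanged
    counterfactual[covid_key] = [0.0] * len(observed[covid_key])
    without_covid = sum(predict(counterfactual))
    return (full - without_covid) / full


def toy_model(x: Predictors) -> List[float]:
    """Hypothetical additive stand-in for a fitted GAM: 2*covid - 1*npi per week."""
    return [2.0 * c - 1.0 * n for c, n in zip(x["covid"], x["npi"])]


if __name__ == "__main__":
    weeks = {"covid": [1.0, 2.0, 3.0], "npi": [1.0, 1.0, 1.0]}
    print(f"direct share = {direct_share(toy_model, weeks):.0%}")
```

      With this toy model the full prediction sums to 9 while removing the COVID19 term lowers it to -3, so the attributed share is 12/9, about 133%. A point estimate above 100% is therefore not an arithmetic error; it signals that other terms in the model offset part of the COVID19 contribution.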

      We note that while some of the estimates derived from the GAM exceed 100% (and are similar to the linear model estimates presented in the initial analysis, before revision), these estimates echo the findings from a more empirical analysis, in which we compare all-cause excess deaths with official COVID19 death tallies. There, in the two oldest age groups, we find more official COVID19 deaths than estimated by the excess mortality models. Hence both analyses point to an underestimation of the direct burden of COVID19 by the excess mortality approach, specific to the oldest age groups. We return to this point in depth in the discussion (p 12-13) and consider the possible effects of harvesting, depressed circulation of non-SARS pathogens, and inaccurate coding of official statistics (as pointed out by reviewer #3).

      • Please discuss the potential limitations of using the stringency index to quantify NPIs.

      Several limitations have been added to caveats (p 13); major issues include aggregation of multiple interventions into a single index, which does not consider the actual implementation nor the effect of interventions. The index is solely based on mandates in place in different locations and time periods. We also assume that the effectiveness of these interventions, for a given level of stringency, does not change over time.

      • When estimating direct and indirect effects, the paper assumes that the estimated parameter is time-invariant? Indirect effects might have changed over the course of the epidemic by factors not necessarily captured by the stringency index used, particularly since the index doesn't take into account the implementation of the measures. Have the authors tested this assumption?

      This is an interesting point, which we have explored further. The non-linear relationships we find between NPIs and chronic condition excess mortality may suggest that the reviewer is right. We now discuss the role of NPIs in the results section in much more depth than we did previously (bottom of p 8).

      “At lower levels of interventions (Oxford index between 0 and 50), representing the early stages of the lockdown in March 2020, excess mortality rose with interventions. Later in the pandemic, increased interventions were estimated to have a beneficial effect on excess mortality, driven by comparison between the period when interventions were strengthened in response to increasing COVID19 activity in late 2020 (Oxford index above 60) to the period when interventions were relaxed in 2021 (Oxford index between 50 and 60).”

      We cannot run an analysis over different time windows because NPIs and time are highly confounded: for instance, the NPI index rose from 0 to 50% in the very early part of the lockdown period and then stayed above 50% for the rest of the study, so we cannot compare the effect of a 25% level in 2020 and 2021. We have added this limitation to the caveats section (p 13).

      • The authors state "In contrast, the indirect impact of the pandemic measured by the intervention term was highest in youngest age groups, decreased with age, and lost significance in individuals above 65 years" - I'm not entirely sure of where this statement comes from? For example Table S3 suggests that the indirect effect (multivariate or univariate) is higher in 25-64 yo than in <25s? The same table also suggests negative impacts (protective effects?) in >75s in the multivariate model. Please clarify.

      Because there are fewer deaths among individuals under 25 years, the coefficients for this group in Table S3 were lower overall. Nevertheless, we find that the proportion of variance explained by interventions is higher in those under 25 years than in those aged 25-44 years.

      We have now switched our modeling strategy to GAMs, so Table S3 is no longer relevant, but the main conclusion remains valid: interventions explain a larger relative portion of excess mortality in those under 25 years than in the other age groups, and a larger portion than the other covariates do. The NPI term is now significant in all groups (although, as in the prior analysis, the relative contribution of NPIs still declines with age), so we have rephrased this sentence: “In contrast, the relative contribution of indirect effects, via the intervention variable, was highest in youngest age groups and decreased with age”.

      • How do the authors interpret "Percents of excess deaths" over 100%? Similarly, I don't fully understand how to interpret "The upper bound of the 95% confidence interval for heart diseases was above 100% (158%), suggesting that for every excess death from heart disease estimated by our model, up to 1.58 death from heart disease could be directly linked to SARS-CoV-2 infection.

      We have rephrased this section, although the overall conclusions remain unchanged. GAM estimates of the direct COVID19 impact are statistically significantly above 100% in those aged 85 and over, suggesting that our excess mortality approach is too conservative and underestimates COVID19 excess deaths in this age group. We draw a similar conclusion from a more empirical analysis, in which we compare all-cause excess death estimates with official COVID19 death tallies. In this analysis, we find more official COVID19 deaths than estimated by the excess mortality models in the two oldest age groups (point estimates above 100% in the 75-84 and 85+ yrs). Hence both analyses point to an underestimation of the direct burden of COVID19 in the oldest age groups by excess mortality approaches.

      Rephrased results section, bottom of p 9: “We estimate that the direct contribution of COVID-19 to excess mortality increases with age, from negative and non-statistically significant in individuals under 25 yrs to over 100% in those over 85 years, echoing the gradient seen in official statistics (Table 4). It is also worth noting that our excess mortality estimates may be too conservative (too low), as our baseline did not account for the missed circulation of endemic pathogens. This could explain why our estimates of direct COVID-19 contribution exceed 100% in the oldest age group.”

      We return to this point in depth in the discussion and consider the possible effects of harvesting and depressed circulation of non SARS pathogens (p 12-13).

      • Table 3: The signs of the point estimate vs CI for vehicle accidents are inconsistent.

      Thanks, this was a typo. It should have been 4300 (-700, 9300) excess deaths from accidents. This has been updated with more recent data.

      Reviewer #3 (Public Review):

      The authors examine mortality data in the US and use time-series approaches to estimate excess mortality during the COVID-19 pandemic.

      Major comments:

      I would encourage the authors to discuss the two different concepts of excess mortality:

      (#1) what deaths were caused, directly or indirectly, by the pandemic. This is what the authors have aimed to assess, and I have no major concerns with the methodology

      (#2) how many additional deaths occurred during the pandemic, compared to what would have been expected in the absence of a pandemic. For such an analysis I think expected annual influenza deaths should be added back to the baseline (or subtracted from the excess)? Some of the discussion seems to relate more to an impression of #2 rather than #1 but I would be interested in the authors' thoughts.

      We have added more details about the approach, in particular why we think that #1 is the proper analysis here (see methods p 5). Given the sheer magnitude of COVID19 excess deaths (over 1 million excess deaths at the end of our study), adding back influenza deaths (up to 52,000 deaths in a recent severe season with a mismatched vaccine, as in 2017-18) would not make a large difference. We have also provided a more direct comparison of the impact of influenza and COVID19.
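      A back-of-the-envelope bound makes this concrete. Assuming, for illustration, that one severe season's worth of influenza deaths (up to about 52,000, as in 2017-18) were added back to the baseline, and taking 1 million as a lower bound on cumulative COVID19 excess deaths, each such season would shift the excess estimate by less than about 5%:

```latex
\frac{\text{influenza deaths added back per season}}{\text{COVID-19 excess deaths}}
  \;<\; \frac{52\,000}{1\,000\,000} \;=\; 5.2\%
```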

      1. Authors estimate fewer excess COVID deaths in the elderly than there were confirmed deaths (Table 3). Could this be an indication of some confirmed deaths being "deaths with COVID" rather than "deaths from COVID"? I'm not sure how to interpret the %s in the final column when they exceed 100%. The authors suggested a harvesting effect but I would suggest "deaths with COVID" might be a more likely explanation? This issue can be a limitation of confirmed-death data.

      This is a good point. We have added a comment along these lines in the discussion, in the middle of p 12. Still, we think harvesting and/or the depressed circulation of endemic pathogens, which would have inflated our baseline, are more likely explanations for these findings. This is because we find similar estimates (exceeding 100%) in GAMs that ignore official statistics and rely on COVID19 case data or COVID19 hospital occupancy data, which suggests that mechanisms beyond the coding of official mortality statistics are at play.

      Yet, as more detailed official statistics become available, a tabulation of confirmed deaths by presence of a primary vs secondary COVID (U07) code may be revealing and get more directly at the reviewer’s question.

    1. Author Response

      Reviewer #1 (Public Review):

      ARL3 is a small GTPase that localizes to the primary cilium and plays a role in regulating the localization of some specific ciliary membrane proteins, including PDEδ and NPHP3. Mutations in this gene cause Joubert syndrome, a type of ciliopathy characterized by cerebellar malformation, and retinal degeneration. While the majority of the diseases occur in an autosomal recessive manner, two mutations in ARL3 (D67V and Y90C) have been reported to cause autosomal dominant retinal diseases. In the current paper, Travis et al. sought to understand the pathogenesis of the diseases caused by the two autosomal dominant mutations. They found that D67V acts as a constitutive active mutation, whereas Y90C is a fast-cycling mutant, which can be activated in a guanine nucleotide exchange factor (GEF) independent manner. Since the fast-cycle mutant did not bind to the effector proteins in vitro (likely because the guanine nucleotide falls off from the mutant ARL3, which has a lower affinity to GDP/GTP), they developed a method to snapshot the interaction between ARL3 and its effector. Using this method, they showed that the Y90C mutant indeed has increased interaction with the effectors, suggesting that Y90C is an overactive form of ARL3. They then addressed how photoreceptor cells are affected by these two mutations using a mouse model and found that the mutations disrupt the proper migration of the photoreceptor cells.

      Strengths:

      • The paper is well written, and it was easy to understand what the authors did from the figure legends and the methods section.

      • It was easy to find out what is known or unknown, as the paper has accurate references.

      • The authors developed a method to analyze a snapshot of the interaction between ARL3 and its interactors.

      • The paper has an in vivo model and connects the biochemical characteristics of ARL3 to in vivo cellular phenotypes.

      Weaknesses:

      (1) I understand that authors focused on nuclear migration defect as the phenotype was first described in ARL3-Q71L transgenic mice. The similar phenotype observed in RP2 knockout mice further supports the idea that the defect is caused by the hyperactivation of ARL3. Indeed, the defect is not reported in the ARL3 knockout mice, however, I feel that it does not necessarily mean that the defect is not caused by loss of function. Although it has not been assessed, ARL3 knockout mice might have the same defect. Therefore, I think analyzing both the migration defect and trafficking defect would be more informative, rather than focusing on the migration defect. The fact that the relationship between nuclear migration defect and the retinal degeneration phenotype is not entirely clear further enhances the importance of analyzing the trafficking defect.

      Does the expression of ARL3-Y90C also cause the trafficking defect? If it is the case, you can separate the nuclear migration phenotype from the one caused by the trafficking defect. Would the expression of lipidated cargo(s) rescue the trafficking defect as well?

      I think many questions can be addressed by analyzing the localization of the lipidated cargos, such as PDEδ and GRK1.

      The effect of Arl3-Y90C expression on trafficking of lipidated cargos is an interesting question. Previous papers showed mislocalization of lipidated outer segment proteins in Arl3-KO rods and down-regulation or subtle mislocalization in Arl3-Q71L overexpressing rods. So, this was one of the first things we investigated; however, we never observed mislocalization of ciliary or outer segment lipidated cargos (i.e. GRK1, transducin, Rab28, and PDE) in wild type mature rods that were overexpressing Arl3 mutants, and many were tested. It was through these experiments that we first identified the pronounced nuclear migration defect. Rod photoreceptor nuclear migration is a developmental process that is completed by P10, so Arl3-Y90C overexpression is causing a developmental defect. When rods are positioning their nuclei in the ONL, they are still “immature” as their primary cilium has not begun to elaborate disc membranes for light capture. All our analysis was performed in mature rods, so it is not surprising that we did not observe any lipidated trafficking defects at this timepoint. Since the developmental timing of the nuclear migration defect is important for our manuscript, we have added this to our introduction. Additionally, we use “immature” photoreceptors for the cartoon diagrams showing how Arl3 activity is altered by different mutation and rescue experiments, since formation of the mature outer segment occurs post-migration.

      (2) I am not quite sure if the nuclear migration was assessed properly. Based on the pictures in Fig.1, some of the FLAG-negative cells also seem to be migrating to INL (please see Fig.1C and Fig.1D). Is this biologically normal during development? Could this analysis be affected by the thickness of OPL, the layer between ONL and INL? Also, the picture is cut out in the middle of INL. Could authors include more layers, such as IPL, of the retina in the picture, so that we can evaluate INL and OPL better? Taking this into account, I think it is worth measuring the nuclear position of FLAG-negative cells as a negative control in all the experiments.

      Our electroporation technique results in a small population of rods that express our constructs of interest (~5-15% within a patch). All experiments were performed in wild-type retinas, which develop normal retinal layers, so the analysis of the nuclear position of FLAG-positive cells within the retina is cell autonomous. Migration defects are assessed by differences in the skew of FLAG-labeled rods relative to the boundaries of the wild-type ONL, which is marked by Hoechst nuclear stain (which also serves as a measure of the FLAG-negative rods). Wild-type photoreceptor nuclei are not found within the INL; the nuclei in that layer belong to either horizontal cells or bipolar cells, neither of which is targeted by our electroporation approach. As a control, we show that when wild-type Arl3-FLAG was expressed, FLAG-labeled rods were never observed within the INL. We have now included the % of displaced nuclei in Table 1.
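      To make this readout concrete, the sketch below computes a percentage of displaced nuclei from depth measurements. It is illustrative only and is not the scoring used for Table 1: it assumes each FLAG-positive nucleus has a measured depth along the retinal axis, and that the outer ONL edge and the inner ONL boundary are read from the Hoechst stain in the same image. Function names and example values are hypothetical.

```python
from typing import Iterable


def normalized_position(depth: float, onl_outer: float, onl_inner: float) -> float:
    """Map a nucleus depth onto the ONL: 0 at the outer edge, 1 at the inner ONL boundary."""
    if onl_inner == onl_outer:
        raise ValueError("ONL boundaries must differ")
    return (depth - onl_outer) / (onl_inner - onl_outer)


def percent_displaced(depths: Iterable[float], onl_outer: float, onl_inner: float) -> float:
    """Percent of FLAG-positive nuclei lying past the inner ONL boundary (toward the INL)."""
    positions = [normalized_position(d, onl_outer, onl_inner) for d in depths]
    if not positions:
        raise ValueError("no nuclei supplied")
    return 100.0 * sum(p > 1.0 for p in positions) / len(positions)


if __name__ == "__main__":
    # Hypothetical depths (in microns) for four FLAG-positive nuclei in a 100-micron-thick ONL.
    print(percent_displaced([10.0, 50.0, 90.0, 120.0], onl_outer=0.0, onl_inner=100.0))
```

      Normalizing each depth to the local ONL boundaries, rather than using raw distances, keeps the measure comparable across sections whose ONL thickness differs.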

      (3) The way that the authors showed the Y90C mutant of ARL3 is a fast-cycling mutant is not very compelling. In Figure 2C, the authors showed that ARL3 Y90C can bind to PDEδ, its effector, once it is pre-loaded with GTP. The authors also showed that the mutant can bind to its effector even without EDTA as long as an excess amount of GTP is added. The authors used endogenous ARL3 as a control to compare the effects between wild-type and mutants. I see that this experiment has multiple pitfalls. First, ideally, this type of experiment needs to be done with a purified protein using fluorescent guanine nucleotide/radioactive guanine nucleotide (e.g. nucleotide loading assay or nucleotide exchange assay) to directly assess the kinetics of nucleotide exchange. However, I do understand that this is out of the authors' expertise. In the authors' experimental setting, I am not sure loading the protein with GTP in the presence of the EDTA means anything more than confirming that the protein is intact. Theoretically, wild-type and a fast-cycling mutant can load GTP with similar efficiency in the presence of EDTA. Then during immuno-precipitation, GTP falls off from the Y90C mutant faster than wild-type (because a fast-cycling mutant theoretically has a lower affinity to guanine nucleotides), assuming that GTP was not added during immuno-precipitation (GTP addition was not mentioned in the method, but could the authors confirm this?). But in this case, the kinetics of GTP dissociation can be affected by many factors, including the presence of GAP in the reaction, the dissociation constant of Y90C, the volume of the buffer used, and the number of washing steps. Thus, it is not very easy to estimate the difference between wild-type and Y90C. Besides, using endogenous ARL3 rather than ARL3-wild type FLAG as a control can be dangerous. I have experienced that a tagged protein can be cleaved into a product that has a similar size to the endogenous protein (I expressed GFP-protein X in knockout cells lacking protein X, and saw the band at the position where the endogenous protein is observed in wild-type cells). So, the endogenous band that the authors showed could come from the cleaved FLAG-Arl3. (The authors can easily confirm this by including wild-type cells not expressing FLAG-tagged ARL3, though.)

      An alternative experiment that I would suggest is doing immuno-precipitation in the buffer containing: 1) no guanine nucleotide, 2) 10mM GDP, or 3) 10mM GTP in the cells expressing the following protein: 1) ARL3 wild-type FLAG, 2) ARL3 Y90C FLAG, or 3) ARL3 D129N FLAG. 10mM guanine nucleotide should be added throughout the process including washing. This experiment might also be affected by many factors, but variability should be lower than the experiment presented in Fig 2C. ARL3-wild type FLAG is also a better control here than endogenous protein.

      Variability due to the factors you mention is a concern, but we were able to repeatedly obtain the same results using our method—admittedly our method is testing whether the mutated Arl3 can exchange under a certain condition more than exactly how. We know that we are not providing precise kinetics or elucidating the underlying mechanism for how these mutations lead to what we are calling fast cycling. While that information is important, it is outside the scope of this paper.

      As you mention, an important conclusion from the PDEδ binding experiments is that we confirm the Arl3-Y90C protein is intact by showing it can indeed bind nucleotide as long as there is an excess of GTP (Fig 2B). The interesting finding from these experiments is that Arl3-Y90C binds GTP even in the presence of magnesium, a behavior not observed for wild-type Arl3. We feel that showing that endogenous Arl3 is not activated in the presence of magnesium in each of our preparations is a lovely internal control. However, we agree that showing wild-type Arl3-FLAG in these assays is an important negative control and have now included this blot as Fig 2-Sup Fig 1.

      (4) In Fig. 3, the authors attempted to take a snapshot of the interaction between ARL3 and multiple effector proteins. The three bands that were enriched in the Q71L cells were identified as RP2, UNC119, and BART by mass spec (Fig. 3B). These bands were used as a readout for the subsequent experiments. I am not quite sure why the authors used this approach rather than a cell line that expresses both FLAG-ARL3 and a GFP-tagged protein of interest, as the authors did in Fig. 3G. I prefer the latter approach for the following reasons: 1) the FLAG bands that correspond to the three proteins (RP2, UNC119, and BART) in wild-type cells are very close to the detection limit, 2) the authors did not confirm that the lowest band actually comes from BART, and 3) the authors cannot assess some important effector proteins, such as PDEδ, because 293 cells might not express them. All of these problems can be solved by the approach taken in Figure 3G.

      If the authors chose the former approach because of some specific reason, I would appreciate it if the authors could explain that in the main text of the paper.

      In vitro crosslinking experiments were performed to test whether overexpression of Arl3 mutants resulted in an active cellular Arl3 without artificially changing any components of the GTPase cycle. We feel these experiments are highly elegant as they allow us to take a snapshot of native Arl3 activities without compromising the analysis by artificially altering GAP/GEF/effector interactions through overexpression or during lysis (as we show that the concentration of GTP/Mg could alter interactions in Fig 2). While AD293T cells are not rod photoreceptors, we are able to use this system to better understand how the Arl3 mutants alter the level of activity within the cell. Yes, this experimental assay is novel, but we confirmed the identity of the effectors by Western and mass spec, used positive and negative controls in each experiment, and show that the method is highly reproducible. We agree with Reviewers 2 and 3 that using this method to study the cellular activity of fast cycling Arl3 mutants is a strength of our paper.

      (5) ARL3 Y90C causes a nuclear migration defect. Given that Y90C is a fast-cycling (hyperactive) mutant and has a high affinity for ARL13B, the nuclear migration defect might come from either the increased activity of ARL3 or the sequestration of ARL13B, which can act as a GEF for ARL3 but potentially has other functions. If my understanding is correct, the authors concluded that the defect caused by ARL3-Y90C is likely due to hyper-activation of the protein, as the Y90C/T31N mutant, which cannot bind to effectors but still retains the ability to capture ARL13B, did not cause a migration defect. But I am a little confused by the fact that Y90C/R149H, which is unable to bind to ARL13B (Fig. 2C) but still retains the ability to interact with the effectors (Fig. 3F), did not have a migration defect (Fig. 7B). Wouldn't this mean that the sequestration of ARL13B could contribute to the phenotype?

      If my understanding is correct, the authors are trying to say that both hyper-activation of cytosolic ARL3 and the defect in endogenous ARL3 activation in cilium is necessary to cause migration defect. I am not very convinced by this hypothesis, and still think that the defect could be caused by sequestration of ARL13B to the cytoplasm.

      Then why did Y90C/T31N not cause the defect, even though it can sequester ARL13B? This might be explained by the localization of the ARL3 mutants. If Y90C can localize to the cilium while the double mutant, Y90C/T31N, does not, then only Y90C might be able to inhibit ARL13B function in the cilium. This could explain the lack of the defect in the cells expressing Y90C/T31N.

      It would help to understand how exactly the fast-cycling mutant causes the defect if the authors could provide more information, including the localization of ARL3 (wild-type and mutants) as well as key proteins, such as ARL13B and the effector proteins. Assessing ARL13B defects seems particularly important to me because ARL13B deficiency has been connected to neuronal migration defects (Higginbotham et al., 2012).

      What I am trying to say here is that how the defect is caused is likely very complex. So, providing more information without sticking to one specific hypothesis might be important for readers/authors to accurately interpret the data.

      Our data show that, for the fast-cycling Arl3-Y90C mutation, both features are required to produce a nuclear migration defect: blocking endogenous Arl3 activation in the cilium (through Arl13B binding) and increasing the activity of Arl3-Y90C in the cell body. We find that we can rescue migration defects either by restoring activation in the cilium or by reducing Arl3-GTP activity outside the cilium. As long as Arl3-GTP activity remains higher in the cilium, the rod can tolerate aberrant Arl3-GTP activity in the cell body. The Y90C/R149H result was critical in leading to our hypothesis that a gradient between the two compartments is used for proper migration. Interestingly, the absence of any activity does not produce the migration phenotype, further suggesting that an imbalance in the gradient is what matters.

      We performed new experiments to investigate whether Arl3-Y90C is sequestering Arl13B away from the cilium but found that localization of Arl13B (both endogenous and overexpressed) is not altered by expression of Arl3-Y90C – see Fig 3-SupFig 1-2.

      How different Arl3-FLAG constructs are localized within the photoreceptor is an interesting question. Unfortunately, we did not analyze the data in a way that would allow us to draw any conclusion about the localization of the different Arl3-FLAG constructs. In general, we observed FLAG localization throughout the photoreceptor cell and focused our imaging on FLAG staining around the nucleus so that we could analyze ONL position. Looking back through our images, some of the mutants might have a more prominent localization within a specific subcellular compartment (e.g., Arl3-D67V is more prominent in the inner segment than in the outer segment, and Arl3-Y90C appears to localize predominantly to the outer segment). These differences likely reflect each mutant binding a particular partner: D67V to RP2 and Y90C to Arl13B, which we show biochemically. Ideally, Arl3 mutant localization would be analyzed during development to provide a more direct link to the nuclear migration defect, which is a future direction for our lab. We have updated our manuscript to be more transparent about potential differences in the rod localization of Arl3 mutants.

      (6) The rescue experiments that the authors presented in Figs. 5-6 are striking and could form a basis for future therapy of diseases caused by ARL3 defects. However, I believe more examination is needed to interpret the data accurately. If I understand the methods section correctly, the authors performed this rescue experiment by co-injecting ARL3-FLAG and chaperones/cargos. But the data can be interpreted correctly only when ARL3-FLAG and the chaperones/cargos are co-expressed in the same cells. A better way to analyze the data might be to compare the nuclear migration phenotype between cells positive for ARL3-FLAG only and cells double-positive for ARL3-FLAG and the chaperone/cargo.

      Our lab has confirmed the Cepko Lab's estimate that co-injection of two plasmids results in more than 90% of rods expressing both proteins (Matsuda and Cepko, PNAS 2004). Because we assess the nuclear position of FLAG-labeled rods only, a small percentage of the analyzed cells may indeed express the Arl3-FLAG mutant but not the chaperone/cargo. However, including these cells can only cause us to underestimate the rescue, so the true rescue is likely even more robust than measured.

      Reviewer #2 (Public Review):

      The small GTPase Arl3 (Arf-like 3) is a well-characterized component of primary cilia, including the outer segment of photoreceptors, which contain specialized cilia. Arl3 is critical for the import of multiple lipid-modified proteins into cilia that are vital to ciliary function. Human mutations in Arl3 are reported to cause both autosomal recessive and dominant inherited retinal dystrophies, but the mechanisms through which these mutations disrupt photoreceptor development are not known. Here the authors show that two dominant Arl3 mutants, Arl3-D67V and Arl3-Y90C exhibit increased activity, but for different reasons. Arl3-D67V is constitutively active (unable to hydrolyze GTP), whereas Arl3-Y90C is a classic rapid-cycling mutant, able to bind GTP spontaneously (independent of its guanine nucleotide exchange factor Arl13) but still able to complete the GTPase cycle by hydrolyzing GTP. Expression of either mutant in developing murine retinas results in a nuclear migration defect, specifically aberrant localization of rod nuclei to the inner rather than outer nuclear layer. In this sense, they phenocopy another well-characterized constitutively active mutant, Arl3-Q71L. Normal nuclear distribution could be restored by overexpression of Arl3 effectors, suggesting that the active mutants disrupt nuclear migration, at least in part, by sequestering Arl3 effectors.

      While the data are reasonably clear and convincing, there are several instances where the conclusions drawn are either confusing or problematic. Specifically:

      1) Although retinal rod cells are ciliated in their outer segment, the authors never actually examine ciliation here. Their only morphological readout is nuclear migration. How does nuclear migration failure impact ciliogenesis in the outer segment?

      Imaging was performed in mature retinas at P21, after outer segment formation is complete. Electroporation targets only a small population of cells, and in these cells we observed normal outer segment structure in all conditions tested; we therefore conclude that ciliogenesis is unaffected. Previous literature has also shown that defects in rod nuclear migration do not affect ciliation of the outer segment.

      2) The Arl3-Y90C mutant seems to act physiologically more like a dominant-negative than an activated mutant. A second mutation in Y90C (R149H) that blocks binding to the GEF Arl13 abrogates the nuclear migration defect, suggesting that Y90C is preventing activation of endogenous Arl3 by sequestering the GEF. Yet overexpression of effectors or cargos still rescues nuclear migration in the presence of Y90C, suggesting that it also sequesters effectors. How do the authors explain this?

      We agree with this interpretation. We have now included language about Arl3-Y90C’s role as a dominant negative, in that it blocks Arl13B activity. An interesting caveat to this black-and-white interpretation is that blocking Arl13B would predict a reduction in endogenous Arl3 activity in rods (which we find to be true, see Fig 5A). However, the migration defect mimics overly active Arl3 (Arl3-Q71L) and not a loss of Arl3 function (Arl3-T31N). Using in vivo crosslinking experiments, we show that the fast-cycling nature of Arl3-Y90C also causes GEF-independent activation of Arl3 (Fig 4D-E) that leads to the migration defect. Our rescue data show that only the combination of both effects (reduced Arl3 activity in the cilium and GEF-independent Arl3 activation outside the cilium) is sufficient to disrupt the ciliary gradient and produce the migration defect.

      3) Fig. 1 suggests that an Arl3-T31N mutant has no phenotype. This is a canonical mutation in small GTPases that typically renders them dominant negative. The lack of phenotype is surprising since most dominant-negative mutants act by sequestering their GEFs, thereby preventing activation of the endogenous GTPase. Fig. 2C suggests that this may not be the case for Arl3-T31N, which binds Arl13 only weakly. Some of this confusion may arise from the fact that Arl13 is not a typical GEF. It is very unusual for one GTPase to directly promote nucleotide exchange on another. Does Arl3-T31N affect ciliation in the rod outer segment, or in other ciliated cells? Some discussion of this point is warranted here.

      Our paper finds that Arl3 mutants must produce aberrant activity outside the cilium, whether through constitutive activity (as for D67V and Q71L) or fast cycling (as for Y90C and D129N), to cause the migration defect. Because T31N does not cause excess Arl3 activity in cells (see Fig 4), any dominant-negative activity it may have toward Arl13B is not sufficient to cause the migration phenotype. We tested this directly in Fig 5, where we increased T31N binding to Arl13B by introducing Y90C/T31N and still did not see a migration defect. Our results are also in line with a previous study showing that, despite rapid photoreceptor degeneration in a retina-specific conditional Arl3 knockout mouse, outer segments were initially formed; in contrast, the retina-specific conditional Arl13B knockout disrupted photoreceptor ciliogenesis and led to more rapid degeneration (Hanke-Gogokhia, JBC 2017). Since complete loss of Arl3 activity did not disrupt ciliogenesis, it is unlikely that expression of Arl3-T31N in wild-type retinas would alter outer segment formation, and indeed we observed that outer segments formed with all Arl3 mutants.

      4) Oddly, Arl3-Y90C does robustly bind Arl13 (Fig. 2C), while at the same time binding to effectors (Fig. 3D/E), although less strongly than the canonical Q71L constitutively active mutant (Fig. 2A). As noted in point #2, the Y90C/R149H double mutant, which fails to bind Arl13, abrogates the nuclear migration defect observed with Y90C alone. Although the authors refer to Y90C as "rapid cycling" its phenotype is more similar to a dominant-negative than an activated mutant.

      We agree with this interpretation. We have now included language about Arl3-Y90C’s role as a dominant negative, in that it blocks Arl13B activity. However, its rapid-cycling behavior is also required to cause the phenotype.

      5) The authors also mention that loss of Arl3 has no phenotype in their assay, however, Arl3 knockout mice exhibit severe retinal degeneration. How do they explain this?

      Our study finds that not all human Arl3 mutations target the same cellular process, even though they all result in degeneration. Arl3 knockout mice show drastic alterations in lipidated protein trafficking to the rod outer segment in mature retinas, a phenotype that we did not observe when expressing the dominant Arl3 mutants in wild-type rods. Since our tools are not designed to study rod degeneration, the precise mechanisms of degeneration caused by loss-of-function or dominant mutations remain to be determined. We outline some ideas in the discussion, but more work is needed before drawing firm conclusions. We hope that our manuscript will inspire clinicians to take a closer look at human patients to determine whether there are subtle differences in disease presentation between dominant and recessive forms of inherited Arl3 mutations. This is beyond the scope of our expertise.

      Reviewer #3 (Public Review):

      This work provides mechanistic insights into two recently described dominant variants of Arl3, a small GTPase, namely the mutations D67V and Y90C. The authors identified a phenotype of these dominant variants during the development of rod photoreceptors through in vivo experiments in mice. Specifically, they observed a defect in the migration of rod nuclei to their final position in the outer nuclear layer. This phenotype has been previously observed with another constitutively active variant of Arl3, Q71L. The authors performed a series of extensive and thorough biochemical assays to clarify the mode of action of these variants, mostly the Y90C variant, comparing their behavior to that of previously described mutants and combining multiple variants by mutagenesis. They also developed a new in vivo crosslinking strategy to identify transient protein-protein interactions. They then performed phenotypic rescue experiments by co-expressing various proteins that interact with or are involved with Arl3. Lastly, they propose a model based on differential subcellular compartmentalization of Arl3 activation which, when disrupted, leads to rod nuclei misplacement. These data add to the current understanding of the contribution of different Arl3 variants to human retinal degeneration, which has strong potential translational implications.

      Strengths:

      Relevance of Arl3 dominant variants to human retinal degeneration. Identification of Y90C variant as a "fast cycling" GTPase, and not as a predicted destabilizer of the protein structure.

      New method of crosslinking to enable snapshots of endogenous protein-protein interactions.

      Weaknesses:

      • The relevance of this study is justified by the fact that newly identified dominant variants of Arl3 have been associated with retinal degeneration. However, the authors never assess a degeneration phenotype.

      The electroporation technique allows for rapid expression of constructs, but the sparse expression makes it a poor means of studying retinal degeneration. This will be important to examine in the future using robust genetic mouse models.

      • The authors show that new dominant variants of Arl3, namely Y90C and D67V, cause rod nuclear mislocalization. This phenotype is interesting, but it was previously observed with another constitutively active Arl3 mutation, Q71L, and is therefore not novel.

      Yes, the Q71L paper is well cited in our manuscript and set the basis for many of our experiments.

      • The main claim of this paper is that subcellular compartmentalization of Arl3 activation to the cilium (what the authors call a gradient) is required for proper migration of rod nuclei to their final destination in the outer nuclear layer. The authors provide multiple experiments that support this model, but it is never directly demonstrated.

      We are not aware of any method that could directly show the subcellular localization of active Arl3-GTP within rod photoreceptors. We agree that we have provided many experiments supporting our hypothesis that altering the Arl3-GTP gradient between the cilium and the cell body produces a nuclear migration defect. Among the strongest is Fig 6, where we find that the migration phenotype is rescued only by expression of ciliary cargos and not by non-ciliary cargos. In addition, the new data requested by the reviewers, showing that Arl13B expression in the cilium can restore the Y90C defect, further support the idea that the Arl3 ciliary gradient is necessary for proper nuclear migration.

    1. Author Response

      Reviewer #1 (Public Review):

      Pan et al. examined the role of oligodendroglial exocytosis, and specifically the role of L-type prostaglandin D synthase (LPGDS), in modulating oligodendrocyte differentiation and myelination. The topic of autocrine and paracrine signaling within the oligodendrocyte lineage is under-studied, and the authors use a novel approach for oligodendrocyte precursor-specific inhibition of VAMP-mediated exocytosis using inducible expression of botulinum toxin with the PDGFRa-CreER transgenic mouse line (PD:ibot). Using a combination of in vitro culture systems and immunohistological analysis in vivo, the authors find that ibot expression in OPCs leads to reduced oligodendrogenesis and myelination, leading to a behavioral deficit in rotarod performance. Additional transcriptomic analysis in PD:ibot mice revealed that Ptgds, the gene encoding LPGDS, was significantly overexpressed in both mature oligodendrocytes and OPCs. Further pharmacological experiments with cultured OPCs showed that direct LPGDS inhibition led to an inhibition of oligodendrogenesis similar to that seen in PD:ibot mice. Together, this study reveals that VAMP-mediated exocytosis in OPCs is required for normal oligodendrogenesis and identifies LPGDS as a new chemical regulator of oligodendrocyte myelination. These findings are strengthened by careful characterization of the PD:ibot mouse line and effective use of culture systems and pharmacology to uncover a cellular mechanism. Quantification is performed at several levels of resolution using immunohistochemistry, electron microscopy, and protein/transcriptomic analyses, and control experiments were, for the most part, carefully considered.

      We thank the reviewer for recognizing the strength of our study.

      Despite these strengths, there are some points that need to be further addressed. The interpretation of autocrine/paracrine signaling relies on a critical culture experiment in which PD:ibot OPCs were cultured in the presence of PD:ibot or control OPC well inserts. However, these results had a marginal effect size, raising questions as to the extent to which VAMP inhibition specifically had effects through the blockade of exocytosis (resulting in an autocrine/paracrine signaling deficit) or inhibited oligodendrogenesis in a cell-intrinsic mechanism (e.g. VAMP-dependent trafficking of critical myelination components, such as PLP (Feldmann et al., 2011)).

      We agree with the reviewer that both cell-autonomous and cell non-autonomous effects may contribute to the defect associated with VAMP inhibition. We performed additional experiments to investigate the contribution of cell non-autonomous mechanisms. We took advantage of the fact that not all OPCs purified from PD:ibot mice expressed botulinum-GFP (efficiency ~65%, Figure 6B, page 24). The GFP- cells in PD:ibot OPC cultures did not express botulinum toxin and were competent in exocytosis. We compared the development of GFP- control cells in cultures generated from PD:ibot mice with that of control cells in cultures generated from control mice. Interestingly, we found that the percentage and size of lamellar cells among control cells in PD:ibot cultures were smaller than among control cells in control cultures (Figure 6C, D, text page 25). Although both groups of cells were competent in exocytosis, they were surrounded by exocytosis-deficient vs. exocytosis-competent neighboring cells, respectively. The differences in the growth capacity of control cells in the presence of different neighbors reveal cell non-autonomous contributions of botulinum-expressing cells to oligodendrocyte development.

      As described above under Essential Revisions 4), we performed additional experiments on the role of the secreted protein L-PGDS in oligodendrocyte development. We found that extracellularly adding HPGD, a protein that inactivates PGD2, to oligodendrocyte cultures inhibited their development (Figure 7F, G, page 33). Extracellularly adding L-PGDS protein to PD:ibot oligodendrocyte cultures rescued their developmental defect (Figure 9A, B, page 33). Moreover, overexpressing Ptgds in PD:ibot mice partially rescued the myelination defect (Figure 9E-H, page 36). These observations further strengthen our conclusion that cell non-autonomous mechanisms contribute to the effect of botulinum toxin on oligodendrocyte and myelin development.

      Nevertheless, these results do not rule out a cell-autonomous effect of botulinum toxin on oligodendrocyte development; we have therefore discussed the potential contributions of both cell-autonomous and cell non-autonomous mechanisms in the text.

      Additionally, the authors claim the reduced number of oligodendrocytes in PD:ibot mice in vivo is not due to oligodendrocyte apoptosis and provide evidence by cleaved caspase-3 immunostaining of the cerebral cortex. While statistically not significant, the data is highly variable.

      We thank the reviewer for pointing out the variability of the caspase-3 results. We performed a more thorough analysis of activated caspase-3 at multiple developmental stages. Again, we did not find any statistically significant difference in apoptosis between PD:ibot and control oligodendrocytes, OPCs, or cells of other lineages (Figure 3-figure supplement 1, text page 13).

      If true, this would suggest oligodendrocyte differentiation is inhibited, which would coincide with a reduction of OPC proliferation. A complementary experiment comparing the rates of OPC proliferation between control and PD:ibot mice in vivo would provide further clarity on how oligodendrocyte density is being reduced.

      We analyzed OPC proliferation in vivo by staining and quantifying Ki67+PDGFRa+ cells. Intriguingly, we found a modest increase in OPC proliferation in PD:ibot mice (Figure 3-figure supplement 3, text page 14).

      The relevance of these myelination deficits is assessed with a rotarod assay, however, the mice used for these experiments are several times older (2-5 months) than those used for all other histological quantification (P8-P30). The large variance in results could be due to age-related differences in myelination, and it is unclear whether the deficits at early timepoints show a linear progression with age.

      We thank the reviewer for the insightful comment. We have separately labeled data points from 2 months old and 5 months old mice (Figure 3Q-S, text page 17). With the data we have so far (n=20-27 per genotype), there isn’t a striking progression of phenotype with age. Future analysis at multiple time points may resolve any age-dependent changes in the phenotype.

      Reviewer #3 (Public Review):

      The authors pose an important question of whether oligodendrocyte lineage cells have an autocrine/paracrine signaling loop that contributes to their differentiation and myelination. While prior studies have demonstrated oligodendrocyte lineage cells have cell-intrinsic pathways that impact differentiation and myelination, there isn't a strong precedent for oligodendrocytes to promote their own differentiation via autocrine/paracrine mechanisms. The notion that oligodendrocyte lineage cells promote their own differentiation in an autocrine/paracrine manner is an intriguing one that adds a new layer to our understanding of how oligodendrocyte maturation is controlled. I anticipate this paper will prompt a new direction of future investigations to uncover the extent of oligodendrocyte autocrine/paracrine signaling.

      To test the possible role of oligodendrocyte-secreted molecules on oligodendrocyte development, Pan et al. utilized a mouse model where the release of a subset of secretory vesicles (specifically VAMP1/2/3-dependent vesicles) is blocked. Blocking this vesicular release prevented or delayed the differentiation of oligodendrocytes in vivo and in vitro. Further, the authors identified changes to the mRNA and secreted protein levels of prostaglandin D2 synthase (L-PGDS). Prior RNA sequencing and snRNA sequencing datasets of the oligodendrocyte lineage have identified Ptgds as a highly abundant mRNA transcript in oligodendrocyte lineage cells, particularly mature oligodendrocytes. Ptgds encodes L-PGDS, which has an unknown role in oligodendrocyte function. L-PGDS has been shown to regulate Schwann cell myelin formation in the peripheral nervous system, prompting the question of whether this protein acts similarly in the central nervous system. The paper has a clear set of well-rounded experiments, with a few remaining points that would strengthen the conclusions:

      We thank the reviewer for the positive comments on our study.

      One of the foundational conclusions of the study is that VAMP1/2/3-dependent exocytosis is critical to oligodendrocyte maturation, by using a PDGFRa-CreER mouse line combined with iBot mice that express botulinum toxin in Cre-expressing cells (abbreviated as PD:iBot). Prior work has demonstrated in vitro that oligodendrocyte morphological maturation, myelin gene expression and myelin protein transport can all be impacted by the loss of VAMPs, including VAMP3. This paper establishes the importance of these SNARE proteins in the oligodendrocyte lineage in vivo: the number of mature (CC1+) oligodendrocytes and myelin basic protein staining is substantially reduced in PD:iBot mice.

      1) The data in Figure 3M suggest that PD:iBot oligodendrocytes (GFP+) lack MBP+ sheaths and that any myelin formed comes from the smaller percentage of oligodendrocytes that do not express botulinum toxin (GFP- cells). Furthermore, the efficiency of iBot expression (as evaluated by GFP+ cells) shows that 80% of OPCs and just 60% of oligodendrocyte lineage cells express GFP at P8, and supplementary data show that just 30% of oligodendrocyte lineage cells express GFP at P30. This raises the question of whether PD:iBot cells are unable to differentiate and die. While the authors show no change in caspase-dependent apoptosis in PD:iBot cells in vivo and in vitro, the data still suggest that blocking VAMP-dependent exocytosis itself slows or prevents progression to a fully myelinating oligodendrocyte in vivo, rather than that putative autocrine/paracrine signals are required for OPC differentiation. Determining whether botulinum-expressing cells also contribute to the population of surviving, differentiated oligodendrocytes in vivo would strengthen the conclusion that autocrine/paracrine secreted molecules contribute to oligodendrocyte maturation in vivo.

      We thank the reviewers for raising a key point in characterizing the consequence of botulinum toxin expression in oligodendrocyte-lineage cells. We analyzed the overlap between GFP+ botulinum-expressing cells and the population of differentiated oligodendrocytes (Olig2+PDGFRa-CC1+ cells) and found that botulinum-expressing cells can survive and become differentiated oligodendrocytes (Figure 3-figure supplement 2, text page 14). Additionally, we performed a more thorough analysis of activated caspase-3+ apoptotic cells than was included in first submission and did not detect statistically significant differences between PD:ibot and control mice (Figure 3-figure supplement 1, text page 13).

      2) The paper has complementary in vitro data that pinpoint a mechanism resulting in hindered oligodendrocyte maturation. The authors conduct a well-designed set of in vitro co-culture experiments in Fig. 4K-M that led them to conclude that oligodendrocyte morphology is affected by molecules secreted from other oligodendrocytes.

      2a) The key experiment is the transwell co-culture experiment with control and iBot cells, which suggests that blocking secretion itself has the predominant impact on cell morphology: by eye, groups 3 and 4 both show the largest reduction in lamellar area, and the difference between groups 3 and 4 is slight. At day 3 of culture (Fig. 4E), the authors show the clearest effect as a reduction in cells with lamellar morphology. The quantification of lamellar cell area is less obvious than the percentage of cells with arborized vs. lamellar shape, as seen in Figures E & F. I would recommend that the authors show representative images of these observations and quantify morphologies for the transwell experiments. The impact of secreted factors may be clearer with this measure.

      We added representative images (Figure 6G). We quantified both the percentage and the size of lamellar cells. The size of lamellar cells is significantly larger in group 4 than in group 3. Although the percentage of lamellar cells is numerically higher in group 4 than in group 3, the difference is not statistically significant. To further assess whether cell non-autonomous mechanisms contribute to the oligodendrocyte development defect in PD:ibot mice, we performed additional analyses in culture. We took advantage of the fact that not all OPCs purified from PD:ibot mice expressed botulinum-GFP (efficiency ~65%, Figure 6B). The GFP- cells in PD:ibot OPC cultures did not express botulinum toxin and were competent in exocytosis. We compared the development of GFP- control cells in cultures generated from PD:ibot mice with that of control cells in cultures generated from control mice. Interestingly, we found that the percentage and size of lamellar cells among control cells in PD:ibot cultures were smaller than among control cells in control cultures (Figure 6C, D, text page 25). Although both groups of cells were competent in exocytosis, they were surrounded by exocytosis-deficient vs. exocytosis-competent neighboring cells, respectively. The differences in the growth capacity of control cells in the presence of different neighbors reveal cell non-autonomous contributions of botulinum-expressing cells to oligodendrocyte development.

      2b) On a related note, the cell morphology data is dependent on MBP staining. The authors show that MBP protein is reduced in cells from iBot mice. Since MBP+ cell area/arborized or lamellar structure is being quantified, there remains the possibility that the cells could display a more complex morphology (lamellar) that may be missed by only staining for MBP. The authors use a CellMask dye to show cellular morphology, which is a great idea. The authors state that it labels the plasma membrane; however, the methods (and images) indicate that a cytoplasmic CellMask was used (cat.no. H32720 labels nuclei and cytoplasm, not membranes). These conclusions about cell morphology vs simply MBP expression would be strengthened by an alternative membrane label (e.g., a CellMask plasma membrane dye).

      We thank the reviewers for the insightful suggestion. We used the membrane version of CellMask and repeated the transwell co-culture experiment. The new results are consistent with the results based on MBP (Figure 6-figure supplement 1, text page 23). In addition, we used the membrane version of CellMask for all the new cell culture experiments (L-PGDS rescue, HPGD, etc.).

      3) The authors sought to identify which secreted factors may be affected by blocking VAMP1/2/3-dependent exocytosis. Pan et al. opted for a strategy of examining transcriptional changes, reasoning that important genes may be upregulated to compensate for blocked secretion. While this is an indirect way to identify secreted candidates, the authors obtained the fortuitous result that Ptgds was substantially increased in PD:iBot oligodendrocyte lineage cells. To confirm that L-PGDS secretion is reduced from iBot cells, the authors show Western blots. By eye, the change in L-PGDS is variable; however, the authors conduct several experiments with an inhibitor and a product of L-PGDS that nonetheless indicate that L-PGDS activity can contribute to the morphological maturation of oligodendrocytes. A caveat is that the AT-56 inhibitor reduces MBP+ cells, and the quantification of morphology depends on MBP staining (again, see my note in 2b about the CellMask dye). A report on differentiation (% MBP+ cells) may more accurately reflect the result.

      We repeated the AT-56 experiment using the membrane version of CellMask and again found that AT-56 inhibits oligodendrocyte maturation (Figure 7-figure supplement 2, text page 33).

      The key, compelling experiment demonstrating the role of prostaglandin D2 is the authors' rescue experiment in Fig 4G.

      As described above under Essential Revisions 4), we performed additional rescue experiments on the role of L-PGDS in oligodendrocyte development. We found that adding L-PGDS protein extracellularly to PD:ibot oligodendrocyte cultures rescued their development defect (Figure 9A, B, page 34). Moreover, overexpressing Ptgds in PD:ibot mice partially rescued the myelination defect (Figure 9E-H, page 36).

      4) Although it is not a direct demonstration that L-PGDS secretion from oligodendrocytes is the key factor, global L-PGDS knockout mice phenocopy many of the observations in PD:iBot mice. This is a nice set of observations consistent with the authors' hypothesis that L-PGDS affects oligodendrocyte maturation. Future work should determine whether oligodendrocyte-derived L-PGDS is critical.

      We agree with the reviewer that pinpointing whether oligodendrocyte-derived L-PGDS promotes oligodendrocyte development and myelination is an interesting direction to pursue in future work. We are breeding L-PGDS conditional knockout mice to address this question and may report the results in a separate paper in the future.

      Minor points:

      1) The authors demonstrate that PD:iBot mice express botulinum toxin and lose VAMP2 protein in oligodendrocyte lineage cells, but there is no demonstration of whether VAMP3 is expressed or similarly affected. Prior work has demonstrated in vitro that oligodendrocytes express both VAMP2 and VAMP3 (VAMP1 not detected). Examining VAMP3 would show more clearly which VAMP-mediated vesicular transport is blocked for the effects observed.

      We agree with the reviewer and examined VAMP3 levels by Western blot. We found diminished levels of VAMP3 in oligodendrocyte-lineage cells from PD:ibot mice (Figure 1 J, M, text page 10).

      2) It is satisfying to observe a behavioral effect in the PD:iBot mice. I would advise caution in interpreting any direct link between oligodendrocyte maturation and the rotarod behavioral difference at this time. Blocking secretion from PDGFRa-Cre expressing cells may have many indirect effects (beyond myelination) in both the CNS and other cell types that can express PDGFRa and VAMPs1/2/3. I was pleased that the authors did not conclude any direct links at this time.

      We agree with the reviewer.

      Overall, the authors have a well-rounded manuscript with clearly described and thoughtful experiments. The data support the conclusion that VAMP-mediated exocytosis is critical for oligodendrocyte maturation. The evidence that reduced L-PGDS secretion from oligodendrocytes can explain the effects in the iBot mice is not as clear-cut, but their data do demonstrate that L-PGDS is an important molecule for the differentiation of oligodendrocytes. This work will open a new direction for future studies investigating autocrine/paracrine signaling in oligodendrocyte maturation.

      We thank the reviewer for the positive comments on our manuscript. As detailed in Essential Revisions 4), we now provide additional evidence on the potential contribution of L-PGDS in the oligodendrocyte development defect in PD:ibot mice.

    1. Author Response

      Reviewer #3 (Public Review):

      Garratt et al. investigated whether transient exposure of young mice during their first two months of life to olfactory cues from conspecific adults would have long-lasting effects on their late-life health and lifespan. They find that the olfactory cues have sex-specific effects on lifespan: only the lifespan of young females is extended, and only by odours from adult females; no other combination (young females with adult males, or young males with adults of either sex) has an effect. Interestingly, their data also suggest that depletion of the G protein Gαo in the olfactory system plays no role in the lifespan extension, indicating that other, unknown factor(s) might mediate this sex-specific effect on longevity in mice. While the conclusions of this study are well supported by the data, there are some issues with parts of the data analysis and presentation that would need to be clarified and extended.

      1) The authors suggested that the G protein Gαo played no role in the lifespan extension caused by transient exposure of young females to olfactory cues from adult females, as shown in Figure 1. However, it is not clear whether the depletion of Gαo (Gαo mutant) itself affects lifespan compared with the wild type. It would be important to show the lifespan curves for the wild type and the Gαo mutant individually alongside the pooled lifespan curves, as well as the corresponding data in a table, followed by a proper discussion.

      Data for each genotype are now shown individually.

      2) Regarding the functional tests, the authors showed that only a small fraction of experiments revealed differences between treatments, all of which were in Figure 2. However, it is necessary to also show the data with no differences, particularly since the conclusion of the study suggests that the underlying mechanisms are not yet clear. In my opinion, body weight, plasma glucose, and body temperature each deserve their own figure showing all data points.

      These data are now shown.

      3) As the authors mentioned in the Introduction, the age at sexual maturity correlates positively with median lifespan across mouse strains (Yuan et al. 2012, Wang et al. 2018). Also, young female mice exposed to male odours during development show accelerated sexual maturity (Drickamer 1983), and the same happens to young males exposed to odours from the opposite sex (Vandenbergh 1971). It is, therefore, surprising to see in this study that exposing young females or young males to olfactory information from the opposite sex had no effect on lifespan. One way to resolve this disparity is to measure the sexual maturity of the mice in this study. The authors could, for instance, check the records of when the first litter of pups was born in each treatment (Shindyapina et al. 2022) or examine preputial separation and vaginal opening (Hoffmann 2018).

      The animals used in the lifespan experiment were not allowed to breed so as not to interfere with the lifespan assessment. Similarly, we did not check animals within the lifespan experiment for sexual maturity as we wanted to minimize the handling of animals after weaning, and this requires daily handling and/or vaginal swabbing.

      We conducted a preliminary experiment prior to the main lifespan experiment (in UM-Het3 mice) to test whether sexual maturity was modulated in the expected directions with the odour exposure protocol we planned to impose. This experiment showed that the odor manipulation we applied has the expected effects on sexual maturity. We have now outlined this experiment and its results in the methods section of the paper to justify the odor treatment protocol.

  2. Nov 2022
    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors characterize the lattice of kinesin-decorated microtubules assembled in vitro from porcine tubulin or in Xenopus egg extract, using cryo-electron tomography and subtomogram averaging. Using segmented subtomogram averaging (SSTA), they examined transitions in the lattice of individual microtubules. The authors found that the lattice is not always uniform but contains transitions between different lattice types. The finding is quite interesting and will probably prompt further investigation of such lattice transitions in microtubules inside cells.

      The manuscript is easy to read and well-organized. The supporting data is very well prepared.

      Overall, the authors' conclusions seem justified. However, the manuscript appears to lack data. Only 4 tomograms were acquired for the porcine microtubules. Increasing the amount of data would make the manuscript statistically more convincing.

      One tomogram can contain from one to several tens of microtubules. For example, 64 microtubules were analyzed in the Xenopus-DMSO dataset obtained from 5 tomograms, versus 24 microtubules for the GTP dataset obtained from 4 tomograms (see Table 1). Hence, the number of tomograms is not a valid criterion for assessing the statistical relevance of our work. Tomograms are acquired at random positions on the EM grid, selected solely on the basis of ice quality and microtubule coverage in the holes as determined at low magnification before tomographic acquisition. No prior knowledge of the structure or lattice-type organization of the microtubules can be obtained before acquisition. In our view, a more pertinent criterion is the number of events that we characterized, specifically lattice-type transitions along individual microtubules. In the dataset mentioned by the referee (see Figure 2-figure supplement 3-4 and Table 1), 24 microtubules were analyzed and further divided into 195 segments, providing an equivalent number of individual 3D reconstructions. For each 3D reconstruction, almost all lateral interactions could be characterized in terms of lattice type, i.e., 2091 of the B-type, 460 of the A-type, and 112 not determined (essentially at transition regions). Most importantly, we document 119 lattice-type transitions in this specific dataset, which we think is sufficient to characterize such molecular events and provide solid statistics for this dataset. Adding the GMPCPP and Xenopus data, we end up with 938 individual 3D reconstructions (not including the full-length microtubule volumes), 12 463 lateral interactions analyzed (A-, B-, or ND-type), and 172 observed lattice-type transitions. Therefore, we respectfully disagree with the referee's statement that our work lacks data.

      To highlight the quantity of data used in our work, we have modified the following sentences: L124-131: 'Analysis of 24 microtubules taken on 4 tomograms, representing 195 segments of ~160 nm length (i.e., 2664 lateral interactions), allowed us to characterize 119 lattice type transitions with an average frequency of 3.69 µm⁻¹ (Table 1), but with a high heterogeneity.' L160-164: 'Analysis of 31 GMPCPP-microtubules taken on 6 tomograms, representing 338 segments of ~150 nm in length (i.e., 3236 lateral interactions), and using the same strategy as in the presence of GTP (Figure 5—figure supplement 1-2) revealed a transition frequency of 1.25 µm⁻¹ (Table 1), i.e., ~3-fold lower than microtubules assembled in the presence of GTP.' L200-203: 'A total of 64 microtubules taken on 5 tomograms were analyzed in the Xenopus-DMSO dataset (i.e., 419 segments from which we characterized 5446 lateral interactions), and 15 microtubules taken on one tomogram for the Xenopus Ran-dataset (i.e., 86 segments from which we characterized 1118 lateral interactions) (Table 1).'

      In addition, having the same transition in different subtomograms, randomly oriented relative to the missing wedge, would allow a better average of the transition without missing-wedge artifacts.

      In this work, we did not aim at averaging transitions. Transitions in lattice types are highly heterogeneous in nature, and we wonder what additional information an averaging strategy would have provided. Instead, each transition is a unique event that we characterized to obtain useful statistics, and the missing data at high tilt angles inherent to electron tomography were not an obstacle to this task.

      Another thing that I found lacking is the mapping of the transition region/alignment in the raw data.

      In Figure 4, we clearly show the correspondence between the segmented sub-tomogram averages (SSTA) and the raw filtered images at the transition region. This is also the case in Figure 5 where the SSTA (Figure 5A) are compared with the raw tomogram (Figure 5B), and where we clearly visualize the holes that result from the transitions in lattice types.

      However, it is not easy for me or the reader to understand how the segments are oriented relative to one another, apart from the simplified seam diagrams in the figures, or how the seam is oriented relative to the missing wedge in the average. With these improvements, I think the conclusions of the manuscript would be better justified.

      The segmentation process is explained in Figure 2-figure supplement 2 and in the Materials and Methods section, which shows that each segment is linearly related to the next. Small rotations can happen between individual segments, and it is important to check that the same protofilaments are followed during the initial modeling (see the online tutorial referenced in the manuscript for full-length microtubules). The segment models are derived from that of the full-length microtubule, as explained in the Materials and Methods section, using a new routine (splitIntoNsegments) implemented into the PEET program. In addition, a detailed protocol describing our SSTA strategy will be submitted following publication of our manuscript.

      Reviewer #2 (Public Review):

      Differences in protofilament and subunit helical-start numbers for in vitro polymerized and cellular microtubules have previously been well characterized. In this work, Guyomar et al. analyze the fine organization of tubulin dimers within the microtubule lattice using cryo-electron tomography and subtomogram averaging. Microtubules were assembled in vitro or within Xenopus egg cytoplasmic extracts and plunge-frozen after addition of a kinesin motor domain to mark the position of tubulin dimers. By generating subtomogram averages of consecutive sections of each microtubule and manually annotating their lattice geometry, the authors quantified changes in lattice arrangement in individual microtubules. They found that in vitro polymerized microtubules often contained multiple seams and lattice-type changes. In contrast, microtubules polymerized in the cytoplasmic extract more frequently contained a single seam and fewer lattice-type transitions.

      Overall, their segmented subtomogram averaging approach is appropriately used to identify regions of lattice-type transition and quantify their abundance. This study provides new data on how often small holes in the lattice occur and suggests that regulators of microtubule growth in cells also control lateral tubulin interactions. However, not all of the claims are well supported by their data and the presentation of their main conclusions could be improved.

      1 - We have corrected imprecise claims and conclusions where necessary. In particular, we now discuss the Xenopus-DMSO and Xenopus-Ran egg extract samples separately and have modified our conclusions accordingly. We have also deposited in EMPIAR all tomograms and PEET models needed to reproduce the 938 segmented sub-tomogram averages analyzed in this study (see new Supplementary file 2).

      Reviewer #3 (Public Review):

      Protofilament number changes have been observed in microtubules assembled in vitro. This study by Guyomar and colleagues uses cryo-ET and subtomogram averaging to investigate the structural plasticity of microtubules assembled in vitro from purified porcine brain tubulin at high concentrations and in Xenopus egg extracts in which polymerization was initiated either by adding DMSO or by adding a constitutively active Ran. They show that the microtubule lattice is plastic, with frequent protofilament changes and multiple seams. A model is proposed for microtubule polymerization whereby these lattice discontinuities/defects are introduced by the addition of tubulin dimers through lateral contacts between alpha and beta tubulin, thus creating gaps in the lattice and shifting the seam. The study clearly and quantitatively shows the lattice changes under two separate microtubule assembly conditions. The frequency of defects they observe under their assembly conditions is much higher than what has been observed in vivo in intact cells. Their observations are clear and supported by the data, but it is not at all clear how generalizable they are, or whether the defect frequencies they see result from the assembly conditions, the dilutions used, and the presence of the kinesin with which the lattice is decorated. The study definitely has implications for mechanistic studies of microtubules in vitro and raises the question of how these defects vary between protocols from different labs and between different tubulin preparations.

      1 - High tubulin concentration: It has been documented by many laboratories since the discovery of tubulin and the characterization of its assembly properties that a sufficient concentration of free tubulin is necessary to self-assemble microtubules. This is called the critical concentration for self-assembly (the CC, i.e., the critical concentration to overcome the nucleation barrier), and has been reported to be in the range 14-25 µM in the presence of GTP depending on laboratories. For example, in the seminal work of Mitchison and Kirschner the CC was estimated at 14 µM (Fig. 5 of ref. (Mitchison & Kirschner, 1984b)) and self-assembly was induced at concentrations in the range 32-59 µM (Mitchison & Kirschner, 1984a). Our own estimate of the CC for porcine brain tubulin was 21 µM (Fig 2C of (Weis et al., 2010)), and we routinely use a tubulin concentration slightly above the CC when we aim at robust microtubule self-assembly. Hence, we argue that 40 µM, which is ~twice the CC, cannot be considered a "very high" tubulin concentration to induce microtubule self-assembly.

      2 - Protofilament number and lattice-type transitions in cells: While microtubules with protofilament numbers other than 13 have been observed in different cell types and species (reviewed in (Chaaban & Brouhard, 2017)), we are aware of only one recent study where changes in protofilament numbers along individual microtubules have been reported in cells (Foster et al., 2021), but with no statistics concerning their frequencies. Hence, we cannot compare the frequencies of protofilament number changes in Xenopus egg extracts with those that occur in intact cells. Concerning lattice-type transitions, we are not aware of any previous study that documented such features, whether in vitro or in cells.

      3 - Generalization of our results, source of tubulin and protocols: Multi-seams in microtubules assembled in vitro have been reported by several groups in the past (see our Introduction, L49-62), starting with Kikkawa et al. (1994), followed by the Milligan group (Dias & Milligan, 1999; Sosa et al., 1997) and, more recently, the Sindelar group (Debs et al., 2020). In Kikkawa et al. (1994), the authors purified tubulin from porcine brain by three cycles of assembly/disassembly followed by phosphocellulose chromatography. Assembly was carried out at 24 µM in the presence of Taxol. In Sosa and Milligan (1996-1997), the authors used a commercial source (Cytoskeleton) and assembled the microtubules at 30 µM in the presence of Taxol. In Debs et al. (2020), the authors used tubulin purified from porcine brain according to (Castoldi & Popov, 2003), as we did, to assemble GMPCPP microtubules, and bovine brain tubulin (Cytoskeleton) to assemble Taxol-stabilized microtubules. Notably, they used an initial tubulin concentration of 100 µM to initiate microtubule polymerization and then added Taxol to continue the reaction.

      We add to these previous studies by showing that the number of seams is not a fixed property of a given microtubule: both the number and the location of seams can vary within individual microtubules. The reason this was not observed before is that the analytical tools used in those previous studies were not suited to revealing this structural heterogeneity within individual microtubules. By contrast, the SSTA approach that we designed was developed specifically for this purpose. Even in the recent work by Debs et al. (2020), which provides the most comprehensive characterization of multi-seams in microtubules assembled in vitro and obtained a seam distribution very similar to ours (compare their Figure 3C with our new Figure 10C for GDP microtubules, dark blue bars), their protofilament-based approach could not reveal changes in the number and location of seams within individual microtubules. Yet, they probably could have done so had they asked whether segments with different seam numbers had been extracted from the same microtubules.

      Here, we designed a specific approach to tackle the structural heterogeneity of individual lattices that permitted this discovery. Not only do we confirm results obtained by others, but we also propose a molecular mechanism that explains how multi-seams form in microtubules assembled in vitro and how they change in location in a cytoplasmic environment. By doing so, we propose a novel molecular event, the formation of unique lateral interactions without longitudinal ones, that was not envisioned before and which, in our opinion, must be incorporated in further modelling studies concerning microtubule nucleation and assembly, including the mechanism of dynamic instability (see the Ideas and speculation section).

      4 - Dilution: A 50X dilution was used only for Xenopus egg cytoplasmic extracts, to decrease their density on the EM grid just before freezing. These conditions were established by cryo-fluorescence microscopy to ensure an adequate density of microtubules on the EM grid (Figure 7 and Figure 2—figure supplement 1D). Of note, the microtubules analyzed by SSTA were assembled in extracts that were not supplemented with fluorescent tubulin. While we could imagine that dilution may induce the removal of dimers from the microtubule lattice, we cannot foresee how this could change the register between tubulin subunits within the microtubule lattice.

      5 - Kinesin decoration: Like many other laboratories (see the Table in Figure 3 of (Manka & Moores, 2018)), we use the non-processive motor domain of kinesin 1 to decorate microtubules, with the aim of differentiating the α- and β-tubulin monomers within the microtubule lattice. In particular, it has been shown that lattice parameters such as the protofilament skew and lattice spacing are unmodified when kinesin motor domains are added to GMPCPP- or GDP-microtubules (Zhang et al., 2015, 2018). In addition, we cannot envisage how this non-processive motor added to preformed microtubules could change the registry of the αβ-tubulin heterodimers within the microtubule lattice.

    1. Author Response

      Reviewer #1 (Public Review):

      1) It is important to emphasize that the osteoporotic phenotypes were only demonstrated in male, not female, mice. The observed phenotypes were not hormone-dependent, as no significant differences in examined bone parameters were observed between wild-type and Prdx5 KO female mice in an ovariectomy-induced osteoporosis model. However, women over 50 have a four times higher rate of osteoporosis than men, and the role of testosterone in the development of osteoporosis in Prdx5 KO mice should be investigated. It is known that osteoporosis is more frequent in men with low testosterone levels.

      Thanks for your comments regarding osteoporosis phenotypes in Prdx5 KO males and their relation with testosterone levels. Based on your suggestion, we re-examined testosterone levels in the serum of male mice and tested the expression levels of the androgen receptor (AR) in the differentiated osteoblasts and osteoclasts of the mice. We have updated the data in Figure 3-figure supplement 2 and included the revised information in the Results (Pages 13-14) and Discussion (Page 34) sections.

      2) It is misleading for the authors to state throughout the manuscript that osteoporotic phenotypes are observed in Prdx5 KO mice, when they are only observed in male mice.

      We apologize for this oversight. We have modified the text and indicated that all osteoporotic phenotypes were observed in Prdx5 KO male mice.

      Reviewer #2 (Public Review):

      1) While the abstract emphasizes transcriptomic analysis and mass spectrometry, extensive imaging techniques have also been used and should be highlighted to give an overview of results from the performed techniques.

      In addition, make it clear that it is proteomics-based mass spectrometry, since I was only able to confirm that after seeing Figure 5.

      Thanks for your helpful suggestions. We have modified the Abstract based on your suggestions.

      2) Line 46-53: I would add more details of how balanced bone mass looks on average, how much is too much, when should we be concerned about bone mass, and does some amount of stress benefit bone mass?

      Thank you for the suggestion. We have modified the Introduction. We wanted to explain that bone, as a supporting organ, requires general mechanical stress for its remodeling, although we agree that this information is not essential to our study and may confuse readers.

    1. Author Response

      Reviewer #3 (Public Review):

      Results of this manuscript provide a new link between oxygen sensing and cholesterol synthesis. In previous studies, this group showed that the cholesterol synthetic enzyme squalene monooxygenase (SM) is subjected to partial proteasomal degradation, which leads to the production of a truncated, constitutively active enzyme. In this study, the authors provide evidence for the physiological significance of SM truncation. In a series of experiments, the authors show that subjecting cells to hypoxia (oxygen deprivation) induces truncation of SM. The synthesis of cholesterol requires 11 molecules of oxygen and SM is the first oxygen-dependent enzyme in the cholesterol-committed branch of the pathway. Evidence is presented that hypoxia causes squalene, the substrate of SM, to accumulate, which results in the enzyme's truncation. In addition, hypoxia stabilizes MARCHF6, the E3 ligase required for sterol-dependent ubiquitination and degradation of SM. Finally, the authors provide an experiment showing that truncation of SM correlates with hypoxia in endometrial cancer tissues.

      Overall, the data presented in this manuscript are compelling for the most part. Hypoxia-induced truncation of SM and MARCHF6 is very clear according to the presented results. The specificity of SM-induced truncation is strong; both direct addition and inhibitor studies are presented. The major strength of this manuscript is that it provides the physiological relevance for the authors' previous finding that squalene accumulation leads to truncation of SM. However, there are a few issues that should be addressed to improve the interpretation of the data presented.

      We thank the reviewer for their useful comments.

      The manner in which quantified immunoblots are presented is very confusing and difficult to interpret. This is evident in experiments in several figures. For example, it is difficult to determine the role of ubiquitination (Figure 2D) and MARCHF6 (Figure 2E) in the generation of truncated SM. The authors should present quantified data of all lanes of the immunoblots to reduce confusion.

      The revised manuscript includes quantification of protein levels for all immunoblot lanes, including in Figure 2D and Figure 2E (now Figure 3A). It also contains updates to the text, figure legends, and axis labels to improve clarity about data normalization. For more information, please refer to our response to Essential Revisions comment #1.

      The other important finding of this manuscript is that hypoxia stabilizes MARCHF6. This is supported by the results of Fig. 3A; however, the result of Figure 3B is not clear. A new band appears upon inhibition of VCP and MG-132 seems to reduce protein expression. These results could be removed from the manuscript without impacting the conclusions drawn.

      As suggested, the revised manuscript contains only the initial observation that hypoxia stabilizes MARCHF6. Other experiments investigating the mechanism have been removed. For more information, please refer to our response to Essential Revisions comment #2.

      Finally, the results shown in Figure 5 showing that truncation of SM correlates with hypoxia in endometrial cancer tissues are a little preliminary. Multiple bands are detected in SM immunoblots, which interferes with interpretation. This experiment could be removed and speculated upon in the discussion.

      As suggested, this experiment is removed from the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors established a "behavioral transcriptomics" platform in which they cultured mouse primary explants on an apparatus, imaged the cells over time, and analyzed cells with differing physiological status by scRNA-seq. They showed evidence that the system recapitulates physiological features of airway cells, including the response to chemical-induced damage. They further used the system to isolate cells with different cellular features and analyzed their gene expression by scRNA-seq. The study demonstrates an interesting establishment and application of an in vitro system mimicking in vivo conditions.

      However, several major concerns need to be resolved.

      First, whereas the overall study seems to focus on establishing the airway epithelial cell explant apparatus and its application, the take-home messages delivered by the authors seem to emphasize the transcriptome analysis. The authors introduce "spatial transcriptomics" and "behavioral transcriptomics" in the abstract, but it is hard to see how the study achieves spatial transcriptomics. This causes unnecessary confusion. Second, probably related to the first point, it is hard to identify the novelty of the study. Third, the last and probably most important part of the manuscript is the Smart-seq analysis of the cells, but this analysis was performed only on SO2-injured animals and lacked an experiment on wild-type mice. If the authors aimed to demonstrate the feasibility of the technique rather than to resolve a physiological mechanism, I would recommend explaining why a wild-type experiment was not performed.

      The method described in the manuscript consists of two components: a novel tissue imaging platform, and characterization of a cellular behavior. Both steps can be generalized to different tissue contexts and different cellular behaviors, respectively. We have revised the title and abstract to specify the scope of this study and have also revised the text accordingly.

      Live imaging allows us to observe cell behaviors in intact tissues but does not provide information on cell type. By profiling cells that are observed by live imaging to share a behavior at single-cell resolution rather than bulk, we can separate out sources of transcriptional variation, like cell type identity, in order to identify the transcriptional signatures that reflect cell behaviors.

      Single-cell sequencing (via Smart-seq) has been previously performed in wild-type mouse trachea (Montoro et al., 2018), and identified underlying cellular heterogeneity. However, the steady state tracheal epithelium is largely quiescent, characterized by slow turnover and a lack of visible cell motility. We performed daily imaging of trachea explants from uninjured mice over 4 days and did not observe any significant displacement of epithelial cells. Furthermore, we also imaged an uninjured explanted tracheal epithelium every 40 minutes for over 19 hours with no significant directional movement (new Movie 3). We added the following text to the manuscript: “Imaging of trachea explant controls from uninjured mice over 19 hours revealed no cellular displacement in the airway epithelium (Movie 3).”

      In contrast, regeneration activates cell motility followed by cell proliferation. Therefore, we chose tissue regeneration as the more suitable biological context for this study to examine cellular dynamics. We leveraged the gene signatures derived from the previous wild-type study (Montoro et al., 2018) to identify different cell types and make like-for-like comparisons. We used an independent regeneration dataset in the same tissue but with a different injury model (Plasschaert et al., 2018) to test whether the molecular signatures derived in our study that differentiate moving and non-moving cells are generalizable to other contexts.

      Reviewer #2 (Public Review):

      Kwok et al. devise a method that uses a transgenic mouse line to link cell behaviour in intact living tissue to subsequent dissociation into distinct groups for single-cell sequencing. Specifically, they set up a mouse airway culture system in which it is possible to maintain live cells for multiple days and subsequently preserve the same tissue. The analysed tissue section can be fixed and known cell types identified via classical staining protocols. In this system they imaged a number of tissue phenotypes such as ciliary beating, mucociliary clearance and airway regeneration. With respect to airway regeneration, they observe cellular heterogeneity between cells with the capacity to move and so-called non-movers, which the authors were able to track quantitatively. To make the link with single-cell sequencing, they use the Kaede transgenic mouse line, which expresses a green fluorescent reporter protein that can be converted into a red fluorescent reporter by illuminating a defined tissue region, in this case regions enriched for movers or non-movers. After dissociation of the tissue, cells were FACS-sorted using the reporter protein. Subsequent single-cell RNA-seq revealed distinct gene signatures associated with the mover versus the non-mover phenotype. These signatures could also be detected in previously published data sets.

      The conclusions of the paper are supported by the data that is presented, but the comparison to existing mouse injury data could be improved. A weakness of the paper is the implication that the technique can be used for any of the phenotypes that they have examined. However, to be assessed by this method, there needs to be a reasonably large number of cells that show similar behaviour in a region that can be photoconverted. If photoconversion is indeed possible at the single-cell level, the authors should demonstrate that such resolution is achievable, or otherwise clearly state this limitation of the technique they have developed.

      We recognize that the approach in this study does not involve photoconversion at single-cell resolution. While single-cell photoconversion and subsequent intermittent live imaging has been demonstrated in other systems such as zebrafish (Green and Smith, 2018) and mouse skin (Park et al., 2017), the throughput of doing downstream single-cell analysis would be limited, especially in a cell type-specific manner. Having observed a relatively homogeneous behavior of cells within a small region (~200 μm diameter, Movie 1 and Movie 2) of the airway epithelium, we photoconverted a small area with several hundred cells. Subsequent single cell sequencing allowed us to compare differences in gene expression between basal cells of slow/non-moving regions to basal cells of fast/moving regions.

      Reviewer #3 (Public Review):

      In this manuscript, the authors identify a pressing need to couple visualized in situ cell behaviour with deep molecular profiling of visualized cells, aiming to move beyond inferences made from time-lapse tissue sampling approaches or the analysis of transcriptional kinetics to identify the molecular pathways that drive cellular behaviour in situ. The authors identify live cell imaging combined with deep molecular profiling of the imaged cells as one possible solution. To this end, the authors establish a novel platform for live cell imaging of tracheal epithelial cells using explants of mouse trachea that allows long-term visualization of cell behaviour, and try to couple live-cell imaging to the transcriptional cell states.

      Combining single-cell RNA-seq analyses with live cell imaging offers the unique opportunity to link transcriptional and anatomic, morphological or movement phenotypes of individual cells. Being able to do this in intact tissues at baseline and in response to injury would allow a far more detailed and integral analysis of cellular behaviour in its physiological context. As such, the approach of the authors is interesting and clearly focused on achieving this goal. The only data that can support a claim of successfully achieving this ambitious goal are presented in figure 3, where an advanced mouse model (the Kaede-Green mouse) is used that allows labelling of individual cells by photoconversion, followed by isolation of individual cells by flow cytometry and plate-based scRNA-seq analysis of sorted cells. By taking this approach, the authors are able to identify transcriptional differences at the group level between tracheal epithelial cell subsets that differ in their movement after injury.

      While this in itself is a remarkable accomplishment, and an interesting observation, the relationship between the 'behaviour' of the cells observed with live cell imaging (the movement after injury) and the transcriptional phenotype remains rather elusive. One explanation could be that active movement of cells depends on a specific transcriptional program that is lacking in the non-moving cells. Another explanation could be that the tracheal epithelial cells are inherently heterogeneous, with one subset having the capacity to move whereas others do not, and the transcriptional profile merely identifies these heterogeneous populations. The observation that non-mover cell populations contain both basal and club cells, whereas mover regions only have basal cells, seems to support this notion to some extent. However, the authors then claim to use basal-cell-derived signatures (excluding the club cells) from mover and non-mover regions and compare these to literature data from another injury model to show that these signatures also identify distinct subsets in a mouse model of polidocanol-induced injury. How the distinction between basal and club cells in the non-mover regions is made remains unclear, and would seem challenging given the number of cells analyzed (as presented in figure 3).

      The identification of two behavioural phenotypes of basal cells (mover vs non-mover) in this manuscript is based on group-level phenotypes: the cells belong to a region of movers or a region of non-movers. This is relevant for figures 2 (including supplements) and 3. In figure 2 supplement 2C, it seems evident that within one region (or focussing only on all moving regions?), the behaviour of all cells within that region/selection is quite uniform: the variation is very limited, and all cells seem to speed up and slow down in a highly coordinated fashion within the selected regions shown. At the same time, in figure 2D, the distribution of regions across speed categories at 26-36 hours pi (the peak of the movement in supplement 2C) seems almost bimodal, with regions belonging either to the non-mover (range 0.5-2.5 μm/hr) or the mover (range 3.0-7.0 μm/hr) phenotype. However, all regions display increased movement at 16h pi compared to the pre-injury movements (Figure 2C), indicating that all cells are induced to move to some extent.

      My main concern with this analysis is that the behavioural phenotype of the epithelial cells is assumed to be homogeneous within each region, allowing a contrast to be made in figure 3 for the transcriptional phenotypes on the basis of moving phenotypes rather than on the basis of the main variation within the dataset.

      For instance, from the t-SNE plot (3B) - for what it's worth of course - and the heatmap (3C) there seems to be at least one non-mover cell that transcriptionally has a higher resemblance to the mover cells than to the other non-mover cells. Of course that can just be the variability present in the dataset, but it could also indicate that non-mover regions are not completely homogeneous, and even more so, that the moving vs non-moving associated transcriptional phenotype is a gradual transition rather than 2 clearly separate sub-phenotypes.

      All-in-all, this manuscript describes an interesting technical advance and shows some of the applications thereof. However, the approach also has its limitations: The requirement to mark cells with specific behavioural features for follow-up transcriptomic analysis (such as by photoconversion) necessitates the division of the epithelial cells into major categories on the basis of certain cellular phenotypes (such as movement) that can be visualized by live cell imaging. This limits the analysis opportunities to group-based contrasts in cellular behaviour as also used here by the authors.

      Also, the use of explanted tissue is of course less ideal than in vivo imaging, but most likely the only technically feasible approach at this moment. At the same time, the capacity to combine image-based features with single-cell transcriptomic data is an important advance, even when initially only possible in explanted tissue from mouse models carrying all kinds of fluorescent reporters. To strengthen the manuscript, it would therefore be important to discuss the limitations of the approach, as well as to provide a more comprehensive overview of the possible applications that the authors foresee.

      We thank the reviewer for the feedback. Our data demonstrate that the movement behavior is an injury-induced phenotype. At 24 hours post-injury (hpi), the “mover” transcriptional program is transiently enriched, while the “non-mover” transcriptional program is transiently decreased, consistent with a cell state that is induced by injury (see Figure 4A, 24 hpi).

      SO2 removes nearly all the luminal cells (Rock et al., 2009), so we removed the club cells to compare the injury response in basal cells. We distinguished basal from club cells by hierarchical clustering and comparison to established cell type signatures (Montoro et al., 2018). We apologize that the initial presentation did not make this clear. In the revised manuscript, we have provided an additional figure supplement demonstrating the hierarchical clustering (Figure 3 - figure supplement 1A) and the disjoint expression of the canonical markers Krt5 (basal) and Scgb1a1 (club), which enabled us to assign unambiguous cell-type identities to the discovered clusters (Figure 3 - figure supplement 1B).

      We agree with the reviewer that all cells, including those we classified as “mover” and “non-mover”, are induced to move compared to pre-injury, as suggested by Figure 2C. However, “mover” and “non-mover” cells differ dramatically in the amplitude and collective directionality of movement. We investigated the movement phenotypes in detail, including high-resolution imaging at shorter time intervals (10 min). We found that the slow “non-movers” had a large circular directionality variance (akin to oscillations), whereas the rapid “movers” moved directionally across the field of view. We quantified this with particle image velocimetry in Figure 2 – figure supplement 3C-D, and we revised the text to provide additional details about this result.
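      The contrast between directed movement and oscillation can be expressed with the standard circular variance of movement directions. For directions θ_k, the mean resultant length is R = |(1/n) Σ_k exp(iθ_k)|, and the circular variance is 1 − R: it is near 0 when vectors share a common direction and near 1 when they cancel out. The sketch below assumes that per-vector directions (for example from a PIV field) are already available; the function name is hypothetical, and this is not necessarily the exact statistic computed for Figure 2 – figure supplement 3.

```python
import math

def circular_variance(angles_rad):
    """Circular variance 1 - R of a list of angles in radians.

    R is the mean resultant length of the unit vectors (cos θ, sin θ).
    Returns ~0 when all angles coincide and approaches 1 when the
    directions cancel out.
    """
    if not angles_rad:
        raise ValueError("need at least one angle")
    n = len(angles_rad)
    mean_cos = sum(math.cos(a) for a in angles_rad) / n
    mean_sin = sum(math.sin(a) for a in angles_rad) / n
    return 1.0 - math.hypot(mean_cos, mean_sin)

# Hypothetical examples: collective, directed movement (mover-like) ...
directed = [0.0, 0.05, -0.05, 0.02]
# ... versus back-and-forth movement whose directions cancel (oscillation-like).
oscillating = [0.0, math.pi, 0.0, math.pi]

print(round(circular_variance(directed), 3))     # close to 0
print(round(circular_variance(oscillating), 3))  # close to 1
```

Because only directions enter the statistic, it separates the two phenotypes independently of speed, which is why it complements the speed categories in Figure 2D.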

      The reviewer also raises a concern about whether the movement is homogeneous enough to account for the variation in the datasets. We used our imaging data to determine the time points at which the mover and non-mover phenotypes varied the most between different regions (around 40 h post-injury; Figure 2 - figure supplement 2A, C), and we have also demonstrated that the movement within each region is indeed relatively homogeneous (~200 μm diameter, Movie 1 and Movie 2).

      We acknowledge that the presented data did not rule out the possibility of another main source of variation within the dataset. We have now performed PCA on the dataset, which confirmed that while the first principal component (PC) is associated with a solitary pulmonary neuroendocrine cell, the second PC is strongly associated with the difference between moving and non-moving cells (p = 0.003, Wald test). When analyzing only the basal cells, we find that PC1 provides a very clean separation and overlaps perfectly with the moving vs non-moving distinction (p < 2 × 10⁻¹⁶, Wald test, Figure 3 - figure supplement 2A). Taken together, this additional analysis confirms that our focus on this behavioral phenotype reflects the main variation within the dataset.

      We appreciate the reviewer’s nuanced question about the single outlier cell. While we do observe a clearly distinct transcriptional phenotype, as the reviewer points out, there is a very small degree of overlap between the two cell-type clusters visible on the t-SNE plot in Figure 3B. Given that the physical process of movement is a matter of degree, it is possible that this particular cell is simply moving less, and thus activating movement-related transcriptional programs to a lower degree. To analyze this question further, we assessed the separability of these groups by training a machine learning (k-nearest neighbor) classifier to distinguish these clusters (new Figure 3 - figure supplement 2B). We found that the groups could be distinguished with a high accuracy of 98.7% (95% CI: 92.7-99.9%) using 5 or more of the signature genes that we defined in Figure 3C. Based on this additional analysis, we continue to conclude that while the groups have a very small degree of overlap, the moving and non-moving phenotypes are strongly separable.
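      One common way to estimate the accuracy of such a k-nearest-neighbor classifier is leave-one-out cross-validation: each cell is classified by majority vote among its k nearest other cells in signature-gene space and compared with its known region label. The validation scheme actually used is not stated here, so the sketch below is an assumption; the function names, k, and the three-gene toy data are hypothetical. A binomial confidence interval can then be attached to the resulting proportion of correctly classified cells.

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k=3):
    """Majority vote among the k training points nearest to `query`
    (Euclidean distance in signature-gene expression space)."""
    dists = sorted(
        (math.dist(x, query), lab) for x, lab in zip(train, labels)
    )
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(points, labels, k=3):
    """Leave-one-out accuracy: each cell is classified by a model
    built from all other cells only."""
    correct = 0
    for i, (x, y) in enumerate(zip(points, labels)):
        train = points[:i] + points[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if knn_predict(train, train_labels, x, k) == y:
            correct += 1
    return correct / len(points)

# Hypothetical expression values for three signature genes per cell.
movers = [(5.0, 4.8, 5.2), (5.1, 5.0, 4.9), (4.9, 5.2, 5.0)]
non_movers = [(1.0, 1.1, 0.9), (0.9, 1.0, 1.2), (1.1, 0.8, 1.0)]
points = movers + non_movers
labels = ["mover"] * 3 + ["non-mover"] * 3

print(loo_accuracy(points, labels, k=3))  # 1.0 for these well-separated toy groups
```

An odd k avoids tied votes between two classes; with real data, a cell lying between the clusters (like the outlier discussed above) is exactly the kind of case that lowers this accuracy below 1.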

      We acknowledge the limitations of this approach to groups of cells (see response to Reviewer 1) and both the limitations and advantages of using a tissue rather than cells, and we added these points to the discussion section.

    1. Author Response

      Reviewer 1 (Public Review):

      1) The finding that thalamic activity exhibits a low-dimensional structure is, in my opinion, less a finding than an assumption that motivates the use of dimensionality reduction techniques. When the authors ask (line 101) "whether thalamic task activity exhibits similar low dimensional structure", what is the alternative hypothesis? I think it is a foregone conclusion that, with a restricted number of tasks and the intrinsic smoothness of fMRI activity data, there are always K << N components that capture 50, 75, or 90% of the variance. If you had measured the spiking of the entire population of thalamic neurons, or increased the threshold to 99%, the structure of activity would be more high-dimensional. So I believe you can either frame this as an assumption going in, or carefully build an alternative hypothesis of what a "high-dimensional" structure would look like. Generating i.i.d. activity data would be the simplest case, but given that both signal and measurement noise in fMRI are reasonably smooth, this would be a VERY trivial null hypothesis.

      We thank the reviewer for pointing out this inherent assumption in our analysis. We agree that, given the smooth nature of the BOLD signal and the restricted task design, we likely cannot effectively test an alternative hypothesis of high-dimensional organization. We have revised our introduction accordingly to clarify that we apply dimensionality reduction under the assumption that thalamic task fMRI data will show a low-dimensional structure, similar to past fMRI studies that focused on cortical ROIs (line 102). Furthermore, we have revised the discussion section to remove text highlighting the low-dimensional organization as a novel finding (line 404).

      2) The measure of "task hub" properties that is central to the paper would need to be much better explained and justified. You motivate the measure as designed to find voxels that are "more flexibly recruited by multiple thalamic activity components", but it is not clear to me at this point that the measure defined on line 634 does this. First, sum_n w_{i,n}^2 is constrained to be the variance of voxel i across tasks, correct? Would sum_n abs(w_{i,n}) be higher when the weights are distributed across components? Given that each w is weighted by the variance (eigenvalue) of the component across the thalamus, would the score not be maximal if the voxel loaded only on the most important eigenvector, rather than being involved in a number of components? Also, the measure is clearly not rotationally invariant, so would this result change after some rotation of the PCA solution? Some toy examples and further demonstrations that show why this measure makes sense (and what it really captures) would be essential. The same holds for the participation index used in the resting-state analysis.
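      A toy calculation of the kind requested makes the concern concrete. Assume, for illustration only, a score of the form S_i = Σ_n λ_n |w_{i,n}|, with λ_n the eigenvalue of component n; the exact definition on line 634 of the manuscript is not reproduced here, so this form, the function name, and the eigenvalue spectra below are hypothetical. Two voxels with the same total squared loading are compared: one loading on a single component, one loading equally on three.

```python
import math

def hub_score(loadings, eigenvalues):
    """Hypothetical hub score: eigenvalue-weighted sum of the absolute
    loadings of one voxel across components, sum_n lambda_n * |w_n|."""
    return sum(lam * abs(w) for lam, w in zip(eigenvalues, loadings))

# Two voxels with the same total squared loading (sum_n w_n^2 = 1).
concentrated = [1.0, 0.0, 0.0]          # loads only on component 1
distributed = [1 / math.sqrt(3)] * 3    # loads equally on all three

steep = [0.8, 0.15, 0.05]   # first component dominates the variance
flat = [0.4, 0.35, 0.25]    # variance spread across components

for name, lams in [("steep", steep), ("flat", flat)]:
    print(name,
          round(hub_score(concentrated, lams), 3),
          round(hub_score(distributed, lams), 3))
```

Under a steep spectrum the single-component voxel scores higher (0.8 vs about 0.58), whereas under a flat spectrum the distributed voxel does (about 0.58 vs 0.4). Whether such a score rewards "flexible recruitment" therefore depends on the eigenvalue spectrum, which is the reviewer's point.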

      Please see our response to essential revision point #1.

      3) For the activity flow analysis, the null models (which need to be explained better) appear weak (i.e. no differences across tasks?), and it is no small wonder that the thalamus does significantly better. The Pearson correlations are not overwhelmingly impressive either. To give the reader a feel for how good/bad the prediction actually is, it would be essential that the authors would report noise ceilings - i.e. based on the reliability of the cortical activity patterns and thalamic activity patterns, what correlation would the best model achieve (see King et al., 2022, BioRxiv, as an example).
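      A noise ceiling of this kind can be approximated with the classic correction for attenuation: if the predictor and target patterns have reliabilities r_XX and r_YY, the highest correlation one can expect to observe between them is sqrt(r_XX · r_YY). Split-half reliabilities are commonly stepped up to full-data reliabilities with the Spearman–Brown formula, 2r/(1 + r). This is one standard approach, assumed here; King et al. (2022) may compute their ceiling differently, and the function names and reliability values below are hypothetical.

```python
import math

def spearman_brown(r_half):
    """Step a split-half correlation up to the reliability of the full data."""
    return 2 * r_half / (1 + r_half)

def noise_ceiling(rel_predictor, rel_target):
    """Upper bound on the observable correlation between two noisy
    measurements with the given reliabilities (correction for attenuation)."""
    return math.sqrt(rel_predictor * rel_target)

# Hypothetical split-half reliabilities of thalamic and cortical patterns.
rel_thal = spearman_brown(0.6)   # 0.75
rel_ctx = spearman_brown(0.8)    # about 0.889
print(round(noise_ceiling(rel_thal, rel_ctx), 3))
```

With these made-up reliabilities the ceiling is about 0.82, so an observed prediction correlation would be judged against that value rather than against 1.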

      Please see our response to essential revision point #4.

      4) Overall it has not been made clear what the RDM analysis adds to the prediction of the actual activity patterns. If you predicted the activity patterns themselves up to the noise ceiling, you would also hit the RDM correctly. The opposite is not the case, you could predict the correct RDM, but not the spatial location of the activity. However, the two prediction performances are never related to each other and it remains unclear what is learned from the latter (less specific) analysis.

      We agree that the utility of the RDM analysis is not clear, and we have removed it from the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper details the creation and data behind the website http://pandemics.okstate.edu/covid19/. The authors attempt to explore whether there is a cause-and-effect relationship between the detection of unusually increased mutation activity in genomic surveillance databases and subsequent near-term surges in SARS-CoV-2 case numbers.

      Overall the premise is interesting as other than following case numbers reported to health authorities and observing what is happening in another country, there is no reliable way to predict when a surge is going to occur. Unfortunately, the data demonstrate that there was no reliable metric that could be used to predict surge events. Interestingly, the website has issued a "surge alert" currently for the month of September. It will be interesting to observe whether their model indeed has predictive power or whether the current analysis is merely coincidental with the surges but not necessarily predictive of them.

      In this work, we investigated a number of metrics for finding a reliable signal of surge prediction. The commonly used ratio ka/ks, or the derivative of ka/ks with respect to time, did not provide a reliable metric. However, for the same data, ka has provided a fairly robust surveillance signal so far. We believe ka/ks studies provide insights into genome changes, but not over short time periods such as days (at least not in the case of SARS-CoV-2). As the motivation of our work is to provide the community with a genomic surveillance approach in real time, we believe that the current data show that ka is, at present, a useful and fairly reliable metric.
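      For readers less familiar with these quantities, ka and ks are the numbers of non-synonymous and synonymous substitutions per non-synonymous and synonymous site, respectively. The sketch below shows the classic Nei–Gojobori-style estimate with a Jukes–Cantor correction, assuming that substitution and site counts relative to the reference genome have already been tallied. It illustrates the standard definitions and is not necessarily the pipeline behind the website; the function names and counts are hypothetical.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected divergence for an observed proportion p of
    differing sites; defined only for 0 <= p < 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)

def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (ka, ks, ka/ks) from substitution and site counts.

    nonsyn_sites and syn_sites are the (possibly fractional) numbers of
    non-synonymous and synonymous sites in the reference sequence.
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        # Ratio is undefined without synonymous changes.
        ratio = math.nan if ka == 0 else math.inf
    else:
        ratio = ka / ks
    return ka, ks, ratio

# Hypothetical counts for one sequence compared against the reference.
ka, ks, ratio = ka_ks(nonsyn_diffs=10, nonsyn_sites=3000.0,
                      syn_diffs=2, syn_sites=1000.0)
print(f"ka={ka:.5f} ks={ks:.5f} ka/ks={ratio:.2f}")
```

Tracking ka alone, as the authors do, sidesteps the division by ks, which is undefined whenever no synonymous changes have accumulated.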

      As the reviewer mentioned, while this manuscript was being reviewed, we issued a warning on September 7th, 2022. Several different types of data (including the number of new infections, the number of hospitalizations, and COVID-19-related deaths) have indicated that our warning was accurate, since there was a surge in reported cases in September that peaked in October. For instance, plots shown in Figure S6 indicate that there was a surge in the number of cases across Europe at large and in several individual countries, including France, the United Kingdom, Germany and Italy. Similarly, our earlier warning in June was also followed by surges reported across many countries and collectively across the world (Figure S5). Therefore, we believe the presented methodology has been validated.

      Reviewer #2 (Public Review):

      In this manuscript, Najer et al. perform a comprehensive bioinformatic analysis of SARS-CoV-2 sequences available from public repositories. Through a comparison with the genome sequence of the original Wuhan 2020 strain, they identify the total accumulation of non-synonymous mutations as a predictor of the evolution of new strains. The manuscript provides data for three structural proteins, spike (S), membrane (M), and envelope (E), as well as data for the non-structural RNA-dependent RNA polymerase (RdRp), which serves as a negative control. However, the predictivity of this approach is most marked only for the Omicron variant, with considerable variation in the predictive power of SARS-CoV-2 proteins for other variants. Focusing on the spike protein, the method does not detect the Alpha or Delta variant surges, which were mostly driven by changes in the spike protein, although less sequencing data may have been available for the Delta variant. Notably, although the authors conclude that other parameters such as the ratio of non-synonymous to synonymous mutations or the rate of accumulation of non-synonymous mutations are not predictive, these parameters appear to have similar success in predicting the Omicron surge.

      We agree with the reviewer; the spike protein signal during the Alpha surge could have been affected by an insufficient number of sequences. In the case of the Gamma/Delta variants, we did notice changes in the spike and membrane proteins. For Omicron and its various sub-variants, ka provides a reliable signal due to changes in the spike, membrane and envelope proteins.

    1. Author Response

      Reviewer #2 (Public Review):

      The work systematically reassesses the characteristics and annotation confidence of fungal miRNA/miRNA-like (mi/milRNA) loci and finds that many of the loci fail to meet key criteria of the methods developed for animal or plant miRNAs. The authors therefore establish a set of criteria suitable for the annotation of fungal miRNAs and provide a centralized annotation of the identified mi/milRNA hairpin RNAs in fungi based on these rules.

      Here are some comments and suggestions for the manuscript to be improved:

      1) The title mentions "ancestral links", however, the main context of this paper does not include the evolution of fungal mi/milRNAs or show the origins of conserved mi/milRNAs in fungi. The authors are suggested to consider a more appropriate title for this work.

      Agreed, we have modified our title to include a more fitting description of the outcome of the study:

      “Comprehensive re-analysis of hairpin small RNAs in fungi reveals loci with conserved links”

      2) The work proposes a fungal mi/milRNAs hairpin precursor recovery pipeline with three minimal criteria to annotate fungal mi/milRNA loci, which allows nearly half of the loci to pass these rules. To highlight the innovation of this annotation, it is strongly suggested that the authors compare their established pipeline and criteria for fungi with those used in animal or plant miRNAs in detail, and emphasize the advantages of the established pipeline. A figure showing the established pipeline and detailed parameters is needed.

      We have now included a clear workflow diagram for establishing miRNA annotation records and confidence tiers (Figure 1-supplemental 3). The comparison with rules used in plants and animals is given in Table S6, which lists rules employed by other tools, papers, and species. We believe these supplementary items together give a strong overview of our approach and how it differs from other approaches.

      3) The established "standard rules" for fungal mi/milRNA annotation still require more evaluation. It would be better if there is experimental validation to improve confidence.

      Sequencing evidence is generally regarded as the gold standard of experimental support for identifying and annotating miRNAs (Axtell and Meyers, 2018), although the rules are not yet clear in fungi. We agree that developing a standard rule set is a high priority for establishing complete annotation standards. We had a statement (~ line 290) affirming this need and have now modified this sentence to highlight the need for a sufficient standard.

      “While this minimal rule-set is useful for filtering the lowest-confidence loci, it is likely not sufficient to form the basis of an annotation and this analysis further confirms the need for a standardized pipeline and set of criteria for miRNA annotation in fungi.”

      To address the question of experimental validation, we have included descriptions of loci with strong-functional support in Table S5, including a section discussing top-tier loci in the discussion, described in the response to reviewer 3.

    1. Author Response

      Reviewer #1 (Public Review):

      By studying the effect of Treg depletion in a CD8+ T cell-dependent diabetes model, the group around Ondrej Stepanek showed that in the absence of Treg cells, antigen-specific CD8+ OT-I T cells show an activated phenotype and accelerate the development of diabetes in mice. These cells, termed KILR cells, express CD8+ effector and NK cell gene signatures and are identified as CD49d- KLRK1+ CD127+ CD8+ T cells. The authors suggest that the generation of these cells is dependent on TCR stimulation and IL-2 signals, either provided due to the absence of Treg cells or by injection of IL-2 complexed to specific anti-IL-2 mAbs. In vivo, these cells show improved target cell killing properties, and the authors report improved anti-tumor responses to combination treatment with doxorubicin and IL-2/JES6 complexes. Finally, the authors identified a similar human subset in publicly available scRNAseq datasets, supporting the translational potential of their findings.

      The conclusions are mostly well supported, except for the following two considerations:

      We are pleased by the positive overall evaluation of our manuscript by both reviewers, and we are thankful for their specific, insightful comments, which helped us to improve the manuscript.

      1) From Fig. 4A and B it is not conclusively shown, that Tregs limit IL-2 necessary for the expansion of OT-I cells and subsequent induction of diabetes. An IL-2 depletion experiment (e.g. with combined injection of the S4B6 and JES6-1 antibodies) would further strengthen this claim. Along these lines, the authors claim "IL-2Rα expression on T cells can be induced by antigen stimulation or by IL-2 itself in a positive feedback loop [20]. Accordingly, downregulation of IL-2Rα in OT-I T cells in the presence of Tregs might be a consequence of the limited availability of IL-2.". The cited reference 20 did observe CD25 upregulation by IL-2 on T cells but the observed effect might only be caused by upregulation of CD25 on Treg cells, which increases the MFI for the whole T cell population. Did the authors observe significant upregulation of CD25 on effector CD4+ and CD8+ T cells in their experiments with IL-2/S4B6 or IL-2/JES6 treatment?

      We added another reference to support our claim (Sereti, I., et al., Clin Immunol, 2000. 97(3): p. 266-76.). Along this line, we also observed that addition of IL-2 in vitro leads to IL-2Rα upregulation on CD8+ T cells (shown in Fig. 4C), and that IL-2Rα levels were lower when Tregs were present. We also observed upregulation of IL-2Rα in vivo upon stimulation of OT-I T cells with OVA and IL-2ic, which is now shown in Fig. S6C of the revised manuscript.

      To further explore whether Tregs limit the expansion of OT-I T cells and diabetes progression by limiting IL-2, we performed the proposed experiment using a combined injection of the S4B6 and JES6-1 anti-IL-2 antibodies. Initially, we were skeptical that we could completely block IL-2 using this approach, for the following reasons. First, IL-2 is produced locally in the spleen and lymph nodes and might not be easily accessible to the antibodies for a complete block. Second, IL-2 has a relatively short turnover and is continuously produced, but the half-life of the injected antibodies is unknown, which calls the duration of such a block into question. Third, some IL-2 molecules might be bound by only one of the two antibodies, which would turn them into hyper-stimulating immune complexes instead of neutralizing them.

      Nevertheless, we were curious enough to perform this experiment. We used a condition that, based on our experience, leads to diabetes in Treg-depleted but not in Treg-replete mice (10k OT-I T cells, OVA + LPS immunization). One additional group of Treg-depleted mice received a single dose of the S4B6 and JES6-1 anti-IL-2 antibodies (200 µg of each antibody per mouse). We observed that this IL-2 blockade delayed, but did not prevent, the development of diabetes in most animals (Fig. 1 below).

      Overall, we believe that this experiment rather supports our conclusions concerning the importance of IL-2, although the effect is only partial. However, we decided not to include this experiment in the manuscript, because we do not have evidence of how efficient the IL-2 blockade was (see above), which makes interpretation difficult. Because the reviews and the point-by-point response are public in eLife, we believe that showing the data here is appropriate.

      Figure 1. Role of IL-2 blocking on the development of experimental diabetes. Two independent experiments were performed. Statistical significance was calculated using Log-rank (Mantel-Cox) test for survival, and Kruskal-Wallis test for blood glucose (p-value is shown in italics).
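      The survival comparison above uses the log-rank (Mantel–Cox) test, which compares the number of diabetes onsets observed in one group with the number expected if all groups shared the same hazard, summed over event times. A minimal two-group sketch follows; the function name, data layout, and onset times are hypothetical, this is not the authors' analysis code, and comparisons among more than two groups require the k-group generalization with k − 1 degrees of freedom.

```python
import math

def logrank_two_groups(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs, where event is True if
    the endpoint (e.g. diabetes onset) was observed and False if censored.
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n          # expected events under equal hazards
        if n > 1:                          # hypergeometric variance term
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))     # upper tail of chi-square, 1 df
    return chi2, p

# Hypothetical onset days (True = diabetes observed, False = censored).
treg_depleted = [(7, True), (8, True), (9, True), (10, True), (21, False)]
treg_depleted_anti_il2 = [(12, True), (14, True), (18, True),
                          (21, False), (21, False)]
chi2, p = logrank_two_groups(treg_depleted, treg_depleted_anti_il2)
print(f"chi2={chi2:.2f}, p={p:.3f}")
```

Censored animals (those still free of diabetes at the end of follow-up) contribute to the number at risk but never to the event counts, which is why a delay without full prevention can still yield a significant difference.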

      2) The anti-tumor efficacy of KILR cells is intriguing but currently, it is unclear if it is indeed mediated by KILR cells. Have KILR cells been identified by flow cytometry in the BCL1 and B16F10 models treated with doxorubicin and IL-2/JES6? Were specific KILR cell depletion studies conducted, e.g. with an anti-KLRK1 depleting antibody? Additional experiments addressing these questions would be desirable to further support the authors' claims.

      We are thankful to both reviewers for their similar comments concerning the analysis of CD8+ T cells in the tumor model. Addressing these comments led to very useful data and significantly improved our manuscript.

      We analyzed splenic CD8+ T cells in the BCL1 leukemia model (the spleen is the major site of leukemic cells in this model). We observed that KLRK1+ T cells represented almost half of CD8+ T cells in mice treated with DOX+IL-2ic, a much higher frequency than in control and DOX-only treated mice. Although not all KLRK1+ cells were bona fide KILR cells, the frequencies of KLRK1+ IL-7R+ and KLRK1+ CD49d- cells were also strongly elevated in the DOX+IL-2ic-treated mice. Overall, the survival of DOX+IL-2ic-treated mice correlated with the frequencies of KILR T cells and KLRK1+ T cells. Moreover, GZMB was almost exclusively expressed by KLRK1+ T cells. We show these data in Fig. 7C and Fig. S7B of the revised manuscript.

      In the B16 melanoma model, we analyzed CD8+ T cells in the spleens and also in the tumors. We observed a large population of KLRK1+ GZMB+ CD8+ T cells in the spleens of DOX+IL-2ic-treated mice, but not in untreated or DOX-only treated mice (Fig. 7F). Both KLRK1+ CD49d+ and KLRK1+ CD49d- CD8+ T cells were substantially more frequent in DOX+IL-2ic-treated mice than in untreated or DOX-only treated mice (Fig. S7F). In the tumors, KLRK1+ CD49d- CD8+ T cells were found in large numbers only in the DOX+IL-2ic-treated mice (Fig. 7G). Moreover, these KLRK1+ CD49d- CD8+ T cells expressed high levels of IL-7R and GZMB only in DOX+IL-2ic-treated mice, but not in untreated or DOX-only treated mice (Fig. 7H).

      We believe that these new data provide evidence that the combination of immunogenic chemotherapy with IL-2 treatment induced KILR cells in the spleens and in the tumors, and that this correlates with improved survival.

      Because the majority of non-naïve CD8+ T cells (and the vast majority of GZMB+ CD8+ T cells) in the spleens and tumors of tumor-bearing mice treated with DOX+IL-2ic were KLRK1+, and because we have shown that the protective effect of the DOX+IL-2ic therapy is largely CD8+ T cell-dependent, we did not find it essential to deplete KLRK1+ T cells. We believe that depletion of KLRK1+ T cells would almost inevitably lead to increased tumor growth, as it would probably deplete the majority of antigen-specific CD8+ T cells, mimicking the overall CD8+ T cell depletion. Moreover, we have not established this protocol.

      Reviewer #2 (Public Review):

      In this study, the authors determine the superior cell killing abilities of KLRK1+ IL7R+ (KILR) CD8+ effector T cells in experimental diabetes and tumor mouse model. They also provide evidence that Tregs suppress the formation of this previously uncharacterized subset of CD8+ effector T cells by limiting IL-2.

      Strength and Limitation

      This study focuses on the relationship between Tregs and CD8+ T cells. The authors used different experimental diabetes mouse models to reveal that Tregs suppress CD8+ effector T cells by limiting IL-2. Using single-cell sequencing, they also found a unique subset of KLRK1+ IL7R+ (KILR) CD8+ effector T cells with superior cell-killing abilities, whose killing could be inhibited by Tregs. They also tested their theory in an in vivo tumor model. The data, in general, support the conclusions; however, some issues need to be fully addressed, as detailed below.

      We are pleased by the positive overall evaluation of our manuscript by both reviewers, and we are thankful for their insightful specific comments, which helped us improve the manuscript.

      1) This study used the concentration of urine glucose as the standard for diabetes (≥ 1000 mg/dl for two consecutive days). However, multiple causes may lead to a high level of urine glucose. As this is a type I diabetes mouse model, the authors could use immunohistological analysis of the islets to show the proportion of T cells and islet cells within the islets, which would directly display the geographic distribution of immune cells and the severity and histological structure of the damaged pancreatic islets. If possible, different subsets of immune cells, especially CD4+ vs CD8+ cells, should be stained for their location.

      We added a histological examination of the pancreas in control, DEREG-, and DEREG+ mice using contrast H&E staining and immunofluorescence (Fig. 1D-E in the revised manuscript). We observed that the high urine and blood glucose levels are preceded by the destruction of the pancreatic islets (altered morphology and decreased insulin production) as well as by the infiltration of the islets with immune cells, including CD4+ and CD8+ T cells.

      2) This article shows that KILR effector CD8+ T cells have strong cytotoxic properties. However, they do not describe the potential proliferation ability vs apoptosis of this subset from islets.

      We analyzed the proliferation (KI67 expression) and apoptosis (Annexin V, cleaved Caspase 3) of T cells isolated from the pancreas of DEREG- and DEREG+ mice on day 4 after the induction of diabetes using flow cytometry (Figure 2 below). We did not observe any differences between DEREG- and DEREG+ mice or among different subsets of OT-I T cells in the DEREG+ mice. Essentially all T cells were proliferative (KI67+), and there was a very low percentage of Annexin V+ or cleaved Caspase 3+ cells.

      Figure 2. Lymphocytes were isolated from the pancreas of DEREG- RIP.OVA and DEREG+ RIP.OVA mice on day 4 after the induction of diabetes, and analyzed using flow cytometry. Two independent experiments were performed. Gated on OT-I T cells. Top: proliferation rate based on Ki-67 staining. Representative histogram and MFI (median is shown). Middle: Apoptosis rate based on Annexin V staining. Representative histogram shows Annexin V staining in three populations of OT-I T cells from DEREG+ mouse (“AE” - CD49d+ KLRK1-, “++” - CD49d+ KLRK1+, KILR - CD49d- KLRK1+), total OT-I T cells from DEREG-, and a positive control: WT CD8+ T cells treated with hydrogen peroxide. Middle right: Percentage of Annexin V+ cells and MFI (median is shown). Bottom: Apoptosis rate based on cleaved Caspase 3 staining. Representative dot plots show cleaved Caspase 3 staining of OT-I T cells from DEREG+, DEREG-, and a positive control: WT CD8+ T cells treated with hydrogen peroxide. Bottom right: percentage of cleaved Caspase 3+ cells (median is shown).

      However, we found the question of the proliferation and apoptosis of KILR cells interesting and worth further investigation. For this reason, we assessed the proliferation, survival, and phenotypic stability of naïve, KILR, and effector T cells after their competitive transfer into CD3ε-/- mice. The phenotype of all three subsets remained stable for 4 days (Fig. 6F), documenting that KILR cells are not just a very transient stage. Moreover, KILR cells were ~2-fold more abundant than effector cells 3 days after their 1:1 co-transfer into CD3ε-/- mice (Fig. 6G, Fig. S6E). This was probably caused by their slight advantages in both proliferation and survival (Fig. S6F-G).
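      The ~2-fold figure is a ratio normalized to the 1:1 input. A minimal sketch of that arithmetic follows, assuming the readout is the number of cells of each transferred population recovered from the recipient; the function name and the counts are hypothetical illustrations, not data or code from the study.

```python
def competitive_index(out_a, out_b, in_a=1.0, in_b=1.0):
    """Recovered A:B ratio divided by the injected A:B ratio.

    A value of 2.0 means population A is twice as abundant relative to B
    as it was at the time of transfer.
    """
    if out_b <= 0 or in_a <= 0 or in_b <= 0:
        raise ValueError("out_b, in_a and in_b must be positive")
    return (out_a / out_b) / (in_a / in_b)


# Hypothetical counts of recovered KILR (A) and effector (B) cells after a 1:1 transfer
ci = competitive_index(out_a=2000, out_b=1000)
print(ci)
```

      Normalizing to the input ratio matters when the injected mixture deviates from exactly 1:1; otherwise small pipetting differences would be read as a proliferation or survival advantage.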

      3) Figure 7 shows that the antitumor efficacy of IL-2 depends on CD8+ T cells. But in this part, there is no data to show the change of KLRK1+ IL7R+ CD8+ effector T cells in tumor tissue. Therefore, the article needs to add more data to verify that IL-2 enhances antitumor ability via KLRK1+ IL7R+ CD8+ effector T cells.

      We are thankful to both reviewers for their similar comments concerning the analysis of CD8+ T cells in the tumor model. Addressing these comments led to very useful data and significantly improved our manuscript.

      We performed the analysis of splenic CD8+ T cells in the BCL1 leukemia model (the spleen is the major site of the leukemic cells in this model). We observed that KLRK1+ T cells represented almost half of CD8+ T cells in mice treated with DOX+IL-2ic, a much higher frequency than in the control and DOX-only treated mice. Although not all KLRK1+ cells were bona fide KILR cells, the frequencies of KLRK1+ IL-7R+ and KLRK1+ CD49d- cells were also strongly elevated in the DOX+IL-2ic treated mice. Overall, the survival of DOX+IL-2ic treated mice correlated with the frequencies of KILR T cells and KLRK1+ T cells. Moreover, GZMB was almost exclusively expressed by KLRK1+ T cells. We show these data in Fig. 7C and Fig. S7B in the revised manuscript.

      In the B16 melanoma model, we analyzed CD8+ T cells in the spleens and also in the tumors. We observed a large population of KLRK1+ GZMB+ CD8+ T cells in the spleens of DOX+IL-2ic-treated mice, but not in untreated or DOX-only treated mice (Fig. 7F). Both KLRK1+ CD49d+ and KLRK1+ CD49d- CD8+ T cells were substantially more frequent in DOX+IL-2ic-treated mice than in untreated or DOX-only treated mice (Fig. S7F). In the tumors, KLRK1+ CD49d- CD8+ T cells were found in large numbers only in DOX+IL-2ic-treated mice (Fig. 7G). Moreover, these KLRK1+ CD49d- CD8+ T cells expressed high levels of IL-7R and GZMB only in DOX+IL-2ic-treated mice, not in untreated or DOX-only treated mice (Fig. 7H).

      We believe that these new data provide evidence that the combination of immunogenic chemotherapy with IL-2 treatment induced KILR cells in the spleens and in the tumors, and that this correlates with better survival.

      4) It is unclear why the authors chose Dox to combine with IL-2/JES6. The authors should provide a more rational introduction to bridge such a combination. Authors should also explain the reason why there is no antitumor effect of IL-2/JES6 treatment alone.

      The experiments with OT-I mice showed that the formation of KILR cells required both antigenic stimulation and IL-2 signals. We believe that the tumor itself provides only very weak antigenic stimulation. For this reason, we combined the treatment with the chemotherapeutic doxorubicin, which is known to induce immunogenic cell death of tumor cells (e.g., Casares et al. 2005, PMID: 16365148). We believe that doxorubicin induces the death of (some) tumor cells and the release and presentation of their tumor-specific antigens. Without it, the tumors are simply too “cold” to induce a sufficient T-cell response. We emphasized this in the revised version of the manuscript.

      Importantly, some of us observed a similar effect of IL-2ic in combination with checkpoint blockade therapy (without chemotherapy) in a different tumor model, which documents that chemotherapy is not essential for this effect (unpublished data).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors are trying to determine how time is valued by humans relative to energy expenditure during non-steady-state walking - this paper proposes a new cost function in an optimal control framework to predict features of walking bouts that start and stop at rest. This paper's innovation is the addition of a term proportional to the duration of the walking bout in addition to the conventional energetic term. Simulations are used to predict how this additional term affects optimal trajectories, and human subjects experiments are conducted to compare with simulation predictions.

      I think the paper's key strengths are its simulation and experimental studies, which I regard as cleverly-conceived and well-executed. I think the paper's key weakness is the connection between these two studies, which I regard as tenuous for reasons I will now discuss in detail.

      The Title asserts that "humans dynamically optimize walking speed to save energy and time". Directly substantiating this claim would require independently manipulating the (purported) energy and time cost of walking for human subjects, but these manipulations are not undertaken in the present study. What the Results actually report are two findings:

      1. (simulation) minimizing a linear combination of energy and time in an optimal control problem involving an inverted-pendulum model of walking bouts that (i) start and stop at rest and (ii) walk at constant speed yields a gently-rounded speed-vs-time profile (Fig 2A);

      2. (experiment) human subject walking bouts that started and stopped at rest had self-similar speed-vs-time profiles at several bout lengths after normalizing by the average duration and peak speed of each subject's bouts (Fig 4B).

      If the paper established a strong connection between (1.) and (2.), e.g. if speed-vs-time trajectories from the simulation predicted experimental results significantly better than other plausible models (such as the 'steady min-COT' and 'steady accel' models whose trajectories are shown in Fig 2A), this finding could be regarded as providing indirect evidence in support of the claim in the paper's Title. Personally, I would regard this reasoning as rather weak evidence - it would be more accurate to assert 'brief human walking bouts look like trajectories of an inverted-pendulum model that minimize a linear combination of energy and time' (of course this phrasing is too wordy to serve as a replacement Title -- I am just trying to convey what assertion I think can be directly substantiated by the evidence in the paper). But unfortunately, the connection between (1.) and (2.) is only discussed qualitatively, and the other plausible models introduced in the Results are not revisited in the Discussion. To my naive eye, the representative 'steady min-COT' trace in Fig 2A seems like a real contender with the 'Energy-Time' trace for explaining the experimental results in Fig 4, but this candidate is rejected at the end of the third-to-last paragraph in the 'Model Predictions' subsection of Results based on a vague rationale that is never revisited.

      We have addressed most of this comment above, but respond here regarding Fig. 4. The argument against steady min-COT should also point out the peak speed. The Results have been revised thus: “In contrast to the min-COT hypothesis, the human peak speeds increased with distance, many well below the min-COT speed of about 1.25 m/s. The human speed trajectories did not resemble the trapezoidal profiles of the steady min-COT hypothesis for all distances, nor the triangular profiles of steady acceleration.”

      An additional limitation of the approach not discussed in the manuscript is that a fixed step length was prescribed in the simulations. The 'Optimal control formulation' subsection in the Methods summarizes the results of a sensitivity analysis conducted by varying the fixed step length, but all results reported here impose a constant-step-length constraint on the optimal control problem. Although this is a reasonable modeling simplification for steady-state walking, it is less well-motivated for the walking bouts considered here that start and stop at rest. For instance, the representative trial from a human subject in Figure 8 clearly shows initiation and termination steps that differ in length from the intermediate steps (visually discernible via the slope of the dashed line interpolating the black dots). Presumably different trajectories would be produced by the model if the constant-step-length constraint were removed. It is unclear whether this change would significantly alter predictions from either the 'Energy-Time' or 'steady min-COT' model candidates, and I imagine that this change would entail substantial work that may be out of scope for the present paper, but I think it is important to discuss this limitation.

      This is addressed elsewhere (Essential Revisions 2), but we explain more here. One of the parameter studies included step length increasing with speed according to the human preferred relationship. This is included in Fig. 3, and so we concluded that variable step lengths are not critical to the speed trajectories. A related assumption is that the energetic cost of modulating step length/frequency is small compared to the step-to-step transition cost. We believe that humans expend substantial energy for both costs, but that the overall cost of walking is still dominated by step-to-step transitions.

      With my concerns about the paper's framing and through-line noted as above, I want to emphasize that I regard the computational and empirical work reported here to be top-notch and potentially influential. In particular, the experimental study's use of inexpensive wearable sensors (as opposed to more conventional camera-based motion capture) is an excellent demonstration of efficient study design that other researchers may find instructive. To maximize potential impact, I encourage the authors to release their data, simulations, and details about their experimental apparatus (the first two I regard as essential for reproducibility - the third a selfless act of service to the scientific community).

      I think the most important point to emphasize is that the bulk of prior work on human walking has focused on steady-state movement - not because of the real-world relevance (since one study reports 50% of walking bouts in daily life are < 16 steps as summarized in Fig 1B), but rather because steady walking is a convenient behavior to study in the laboratory. Significantly, this paper advances both our theoretical and empirical understanding of the characteristics of non-steady-state walking.

      It is also significant to note the relationship between this study, where time was incorporated as an additive term in the cost of walking, with previous studies that incorporated time in a multiplicative discount in the cost of eye and arm movements. There is an emerging consensus that time plays a key role in the generation of movement across the body - future studies will discern whether and when additive or multiplicative effects dominate.

      We have acknowledged this in a brief sentence: “Indeed, we have found a similar valuation of time to explain how reaching durations and speed trajectories vary with reaching distance (Wong et al., 2021).” As an aside, in that reference we measured the metabolic cost of cyclic arm reaching, combined it with a linear time cost, and predicted reaching durations vs. distance and bell-shaped hand speed trajectories. Others (Shadmehr et al. Curr Biol. 2016) have proposed multiplicative (hyperbolic) temporal discounting to explain durations, but their cost formulas are not dynamical and cannot produce trajectories. We agree with the reviewer’s point, but we think the evidence for hyperbolic discounting is not strong. Linear time costs are simpler and work at least as well. This is of great interest to us, but we did not discuss it beyond the brief mention above, because we fear it is too far afield.

      Reviewer #2 (Public Review):

      This paper provides a novel approach to quantifying the tradeoff between energetic optimality during walking and the valuation of time to travel a given distance. Specifically, the authors investigated the relationships between walking speed trajectories, distance traveled, and the valuation of (completion) time. Time has been proposed as a potential factor influencing movement speed, but less is understood about how individuals balance energetic optimality and time constraints during walking. The authors used a simple, sagittal-plane walking model to test competing hypotheses about how individuals optimize gait speed from gait initiation to gait termination. Their approach extends literature in the space by identifying optimal gaits for shorter, partially non-steady speed walking bouts.

      The authors successfully evaluated three competing walking objectives (constant acceleration, minimum cost of transport at steady speed, and the energy-time objective), showing that the energy-time objective best matched experimental data in able-bodied adults. Although other candidate objectives may exist, the paper's findings provide a likely-generalizable explanation of how able-bodied humans select movement strategies that encompass studies of steady-speed walking.

      Overall, this paper provides a foundation for future studies testing the validity of the energy-time hypothesis for human gait speed selection in able-bodied and patient populations. Extensions of this work to patient populations may explain differences in walking speed during clinical assessments and provide insight into how individual differences in time valuation impact performance on assessments. For example, understanding whether physical capacity or time valuation (or something comparable) better explains individual differences in walking speed may suggest distinct approaches for improving walking speed.

      Strengths:

      The authors presented a compelling rationale for the tradeoffs between energetic optimality and time, and their results provide strong support for a majority of their conclusions. In particular, significant reductions in the variance of experimental speed trajectories provide good support for the scaling of speeds across individuals and the plausibility of the energy-time hypothesis. Comparison to theoretical (model-based) reductions across different time valuation (cT) parameters would further enhance confidence in the practical significance of the variance reductions. Further, while additional work is needed to determine the range of "normal" valuations of time, the authors present experimental ranges that appear reasonable and are well explained. The computational and analytical methods are rigorous and are supported by the literature. Overall, the paper's conclusions are consistent with experimental and computational results.

      The introduction of a model-based analytical approach to quantify the effects of time valuation of walking could generalize to test other cost functions, populations, or locomotion modes. Further, models of varying complexity could be implemented to test more individualized estimates of metabolic cost, ranging from 3D dynamic walking models (Faraji et al., Scientific Reports, 2018) or physiologically-detailed models (Falisse et al., Journal of The Royal Society Interface. 2019). The relatively simple set of analyses used in this paper is consistent with prior literature and should generalize across applications and populations.

      The authors justified simplifications in the analysis and addressed major limitations of the paper, such as using a fixed step length in model predictions, using a 2D model, and basing energy estimates on the mechanical work of a simple model. It is unlikely that the paper's conclusions would change given additional model complexity. For example, a 3D walking model would need to control frontal plane stability. However, in able-bodied adults, valuation of frontal-plane stability during normal walking would not likely alter the overall shape of the predicted speed profiles.

      Weaknesses:

      The primary weakness of this work is that alternative objectives may provide similar speed profiles and thus be plausible objectives for human movement. For example, the authors tested an objective minimizing the steady-speed cost of transport. This cost function is consistent with the literature, but (as predicted) unlikely to explain acceleration and deceleration during gait. An objective more comparable to the energy-time hypothesis would be to minimize the net energy cost over the entire bout, including accelerations and decelerations. This may produce results similar to the energy-time hypothesis. However, a more complex model that incorporates non-mechanical costs (e.g., cost of body weight support) may be needed to test such objectives. Therefore, the energy-time hypothesis should be considered in the context of a simple model that may be incapable of testing certain alternative hypotheses.

      We have addressed some of this comment in Essential Revisions 4.

      We are unsure what is meant by “net energy over the entire bout, including accelerations and decelerations.” Our hypothesis uses total (gross) energy over the entire bout, and already includes accelerations and decelerations. If “net” refers to the customary definition of metabolic energy minus resting, then it differs from our gross cost (Fig. 6A) only by a constant offset, namely the resting cost. Removing the offset is equivalent to a decrease in C_T. As shown in Fig. 3, this would reduce peak speed magnitudes but not change the shape of the speed, peak speed, and duration patterns. There is also another interpretation, in which the cost of walking includes only net energy and the cost of time includes the resting metabolic rate (Fig. 6C). This interpretation yields the same predictions; the only difference is whether the resting rate is treated as an energy or a time cost. We have not made further changes, because we are unsure what the reviewer meant. The difference between net and total is at most one of degree, not of qualitatively different behavior.
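      The equivalence between the gross and net formulations can be written out in one line. This sketch assumes the resting cost accrues at a constant rate over the bout duration $T$; the symbol $\dot{E}_{\text{rest}}$ is introduced here for illustration and is not the manuscript's notation:

```latex
\begin{aligned}
J &= E_{\text{gross}} + C_T\,T, \qquad E_{\text{gross}} = E_{\text{net}} + \dot{E}_{\text{rest}}\,T \\
  &= E_{\text{net}} + \left(C_T + \dot{E}_{\text{rest}}\right) T .
\end{aligned}
```

      Minimizing gross energy with time weight $C_T$ is therefore the same optimization as minimizing net energy with weight $C_T + \dot{E}_{\text{rest}}$. Switching to net energy while keeping $C_T$ fixed amounts to lowering the effective time weight by $\dot{E}_{\text{rest}}$, which, per the parameter study in Fig. 3, shifts peak speeds without changing the shape of the trajectory family.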

      We do not address the proposed “cost of body weight support” because we are unsure of the definition. There is a hypothesis by Kram & Taylor (1990) that defines a metabolic cost rate proportional to body weight divided by ground contact time. It is unclear if this is what the reviewer is referring to, so we did not include it in the manuscript. However, IF this is what the reviewer means, we do not consider the Kram & Taylor (“K&T”) cost to be a viable hypothesis for computational models. It is a correlation observed from data, which is inadequate as a model, for several reasons. First, in a model optimization, it leads to absurd predictions, because metabolic cost could then be reduced simply by increasing stance (contact) time. A model could do so simply by walking with very long double support phases, or running with a very brief aerial phase, both of which people clearly do not do. In walking, extended double support durations result in much higher metabolic cost (Gordon et al., APMR 2009). Models must operate quite literally on whatever objective they are given, and here, a literal interpretation of K&T makes absurd predictions.

      Another issue with the K&T cost is that it is not mechanistic. A mechanistic model is concerned with the forces and work performed by an actuator such as muscle. Muscles experience forces far greater than body weight, not captured by the K&T cost. Of course, overall cost for animal locomotion is roughly proportional to body weight, but what a model needs is a cost associated with its control inputs, e.g. actuator forces.

      We have also examined the K&T hypothesis in previous publications. In Schroeder & Kuo (Plos Comp Biol 2021), we used a simple model of running that minimizes an energetic cost dominated by mechanical work. Even though the model has no cost similar to K&T, its predicted metabolic cost is correlated with the K&T cost. Correlation does not imply causation, and in this model the causal structure is known.

      We have also examined the K&T hypothesis in experimental data. In Riddick & Kuo (Sci Rep 2022), we examined human data and found that many variables correlate quite well with metabolic cost, including the K&T correlate. We use human data to show how mechanical work could explain metabolic cost, and even when it does, the K&T cost still appears as a correlate. In our interpretation, both a model and data that experience an energetic cost proportional to mechanical work may have a number of variables correlated with energy cost. Those correlates need not have any causal influence.

      There are, of course, many similar correlates that could be or have been proposed to explain the metabolic cost of running. Most such correlates are not operational enough to work in a model, and it is also difficult to predict what a reader might consider plausible, even if we do not.

      We agree with this statement: “the energy-time hypothesis should be considered in the context of a simple model that may be incapable of testing certain alternative hypotheses.” In fact, in the Discussion of limitations we listed other potential factors (e.g. forced leg motion, stability, 3D motion), and stated “We did not explore more complex models here, but would expect similar predictions to result if similar, pendulum-like principles of work and energetic cost apply.” We had also cited other models that include such factors and are compatible with the step-to-step transition concept. Finally, we already stated, “the Energy-Time hypothesis should be regarded as a subset of the many factors that should govern human actions, rendered here in a simple but quantitative form.” We believe this is already aligned with the reviewer’s comment.

      An experimental design involving an intervention to perturb the valuation of time would provide stronger support for time being a critical factor influencing gait speed trajectories. The authors noted this limitation as an area of future work.

      While the results are compelling, the limited sample size and description of participants limit the obvious generalizability of the results. Older adults tend to have higher metabolic costs of walking than younger adults, which may alter the predicted time-energy relationships (Mian OS, et al., Acta physiologica. 2006). As noted in the introduction, differences in walking speeds have been observed in different living environments. General information on where participants lived (city, small town, etc...) may provide readers with insight into the generalizability of the paper's conclusions. Additionally, the experimental results figures show group-level trends, but individual-specific trends and the existence of exceptional cases are unclear.

      We wish to defend the “limited sample size.” The present sample size was (in our opinion) sufficient to test the hypothesis, and we have reported confidence intervals and other statistics where appropriate. (As always, it is up to the individual reader to decide whether they are convinced or not.) It is true that the data may be insufficient for other purposes, but we cannot anticipate or address all other purposes.

      We appreciate the relevant connection to aging. We have added to Discussion, “We do not know whether that family [of trajectories] also applies to older adults, who prefer slower steady speeds and expend more energy to walk the same speed (Malatesta, 2003). Perhaps an age-related valuation of time might explain some of the differences in speed.”

      We agree about the participants, and have added “Subjects were recruited from the community surrounding the University of Calgary; the city has a moderately affluent population of about 1.4 M, with a developed Western culture.”

      No specific reviewer recommendation was made about individual-specific trends, but there are several indicators already included in the manuscript. First, all trials from all subjects are shown in Fig. 4A. Any truly exceptional cases should be visible as substantial deviations from the group. Second, the normalization by peak speed in Fig. 4B shows how individuals tend to be fairly consistent in their preferred speeds, in that shorter and longer bouts of an individual are consistent with each other, even if some walk faster than others. Third, this observation is analyzed more quantitatively by the reduction in standard deviations with normalization (Results). Fourth, we will provide a data repository with all the data, to allow readers to inspect individuals more carefully (Data availability statement).
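      The variance-reduction check can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: it assumes each bout is a sampled speed trace, normalizes time by that bout's duration and speed by that bout's peak (a per-bout simplification of the per-subject normalization described above), resamples onto a common grid, and averages the point-wise standard deviation across bouts. The function names and the synthetic half-sine bouts are hypothetical.

```python
import numpy as np


def resample_bout(t, v, n=101, scale_speed=True):
    """Map one bout onto normalized time [0, 1]; optionally divide speed by its peak."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    tau = (t - t[0]) / (t[-1] - t[0])  # normalized time
    if scale_speed:
        v = v / v.max()
    return np.interp(np.linspace(0.0, 1.0, n), tau, v)


def mean_sd(bouts):
    """Point-wise standard deviation across bouts, averaged over the grid."""
    return float(np.std(np.vstack(bouts), axis=0).mean())


# Hypothetical bouts: same half-sine shape, different durations (s) and peak speeds (m/s)
bouts = []
for duration, peak in [(4.0, 1.0), (8.0, 1.3), (12.0, 1.5)]:
    t = np.linspace(0.0, duration, 200)
    bouts.append((t, peak * np.sin(np.pi * t / duration)))

sd_time_only = mean_sd([resample_bout(t, v, scale_speed=False) for t, v in bouts])
sd_full = mean_sd([resample_bout(t, v) for t, v in bouts])
print(sd_time_only, sd_full)
```

      Because the synthetic bouts share an identical shape, speed normalization removes their spread entirely; real trials would retain some residual spread, and the size of the reduction is what the Results quantify.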

      The authors' interpretation of clinical utility is vague and should be interpreted with caution. A simple pendulum-based walking model is unlikely to generalize to patient populations, whose gait energetics may involve greater positive and negative mechanical work (Farris et al., 2015; Holt et al., 2000). Additionally, the proposed analytical framework based on mechanical work as a proxy for the metabolic cost may not generalize to patient populations who have heterogeneous musculotendon properties and increased co-contraction (e.g., children with cerebral palsy; Ries et al., 2018). Consequently, the valuation of time for an individual could be incorrectly estimated if the estimates of metabolic cost were inaccurate. Therefore, as the authors noted for their able-bodied participants, more precise measures of metabolic rates will be critical for translating this work into clinical settings.

      We agree, and did not intend to say that clinical populations must walk the same way, but rather that the normal patterns could be used as a basis of comparison. To make this clearer, we have amended the Discussion of clinical implications (new text emphasized): “it may be possible to predict the duration and steady speed for another distance, referenced from a universal family of walking trajectories. We have identified one such family that applies to healthy individuals with pendulum-like gait. Of course, some clinical conditions might be manifested by a deviance from that family, perhaps in the acceleration or deceleration phases, or in how the trajectories vary with distance. If quantified, such deviance might prove clinically useful… the characterization of distance-dependent speed trajectories can potentially provide more information than available from steady speed alone.”

      We agree that the valuation of time can be inaccurate if the metabolic cost is inaccurate. That is why we did not make a precise estimate of the valuation. We have amended the text to help clarify that our rough estimates are based on previous data. In addition, our general scientific intent is to reveal behavioral sensitivities, for example of walking duration to bout distance, as opposed to absolute numerical quantities.

    1. Author Response

      Reviewer #2 (Public Review):

      One other major concern I have regards the conclusion that the participants in these studies use an additive rather than a multiplicative rule to integrate the risk information. The additive rule is problematic in general because it fails to predict the reversal in the effect of probability on payoffs when the payoffs change sign. More specifically, increasing the probability of winning increases the probability of choosing an option when the payoff is positive, but the effect reverses when the payoff is negative. One needs to impose some pretty ad hoc assumptions to make the additive model account for this fundamental interaction between probability and payoff. Of course, the experiments reported here did not include negative payoffs, and so didn't run into this problem. In fact, when the payoffs are positive, it is possible to transform the multiplicative model to an additive model by a log transform. This transformation is only possible for the simple type of gamble investigated in this manuscript - a single amount to win with some probability of winning, otherwise win or lose nothing. If the gambles involved more than one outcome, then the theorist needs to deal with a sum of products and the log transform is no longer possible. For these reasons I am very skeptical about the general application of a summation rule for probability and value in risky choice. The authors do address this issue to some extent. They point out the abundance of other research supporting a multiplicative rule, and they speculate that the additive rule may have occurred within the restrictions of this special situation. The latter discussion is a good start, but I suggest that the authors discuss this fundamental issue in more depth.

      Thank you for this very insightful comment. We have now included more in-depth discussions about the decision rules (multiplicative vs. additive) in our Discussion, in which we have absorbed and reflected many of the insights offered by Reviewer #2.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper tests the hypothesis that the 1/f exponent of the LFP power spectrum reflects E-I balance in a rodent model and in Parkinson's patients. The authors suggest that their findings fit with this hypothesis, but there are concerns about confirmation bias (elaborated on below) and potential methodological issues, despite the strength of incorporating data from both an animal model and neurological patients.

      First, the frequency band used to fit the 1/f exponent varies between experiments and analyses, inviting concerns about potentially cherry-picking the data to fit with the prior hypothesis. The frequency band used for fitting the exponent was 30-100 Hz in Experiment 1 (rodent model), 40-90 Hz in Experiment 2 (PD, levodopa), and 10-50 Hz in Experiment 3 (PD, DBS). Ad hoc reasons were given to justify these choices, such as "to avoid a spectral plateau starting > 50 Hz" in Experiment 3. However, at least in Experiment 3 (Fig. 3), if the frequency range was shifted to 1-10 Hz, the authors would have uncovered the opposite effect, where the exponent is smaller for the DBS-on condition.

      We agree that parameter choice is crucial, in particular the choice of fitting range. In addition to the 40-90 Hz range (Figure 2C), we have performed aperiodic fitting for five other frequency ranges to test how sensitive the reported results are to the selected frequency range (Figure S2A). This analysis showed that the results are robust when a broad frequency range from 30 to 95 Hz was chosen, which is consistent with the range suggested by Gao et al., 2017 for making inferences about the E/I ratio.

      Accordingly, we have now repeated the analyses for the animal data with the same fitting range used for the ON-OFF medication comparison in humans. Along with Figure S2A where different frequency ranges were tested for data used in Figure 2, this shows that the results in Figure 1 and 2 hold up with higher aperiodic exponents when STN spiking is low and vice versa. Therefore, a broad fitting range from 30 to 90 Hz (excluding harmonics of mains interference) generates consistent results for both human and animal data.

      We opted against a fitting range of 1-10 Hz because of two constraints highlighted in Gerster et al., 2022. First, a fitting range starting at 1 Hz could have a larger y-intercept due to the presence of low-frequency oscillations. This could lead to a larger aperiodic exponent, which could be misinterpreted as stronger neural inhibition. Therefore, the lower fitting bound should be chosen to avoid known oscillations in the delta/theta range (Gerster et al., 2022). Second, the fitting range should be chosen so that its limits do not cut through oscillatory peaks. In Figure 3A, oscillations in the theta/alpha band both ON and OFF stimulation would complicate parameterisation and would likely result in spurious fits.
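
      The sensitivity of the exponent to the fitting range can be illustrated with a minimal sketch. It assumes a synthetic spectrum (a pure power law with exponent 2 plus a hypothetical theta peak at 6 Hz, added in log10 power) and estimates the exponent by ordinary least squares in log-log space. This is the functional form of the 'fixed' aperiodic mode, but without FOOOF's peak removal, so it is not the authors' pipeline.

```python
import numpy as np

def fit_fixed_exponent(freqs, power, f_lo, f_hi):
    """Least-squares fit of log10 P = offset - exponent * log10 f within [f_lo, f_hi]."""
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    slope, _offset = np.polyfit(np.log10(freqs[mask]), np.log10(power[mask]), 1)
    return -slope

freqs = np.arange(1.0, 100.5, 0.5)
true_exponent = 2.0
# Hypothetical theta peak (Gaussian in log10 power) centred at 6 Hz.
theta_peak = 0.8 * np.exp(-((freqs - 6.0) ** 2) / (2 * 1.5 ** 2))
power = 10 ** (-true_exponent * np.log10(freqs) + theta_peak)

exp_high = fit_fixed_exponent(freqs, power, 40.0, 90.0)  # peak-free range
exp_low = fit_fixed_exponent(freqs, power, 1.0, 10.0)    # range containing the peak
print(exp_high, exp_low)
```

      In this toy case the peak-free 40-90 Hz fit recovers the generating exponent, whereas the 1-10 Hz fit is biased by the peak; the direction of that bias depends on where the peak sits relative to the range limits.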

      We also tested the effect of changing the peak threshold, peak width limits and the aperiodic fitting mode on FOOOF parameterisation. Increasing and decreasing the peak threshold from its default value (at 2 standard deviations) did not change results (Figure S2B). Similarly, adapting the peak width limits did not affect the exponent difference between medication states (Figure S2C). Finally, choosing the ‘knee’ mode instead of ‘fixed’ resulted in fundamentally different aperiodic fits that did not differ anymore with medication (Figure S2D). This is most likely a consequence of the near linear PSD in log-log space from 40 to 90 Hz (Figure 2B). If there is no bend in the PSD, the FOOOF algorithm will be forced to assign a ‘random’ knee and the aperiodic fit will then mostly reflect the slope of the spectrum above the knee point.
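
      For reference, a minimal sketch of the two aperiodic forms as we understand them from the FOOOF documentation: 'fixed', log10 P = b - log10(f^χ), and 'knee', log10 P = b - log10(k + f^χ). The parameter values are hypothetical. The sketch shows why a knee is weakly constrained on a spectrum that is already near-linear in log-log space: with k = 0 the knee form reduces exactly to the fixed form, and a small k barely alters the curve between 40 and 90 Hz.

```python
import numpy as np

def aperiodic_fixed(freqs, offset, exponent):
    """'fixed' form: log10 P = offset - log10(f ** exponent)."""
    return offset - np.log10(freqs ** exponent)

def aperiodic_knee(freqs, offset, knee, exponent):
    """'knee' form: log10 P = offset - log10(knee + f ** exponent)."""
    return offset - np.log10(knee + freqs ** exponent)

freqs = np.linspace(40.0, 90.0, 101)

# With knee = 0 the two forms are identical.
same = np.allclose(aperiodic_knee(freqs, 1.0, 0.0, 2.5), aperiodic_fixed(freqs, 1.0, 2.5))

# A knee that is small relative to f ** exponent changes log10 power by less than 0.001 here.
max_diff = np.max(np.abs(aperiodic_knee(freqs, 1.0, 10.0, 2.5) - aperiodic_fixed(freqs, 1.0, 2.5)))
print(same, max_diff)
```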

      Second, there are important, fine-grained features in the spectra that are ignored in the analyses, which confounds the interpretation.

      One salient example of this is Fig. 2, where, based on the plots in B, one would expect the power of beta-band oscillations to be higher in the Med-ON condition, as the oscillatory peaks rise higher above the 1/f floor and reach the same amplitude level as in the Med-OFF condition (in other words, similar total power minus a smaller 1/f power in the Med-ON condition). But this impression is the opposite of the model-fitting results in C, where beta power is lower in the Med-ON condition.

      We agree that PSDs over a broad frequency range (e.g. 5-90 Hz) typically do not have a single 1/f property. Instead, there can be multiple oscillatory peaks and ‘knees/bends’ in the aperiodic component. For these cases, fitting should be performed using the knee mode. To extract periodic beta power, we parameterise the PSD between 5 and 90 Hz and select the largest oscillatory component between 8 and 35 Hz (this range was extended to include the large oscillatory peaks in hemispheres 27 and 28 at ~ 10 Hz, see Figure R1). We now use the knee mode, to model the aperiodic component between 5 and 90 Hz when periodic beta power is calculated (see our previous comments). Figure R1 provides an overview of all PSDs ON and OFF medication, the aperiodic fits (5-90 Hz (knee) and 40-90 Hz (fixed)) and the detected beta peaks. In spite of this modification in our pipeline, periodic beta power is still larger OFF medication (Figure 2C), in keeping with previous studies (Kim et al., 2022; Kühn et al., 2006; Neumann et al., 2017; Ray et al., 2008). We acknowledge the reviewer’s point that the average spectra in Figure 2B are misleading in that respect and for clarity provide here all 30 spectra in both conditions. Note that the calculation of aperiodic exponents between 40 and 90 Hz is not affected by this change in our pipeline. Figures 2B, D+E were revised accordingly.
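
      The beta-peak selection step described above can be sketched as follows. We assume each fitted peak is stored as a (center frequency, power, bandwidth) row, which is how we understand FOOOF's `peak_params_` attribute to be laid out; the function name and peak values are hypothetical.

```python
import numpy as np

def largest_peak_in_band(peak_params, f_lo=8.0, f_hi=35.0):
    """Return the [center_freq, power, bandwidth] row with the highest power whose
    center frequency lies in [f_lo, f_hi], or None if no peak qualifies."""
    peaks = np.asarray(peak_params, dtype=float).reshape(-1, 3)
    in_band = peaks[(peaks[:, 0] >= f_lo) & (peaks[:, 0] <= f_hi)]
    if in_band.size == 0:
        return None
    return in_band[np.argmax(in_band[:, 1])]

# Hypothetical peaks from a 5-90 Hz fit: theta, alpha, beta and a gamma peak.
peaks = [[6.0, 0.4, 2.0], [10.5, 0.6, 3.0], [21.0, 0.9, 5.0], [60.0, 0.3, 4.0]]
print(largest_peak_in_band(peaks))
```

      Extending the band down to 8 Hz, as in the response, means an alpha peak would be selected whenever it is larger than any beta peak in the same hemisphere.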

      We have repeated the analysis of our animal data using the ‘knee mode’ with a fitting range from 30 to 100 Hz. However, using the knee mode did not improve the goodness of fit or fitting error and, in fact, made them slightly worse (Figure S5). Based on this, we think the fixed mode would provide a more holistic model for the PSDs used in this analysis. We have now added this comparison in Figure S5 to justify the choice of the fixed mode.
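
      A minimal sketch of the two goodness-of-fit measures being compared, computed on log10 power: R² as the squared Pearson correlation between data and model, and error as the mean absolute error. We believe these match FOOOF's default `r_squared_` and `error_` definitions, but this should be checked against the library documentation; the spectrum here is synthetic.

```python
import numpy as np

def fit_metrics(log_power, log_model):
    """R^2 (squared Pearson correlation) and mean absolute error, both in log10 power."""
    r = np.corrcoef(log_power, log_model)[0, 1]
    mae = np.mean(np.abs(log_power - log_model))
    return r ** 2, mae

rng = np.random.default_rng(0)
freqs = np.linspace(30.0, 100.0, 141)
log_model = 2.0 - 2.0 * np.log10(freqs)                      # hypothetical fixed-mode fit
log_power = log_model + rng.normal(0.0, 0.05, freqs.size)    # synthetic noisy spectrum
r2, mae = fit_metrics(log_power, log_model)
print(r2, mae)
```

      Because the knee form has one more free parameter, a knee fit that does not improve either measure over the fixed fit offers no advantage for these spectra.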

      Figure R1. PSDs from all 30 hemispheres ON and OFF medication. Aperiodic fits are shown between 5-90 Hz (knee mode), which was used to calculate the power of beta peaks, and between 40-90 Hz (fixed mode), which was used to estimate the aperiodic exponent of the spectrum.

      Another example is Fig. 1C, where the spectra for high and low STN spiking epochs are identical between 10 and 20 Hz, and the difference in higher frequency range could be well-explained by an overall increase of broadband gamma power (e.g. as observed in Manning et al., J Neurosci 2012, Ray & Maunsell PLoS Biol 2011). This increase of broadband gamma power is trivially expected, as broadband gamma power is tightly coupled with population spiking rate, which was used to define the two conditions.

      We agree with the reviewer that in Figure 1C, high and low STN spiking states could also be separated by average gamma power (Figure 1E). However, the difference in aperiodic exponents between the two conditions is more prominent (Figure 1D+E, based on p-values). Moreover, in human LFP data recorded from clinical macroelectrodes, medication states can be reasonably well distinguished using the aperiodic exponent between 40-90 Hz (Figure 2C), whereas average gamma power does not separate the two states (Figure S3A). This suggests that the aperiodic exponent reflects more than just power differences in the high gamma range. In addition, power changes do not inevitably change the aperiodic exponent and vice versa, as elaborated by Donoghue et al. (2020).

      Manning et al., 2009 show that the power spectrum is shifted to higher power values at all observed frequencies (2-150 Hz) as firing rates increase. As the reviewer points out, the power spectra in our data are almost identical between 10 and 20 Hz (despite the marked spiking differences) and only diverge above 20 Hz (Figure 1C). This is a relevant difference between our study and Manning et al., 2009 and suggests that power differences in the gamma range are not solely explained by differences in spiking. Modelling cortical activity at different spike rates gives the same picture as Manning et al.: the entire spectrum shifts to higher power values as spiking rates increase (Miller et al., 2009).
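
      The distinction between a uniform broadband shift and a change in slope can be sketched with a synthetic power law (hypothetical values). Multiplying the whole spectrum by a constant, as in an all-frequency shift, changes only the fitted offset, by log10 of the factor, and leaves the exponent unchanged.

```python
import numpy as np

def fit_offset_exponent(freqs, power):
    """Least-squares fit of log10 P = offset - exponent * log10 f; returns (offset, exponent)."""
    slope, offset = np.polyfit(np.log10(freqs), np.log10(power), 1)
    return offset, -slope

freqs = np.linspace(30.0, 90.0, 121)
baseline = 100.0 / freqs ** 2.0      # hypothetical 1/f^2 spectrum
shifted = 3.0 * baseline             # same shape, 3x power at every frequency

off_base, exp_base = fit_offset_exponent(freqs, baseline)
off_shift, exp_shift = fit_offset_exponent(freqs, shifted)
print(exp_base, exp_shift, off_shift - off_base)
```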

      Ray & Maunsell, 2011 reported low (30-80 Hz) and high (> 80 Hz) gamma activity in the macaque visual cortex, with a positive correlation between spiking activity and high gamma activity. However, activity in the low gamma range (30-80 Hz), which largely overlaps with the frequency range in our study, does not necessarily correlate with firing rates.

      In conclusion, the link between gamma power and spiking activity is not as strong as suggested. Even if changes in spiking activity can lead to changes in both gamma power and the aperiodic exponent, the aperiodic exponent would still constitute a measure to separate E/I levels and medication states.

      The above consideration also speaks to a major weakness of the general approach of considering the 1/f spectrum a monolithic spectrum that can be captured by a single exponent. As the authors' Fig. 1C shows, there are distinct frequency regions within the 1/f spectrum that have different slopes. Indeed, this tripartite shape of the 1/f spectrum, including a "knee" feature around 40-70 Hz which is clearly visible here, has been described in multiple previous papers (Miller et al., PLoS Comput Biol 2009; He et al., Neuron 2010), and has been successfully modeled with a neural network model using biologically plausible mechanisms (Chaudhuri et al., Cereb Cortex, 2017). The neglect of these fine-grained features confounds the authors' model fitting, because an overall increase in broadband gamma power, which can be explained straightforwardly by the change in population firing rates, can cause the exponent, fit over a larger spectral frequency region, to decrease. However, this is not due to the exponent actually changing, but to the overall increase of power in a specific sub-frequency region of the broadband 1/f activity.
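
      The reviewer's concern can be illustrated with a minimal sketch on a synthetic spectrum (all values hypothetical): an exact 1/f² power law whose power is raised only at higher frequencies, via a smooth logistic gain rising from close to 1 at 30 Hz to close to 2 at 100 Hz, yields a smaller exponent when a single line is fit across 30-100 Hz, although the underlying 1/f component is unchanged. This illustrates the mechanism only; it does not show which account holds for the recorded data.

```python
import numpy as np

def fit_exponent(freqs, power):
    """Single-line fit in log-log space; returns the exponent (negative slope)."""
    slope, _offset = np.polyfit(np.log10(freqs), np.log10(power), 1)
    return -slope

freqs = np.linspace(30.0, 100.0, 141)
baseline = 1.0 / freqs ** 2.0                      # unchanged 1/f^2 component
# Hypothetical gain rising smoothly from ~1 at 30 Hz to ~2 at 100 Hz (centred at 60 Hz).
gain = 1.0 + 1.0 / (1.0 + np.exp(-(freqs - 60.0) / 5.0))
boosted = baseline * gain

exp_base = fit_exponent(freqs, baseline)
exp_boost = fit_exponent(freqs, boosted)
print(exp_base, exp_boost)   # the boosted spectrum yields a smaller fitted exponent
```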

      We have now used the knee mode for aperiodic fits between 5 and 90 Hz when periodic beta power is calculated. We agree that this broad frequency range is unlikely to have a single 1/f component.

      We have also repeated the analysis of our animal data using the knee mode for aperiodic fits between 30 and 100 Hz (Figure S5). However, the goodness of fit barely changed; in fact, R² and error became slightly worse. In addition, the knee parameter complicates interpretation of the aperiodic exponent and has to be considered along with the knee frequency. Moreover, we do not see this bend around 40-70 Hz in all subjects. We show PSDs of representative LFP channels in Figure R2, which indicate that the knee around 40-70 Hz is not a robust finding in our data set. Therefore, we chose the fixed mode for parameterisation within this frequency band.
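
      Why the knee parameter complicates interpretation can be seen from the bend ("knee") frequency it implies. In the knee form, the bend occurs where f^χ equals the knee parameter k, i.e. at f_k = k^(1/χ); this is the conversion given in the FOOOF documentation as we understand it. A minimal sketch with hypothetical values:

```python
import math

def knee_frequency(knee, exponent):
    """Frequency (Hz) at which f ** exponent equals the knee parameter."""
    return knee ** (1.0 / exponent)

# The same knee parameter implies very different bend frequencies depending on
# the fitted exponent, so knee and exponent cannot be interpreted separately.
for exponent in (2.0, 2.5, 3.0):
    print(exponent, round(knee_frequency(1000.0, exponent), 1))
```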

      Please see our answer to the previous comment regarding the link between broad gamma power and changes in population firing rates.

      Figure R2. PSDs of representative LFP channels for each animal (data used in Figure 1C). The knee around 40-70 Hz is not present in all PSDs.

    1. Author Response

      Reviewer #3 (Public Review):

      Argenty et al. investigated the role of Lissencephaly gene 1 (LIS1), a dynein-binding protein, in thymic development and T cell proliferation. They find that LIS1 is essential for the early stages of T and B cell development, and demonstrate that loss of LIS1 has a negative impact on the transition from DN3 to DN4 thymocytes and on the maturation of pre-pro-B cells into pro-B cells in the bone marrow. Using a CD2Cre Lis1fl/fl murine model, they observe that in thymocytes LIS1 is critical for DN3 proliferation and completion of cell division. Then, using a CD4Cre Lis1fl/fl model (the Cd4 promoter is up-regulated only in later stages of thymic development and thus does not affect DN3 thymocytes), they show that LIS1-deficient CD4 T cells have proliferation defects following both TCR-dependent and TCR-independent stimulation, which results in apoptosis. They also confirm previous reports showing that the proliferation of LIS1-deficient CD8 T cells is not impaired upon TCR stimulation, which suggests that these two cell types rely on different mechanisms to regulate the cell cycle. Finally, the authors make efforts to determine how LIS1 regulates proliferation in thymocytes and CD4 T cells. Interestingly, they show that LIS1 is important for chromosome alignment and centrosome integrity and provide data that support a model in which LIS1 would facilitate the assembly of active dynein-dynactin complexes. These data provide interesting insights into how different cell types use distinct strategies to undergo mitosis and how this can affect their proliferation and fate decisions. The conclusions of the manuscript are mostly supported by the provided data, although certain aspects can be further investigated and clarified.

      Strengths of the paper:

      By combining a re-assessment of previous reports with new findings, the data from this manuscript convincingly demonstrates that LIS1 is crucial for cell proliferation in certain development steps/cell types. Furthermore, the manuscript provides clear evidence of how LIS1 loss causes proliferation defects by disrupting centrosome integrity and chromosome alignment both in CD4+ T cells and thymocytes.

      Weaknesses of the paper:

      Although the authors successfully address the mechanistic role of LIS1 in thymocyte and CD4+ T cell division, the manuscript would be strengthened both by providing further evidence to support some of their conclusions and by revisiting some of the speculations raised in the discussion.

      In Figure 1, the authors claim that LIS1 is not required for pre-TCR assembly, but for expansion/proliferation of DN3 thymocytes as a step prior to reaching the DN4 stage. However, the authors indeed observe increased expression of CD5 (which is a downstream event of Notch and IL-7R signalling). Thus, from the data provided it is not clear whether signalling through Notch or IL-7R is definitely not affected, which could be clarified by assessing the expression of other downstream targets of these molecules.

      CD5 is a downstream target of pre-TCR signaling but, to our knowledge, not of Notch or IL-7R signaling. We have reformulated the sentence on p7 of the initial manuscript, as we understand that it could be misleading. However, we fully agree with the reviewer’s comment on Notch and IL-7R signaling and have included new data in the revised version of the manuscript to address this point. Notch signaling stimulates metabolic changes that lead to the increase in thymocyte cell size following the β-selection checkpoint (Ciofani M. et al., Nature Immunology, 2005; Maillard I. et al., The Journal of Experimental Medicine, 2006) and to the up-regulation of the transferrin receptor CD71 (Kelly, A.P. et al., The EMBO journal, 2007). We now show in Figure 1E of the revised manuscript that the loss of LIS1 does not affect the average cell size of post-β-selection thymocytes or the expression level of CD71 in these cells, suggesting that Notch signaling is preserved in the absence of LIS1. This was confirmed in vitro following stimulation of DN3a thymocytes with OP9-dl1 cells (Figure 2D of the revised manuscript). In this figure, we also analyzed the expression level of Bcl-2, which is regulated by IL-7R signaling (von Freeden-Jeffry, U. et al., Immunity, 1997). Bcl-2 levels are comparable in LIS1 wild-type and LIS1-deficient thymocytes following stimulation with OP9-dl1, suggesting that IL-7R signaling is not affected by the absence of LIS1.

      In Figure 3, the authors mostly confirm previous data from Ngoi, Lopez, Chang, Journal of Immunology, 2016 (reference 34), but also provide evidence of a role of LIS1 in CD4+ T cell proliferation in more physiological setups, using OT2-CD4-Cre Lis1flox/flox (or OT2 Lis1flox/flox as controls) in adoptive transfer experiments followed by antigen-specific immunization. However, the evidence provided by the authors about proliferation defects in LIS1-deficient cells in this context is limited by the early timepoint chosen: day 3 post-immunization.

      We chose to analyze CD4+ T cells at days 2 and 3 after immunization because we sought to capture early cell-division waves through CTV dilution. We also wanted to show that LIS1-deficient CD4+ T cells could survive normally and migrate to lymph nodes before they start to proliferate. Given the dramatic effect of LIS1 deficiency on CD4+ T-cell proliferation at day 3, we anticipated that very low numbers of LIS1-deficient cells would survive at later time points after immunization. To address the reviewer’s comment, we transferred CTV-stained OT2+CD45.1+ CD4+ T cells into C57BL/6 mice and analyzed the percentages and numbers of CD45.1+ T cells, as well as CTV dilution in those cells, at day 7 after immunization. As expected, all CD45.1+ cells were negative for CTV at this time point (data not shown). The percentages and numbers of CD45.1+ T cells were strongly decreased in the absence of LIS1 compared to wild-type controls (Figure 3 - Figure Supplement 2C), confirming the results obtained at day 3 after immunization.

      In the discussion, the authors speculate about the differences observed between CD4 and CD8 T cells, as the latter do not show proliferative defects upon TCR-triggered stimulation, and put forward the hypothesis that LIS1 might be important for symmetric cell divisions but not for asymmetric cell divisions. However, the arguments used by the authors have a few caveats, especially because CD4+ T cells can also undergo asymmetric cell division following TCR-triggered stimulation upon the first cognate antigen encounter (Chang et al., Science, 2007, Ref. 8).

      We agree that CD4+ T cells can undergo asymmetric division (this is in fact mentioned and referenced on p3 and p18 of the manuscript). However, it is unknown whether these divisions occur systematically or with a variable, possibly context-dependent, frequency. It is also unclear whether CD4+ and CD8+ T cells have similar rates of asymmetric division. The literature lacks comparative studies in which mitosis-associated cellular events are investigated side by side in these two subsets. As mentioned to Reviewer 1, to our knowledge only one study has compared T-bet partitioning between daughter cells after a first round of division in CD4+ and CD8+ T cells (Chang, J. T. et al., Immunity, 2011). They found that T-bet segregates unequally between daughter cells in both CD4+ and CD8+ T cells. However, the disparity between daughter cells was higher in CD8+ than in CD4+ T cells (5- versus 3-fold). This suggests either that key molecules are more equally (or less unequally) distributed between daughter cells in the CD4+ lineage, or that the rate of symmetric division is higher in CD4+ than in CD8+ T cells. These results are in accordance with our interpretation and previous findings (Yingling, J. et al., Cell, 2008; Zimdahl, B. et al., Nature Genetics, 2014), suggesting that LIS1 is predominantly involved in mitosis associated with symmetric divisions. Another possible explanation for this difference is that asymmetric division might occur at different stages in CD4+ and CD8+ T cells. Although some asymmetric divisions have been detected early after antigen encounter in CD4+ T cells, a more recent study from the same group suggests that asymmetric division might occur mainly later, after several rounds of division of CD4+ T cells, enabling self-renewal to be coupled to the production of differentiated effector CD4+ T cells (Nish, S. A., Journal of Experimental Medicine, 2017). It is therefore possible that LIS1 is critical early in CD4+ T cell expansion, when cells mainly divide symmetrically, and less critical later, when cells are engaged in asymmetric division. This is now discussed in greater detail on p18 of the revised version of the manuscript.

      Finally, the authors discuss that mono-allelic LIS1 defects might contribute to malignancies. Certainly not all points raised in the discussion need to be experimentally addressed, but for this particular hypothesis the authors would likely have the tools to achieve that, which would broaden the relevance of understanding LIS1 function.

      We have addressed this point experimentally in the revised version of the manuscript. We show that mono-allelic LIS1 deficiency does not have a significant impact on the percentages of thymocyte populations in Cd2-Cre Lis1flox/+ mice (Figure 1 - Figure Supplement 1B) and on the numbers of peripheral T cells in Cd4-Cre Lis1flox/+ (Figure 3 - Figure Supplement 1E), suggesting that LIS1 does not operate in a dose-dependent fashion in the context of T-cell development and T-cell homeostatic maintenance. Additionally, Cd4-Cre Lis1flox/+ CD4+ T cells proliferate effectively following TCR and CD28 stimulation (Figure 3 - Figure Supplement 2A), indicating further that mono-allelic LIS1 dosage is sufficient to support cell division of CD4+ T cells. The part of the discussion related to Lis1 haplo-deficiency has been rephrased according to this new set of data.

    1. Author Response

      Reviewer #1 (Public Review):

      1) One nagging concern is that the category structure in the CNN reflects the category structure baked into color space. Several groups (e.g. Regier, Zaslavsky, et al) have argued that color category structure emerges and evolves from the structure of the color space itself. Other groups have argued that the color category structure recovered with, say, the Munsell space may partially be attributed to variation in saturation across the space (Witzel). How can one show that these properties of the space are not the root cause of the structure recovered by the CNN, independent of the role of the CNN in object recognition?

      We agree that there is overlap with previous studies on color structure. In our revision, we show that color categories are directly linked to the CNN being trained on the object-recognition task and not to the CNN per se. We repeated our analysis on a scene-trained network (using the same input set) and found that here the color representation in the final layer deviates considerably from the one created for object classification. Given that the input set is the same, this strongly suggests that any reflection of the structure of the input space serves the recognition of objects (see the bottom of the “Border Invariance” section; page 7). Furthermore, new experiments with random hue shifts applied to the input images show that stable borders do not arise in this case, whereas they would be expected to if the border invariance were merely a consequence of the chosen color space.

      A further crucial distinction from previous results is that, because we replace only the final layer, our analysis specifically examines the representation that the network has built to perform the object classification task. As such, the current finding goes beyond the notion that the color category structure is already reflected in the color space.

      2) In Figure 1, it could be useful to illustrate the central observation by showing a single example, as in Figure 1 B, C, where the trained color is not in the center of the color category. In other words, if the category structure is immune to the training set, then it should be possible to set up a very unlikely set of training stimuli (ones that are as far away from the center of the color category while still being categorized most of the time as the color category). This is related to what is in E, but is distinctive for two reasons: first, it is a post hoc test of the hypothesis recovered in the data-driven way by E; and second, it would provide an illustration of the key observation, that the category boundaries do not correspond to the median distance between training colors. Figure 5 begins to show something of this sort of a test, but it is bound up with the other control related to shape.

      We have now added a post-hoc test in which we shift the training bands from likely to unlikely positions using the original paradigm. By retraining the output layers while shifting the training bands from the left to the right category edge (in 9 steps), we can see the invariance to the category bounds specifically (see Supp. Inf.: Figure S11). The most extreme cases (top and bottom rows) have the training bands right at the edge of the border, which are the interesting cases the reviewer refers to. We also added 7 steps in between to show how the borders shift with the bands.

      Similarly, if the claim is that there are six (or seven?) color categories, regardless of the number of colors used to train the data, it would be helpful to show the result of one iteration of the training that uses say 4 colors for training and another iteration of the training that uses say 9 colors for training.

      We have now included the plot presented in Figure 1E for all the color iterations used (see SI: Figure S10). We are also happy to include a single iteration, but believe this gives the most complete view of what the reviewer is asking.

      The text asserts that Figure 2 reflects training on a range of color categories (from 4 to 9) but doesn’t break them out. This is an issue because the average across these iterations could simply be heavily biased by training on one specific number of categories (e.g. the number used in Figure 1). These considerations also prompt the query: how did you pick 4 and 9 as the limits for the tests? Why not 2 and 20? (the largest range of basic color categories that could plausibly be recovered in the set of all languages)?

      The number of output nodes was inspired by the number of basic color categories that English speakers observe in the hue spectrum (in which a number of the basic categories are not represented). We understand that this is not a strong reason; however, the lack of studies on color categories in CNNs forced us to approach this in an exploratory manner. We have adapted the text to better reflect this shortcoming (bottom of page 4). Naturally, if the data had indicated that these numbers were not a good fit, we would have adapted the range. (Had there been more categories, we would have expected more noise and would have increased the number of training bands to test this.) As indicated above, we have now also included the classification plots for all the different counts, so the reader can review this as well (SI: Section 9).

      3) Regarding the transition points in Figure 2A, indicated by red dots: how strong (transition count) and reliable (consistent across iterations) are these points? The one between red and orange seems especially willfully placed.

      To address the question of consistency, we have now included a replication of the ResNet-18 analysis with ResNet-34, ResNet-50 and ResNet-101 in the SI (section 1). We have also added a new section to the SI presenting the results for alternative CNNs (section S8). Despite small idiosyncrasies, the general pattern of results recurs.

      Concerning the red-orange border, it was not willfully placed, but we very much understand that in isolation it looks like it could simply be the result of noise. Nevertheless, the recurrence of this border in several analyses made us confident that it does reflect a meaningful invariance. Notably:

      • We find a more robust peak between red and orange in the luminance control (SI section 3).

      • The evolutionary algorithm with 7 borders also places a border in this position.

      • We find the peak recurs in the ResNet-18 replication as well as in several of the deeper ResNets and several of the other CNNs (SI section 1)

      • We also find that the peak is present throughout the different layers of the ResNet-18.
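
      One way to quantify how strong and how consistent a border is, as the reviewer asks, is to count across retraining iterations how often the predicted category changes between neighbouring hue steps. The sketch below uses hypothetical label arrays and treats the hue axis as circular; it is illustrative only and not the analysis pipeline used in the manuscript.

```python
import numpy as np

def transition_counts(label_runs):
    """For each hue step i, count the iterations whose predicted label differs
    between step i and step i + 1 (the last step wraps back to the first)."""
    labels = np.asarray(label_runs)                  # shape: (n_iterations, n_hue_steps)
    changes = labels != np.roll(labels, -1, axis=1)  # compare each step with its right neighbour
    return changes.sum(axis=0)

# Hypothetical classifier outputs for 3 iterations over 8 hue steps.
runs = [[0, 0, 1, 1, 1, 2, 2, 0],
        [0, 0, 0, 1, 1, 2, 2, 0],
        [0, 0, 1, 1, 1, 1, 2, 0]]
counts = transition_counts(runs)
print(counts)
```

      A count equal to the number of iterations (here 3, between the seventh and eighth hue steps) means every iteration placed a border at that position, whereas lower counts indicate borders that move between iterations.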

      4) Figure 2E and Figure 5B are useful tests of the extent to which the categorical structure recovered by the CNNs shifts with the colors used to train the classifier, and it certainly looks like there is some invariance in category boundaries with respect to the specific colors uses to train the classifier, an important and interesting result. But these analyses do not actually address the claim implied by the analyses: that the performance of the CNN matches human performance. The color categories recovered with the CNN are not perfectly invariant, as the authors point out. The analyses presented in the paper (e.g. Figure 2E) tests whether there is as much shift in the boundaries as there is stasis, but that’s not quite the test if the goal is to link the categorical behavior of the CNN with human behavior. To evaluate the results, it would be helpful to know what would be expected based on human performance.

      We understand the lack of human data was a considerable shortcoming of the previous version of the manuscript. We have now collected human data in a match-to-sample task modeled on our CNN experiment. As with the CNN, we find that the degree of border invariance does fluctuate considerably. While categorical borders are not exact matches, we do broadly find the same category prototypes and also see that categories in the red-to-yellow range are quite narrow in both humans and CNNs. Please see the new “Human Psychophysics” section (page 8) in the manuscript for more details.

      5) The paper takes up a test of color categorization invariant to luminance. There are arguments in the literature that hue and luminance cannot be decoupled-that luminance is essential to how color is encoded and to color categorization. Some discussion of this might help the reader who has followed this literature.

      We have added some discussion of the interaction between luminance and color categories (e.g., Lindsay & Brown, 2009) at the bottom of page 6/ top of page 7. The current analysis mainly aimed at excluding that the borders are solely based on luminance.

      Related, the argument that “neighboring colors in HSV will be neighboring colors in the RGB space” is not persuasive. Surely this is true of any color space?

      We removed the argument about “neighboring colors”. Our procedure requires the use of a hue spectrum that wraps around the color space while including many of the highly saturated colors that are typical prototypes for human color categories. We have elected to use the hue spectrum from the HSV color space at full saturation and brightness, which is represented by the edges of the RGB color cube. As this is the space in which our network was trained, it does not introduce any deformations into the color space. Other potential choices of color space either include strong non-linear transformations that stretch and compress certain parts of the RGB cube, or exclude a large portion of the RGB gamut (yellow in particular).

      We have adapted the text to better reflect our reasoning (page 6, top of paragraph 2).
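
      The statement that the full-saturation, full-brightness hue spectrum runs along the edges of the RGB cube can be checked with Python's standard `colorsys` module. At S = V = 1 every converted color has one channel at 1 and another at 0, so it lies on one of the six cube edges linking red, yellow, green, cyan, blue and magenta. A minimal sketch; the 360-step sampling is an arbitrary choice.

```python
import colorsys

def hue_spectrum(n_steps):
    """RGB triplets (in [0, 1]) for hues sampled at full saturation and value."""
    return [colorsys.hsv_to_rgb(i / n_steps, 1.0, 1.0) for i in range(n_steps)]

def on_cube_edge(rgb, tol=1e-9):
    """True if one channel is 1 and another is 0: two coordinates are then pinned
    to faces of the unit cube, which places the point on one of its edges."""
    return abs(max(rgb) - 1.0) < tol and abs(min(rgb)) < tol

spectrum = hue_spectrum(360)
print(all(on_cube_edge(c) for c in spectrum))
```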

      6) The paper would benefit from an analysis and discussion of the images originally used to train the CNN. Presumably, there are a large number of images that depict man-made, artificially coloured objects. To what extent do the present results reflect statistical patterns in the way the images were created, and/or the colors of the things depicted? How do results on color categorization that derive from images (e.g., by training neural networks, as in Rosenthal et al. and in the present study) differ (or not) from results that derive from natural scenes (as in Yendrikhovskij)?

      We initially hoped to analyze differences between the colors of objects and backgrounds, as in Rosenthal et al. Unfortunately, in ImageNet we did not find clear differences between pixels inside the object bounding boxes provided with ImageNet and pixels outside these boxes (most likely because the rectangular bounding boxes still contain many background pixels). However, the K-means analysis presented in Figure S6 of the supplemental materials, the color categorization throughout the layers of the object-trained network (end of the first experiment, page 7), and the color categorization in humans (Human Psychophysics, starting on page 8) all yield very similar border positions.

      7) It could be quite instructive to analyze what's going on in the errors in the output of the classifiers, as e.g. in Figure 1E. There are some interesting effects at the crossover points, where the two green categories seem to split and swap, the cyan band (hue % 20) emerges between orange and green, and the pink/purple boundary seems to have a large number of green/blue results. What is happening here?

      One issue with training the network on the color task is that we can never fully guarantee that the network uses color to resolve the task, and we suspected that in some cases it may rely on other factors as well, such as luminance. When we look at the same type of plots for the luminance-controlled task (see below left) presented in the supplemental materials, we do not see these transgressions. Similarly, in versions of the original training that use more bands, luminance is a less reliable cue, and again we do not see these transgressions (see right plot below).

      8) The second experiment using an evolutionary algorithm to test the location of the color boundaries is potentially valuable, but it is weakened because it pre-determines the number of categories. It would be more powerful if the experiment could recover both the number and location of the categories based on the "categorization principle" (colors within a category are harder to tell apart than colors across a color category boundary). This should be possible by a sensible sampling of the parameter space, even in a very large parameter space.

      The main point of the genetic algorithm was to see whether the border locations would be corroborated by an algorithm using the principle of categorical perception. Unfortunately, determining the number of borders exactly is difficult, because some border invariances are clearly stronger than others. Running the algorithm with the number of borders as a free parameter simply leads to a minimal number of borders, as 100% correct is always obtained when only one category is left. In general, because the network can combine categories into a single class at no cost (indeed, having fewer borders reduces noise), fewer classes are expected to yield better performance. Estimating the optimal category count would therefore require introducing a subjective trade-off between accuracy and class count.
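      The argument that fewer categories can only help can be made concrete with a toy confusion matrix. In the sketch below (hypothetical numbers, not data from the study, and not the genetic algorithm itself), merging classes keeps every correct response on the diagonal and can move some errors onto it, so accuracy never decreases, and collapsing everything into one class gives 100% correct.

```python
from typing import Dict, List

Confusion = List[List[int]]  # rows: true class, columns: predicted class

def accuracy(confusion: Confusion) -> float:
    """Fraction of responses that fall on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def merge_classes(confusion: Confusion, mapping: Dict[int, int]) -> Confusion:
    """Pool old classes into new ones; mapping[old] gives the new index (0..k-1)."""
    k = max(mapping.values()) + 1
    merged = [[0] * k for _ in range(k)]
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            merged[mapping[i]][mapping[j]] += count
    return merged

# Hypothetical 3-category confusion matrix (30 responses).
conf = [[8, 2, 0],
        [3, 6, 1],
        [0, 1, 9]]
print(accuracy(conf))                                     # 23 of 30 correct
print(accuracy(merge_classes(conf, {0: 0, 1: 0, 2: 1})))  # 28 of 30 correct
print(accuracy(merge_classes(conf, {0: 0, 1: 0, 2: 0})))  # 1.0
```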

      9) Finally, the paper sets itself up as taking "a different approach by evaluating whether color categorization could be a side effect of learning object recognition", as distinct from the approach of studying "communicative concepts". But these approaches are intimately related. The central observation in Gibson et al. is not the discovery of warm-vs-cool categories (these, as the most basic color categories, have been known for centuries), but rather the relationship of these categories to the color statistics of objects, i.e., those parts of the scene that we care about enough to label. This idea, that color categories reflect the uses to which we put our color-vision system, is extended in Rosenthal et al., where the structure of color space itself is understood in terms of categorizing objects versus backgrounds (u') and the most basic object categorization distinction, animate versus inanimate (v'). The introduction argues, rightly in our view, that "A link between color categories and objects would be able to bridge the discrepancy between models that rely on communicative concepts to incorporate the varying usefulness of color, on the one hand, and the experimental findings laid out in this paragraph on the other". This is precisely the link forged by the observation that the warm-cool category distinction in color naming correlates with object-color statistics (Gibson, 2017; see also Rosenthal et al., 2018). The argument in Gibson and Rosenthal is that color categorization structure emerges because of the color statistics of the world, specifically the color statistics of the parts of the world that we label as objects, which is the same approach adopted by the present work. The use of CNNs is a clever and powerful test of the success of this approach.

      We are sorry we did not properly highlight the enormous importance of these two earlier papers in the previous version of the manuscript. We have now elaborated our description of Gibson’s work to better reflect the important relation between the usefulness of colors and color categories (page 2, middle; page 19, paragraph above the Methods). We think our work nicely extends the earlier work by showing that their approach works even at a more general level, with more color categories.

    1. Author Response

      Reviewer #3 (Public Review):

      In this paper, metabolomics, proteomics, and lipidomics are combined for the first time to obtain multi-dimensional, more objective clues about early and advanced post-mortem interval (PMI), compared with traditional methods of PMI estimation that rely on subjective morphological judgment. The "ForensOMICS" pipeline establishes a multi-omics analysis workflow based on the LC-MS platform, which is likely to influence and inspire research on PMI estimation based on molecular biomarkers in the foreseeable future. However, owing to the limited availability of bone samples and metadata (which might contain covariates with latent influences on PMI estimation), the current research is still a proof-of-concept study, and it is not yet sufficient for the "ForensOMICS" approach to be applied in court.

      Strengths:

      By combining multiple omics and bioinformatics, the "ForensOMICS" approach is, as claimed by the authors, more accurate and precise than conventional morphological methods and molecular biological methods using a single omics layer. Moreover, the research does not stop at developing time-dependent models using several omics biomarkers but carries out an enrichment analysis of the relevant markers to further explore the pathophysiological mechanisms behind the major changes in the internal environment after death, thereby providing meaningful reference data for basic forensic research on death.

      The Data Integration Analysis for Biomarker discovery using Latent variable approaches for Omics studies (DIABLO) method and multiple feature-selection tools are used in the bioinformatic workflow to analyze the multiple omics data, and the PMI classification model is constructed with PLS-DA, with parameters optimized by 3-fold cross-validation repeated 100 times. The overall analysis process is relatively complete, and the data and classification model provided are of scientific value for reference.

      In principle, the "ForensOMICS" workflow offers a relatively complete analysis process that is transferable to metabolomics, proteomics, and lipidomics data obtained in other proof-of-concept studies on forensic time estimation (e.g., post-mortem submersion interval and time since deposition).
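      For readers less familiar with the validation scheme, "3-fold/100 repeats" means the samples are reshuffled 100 times and, in each repeat, split into three folds that each serve once as the test set, giving 300 train/test evaluations. The plain-Python sketch below shows only this index bookkeeping under stated assumptions: it is not the mixOmics/DIABLO implementation, the function name is hypothetical, and it uses unstratified folds for simplicity, whereas a classification study may stratify folds by class.

```python
import random
from typing import Iterator, List, Tuple

def repeated_kfold(n_samples: int, k: int = 3, repeats: int = 100,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold CV repeated with fresh shuffles."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(repeats):
        rng.shuffle(indices)                       # new random partition each repeat
        folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
        for i in range(k):
            test = sorted(folds[i])
            train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
            yield train, test

splits = list(repeated_kfold(n_samples=10))
print(len(splits))  # 300 train/test evaluations
```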

      Weaknesses:

      Although the paper has strengths in principle, the limited availability of bone samples and metadata leads to its major weaknesses. In particular, the age-biased sample, the use of a single bone type, and the lack of analysis of environmental factors weaken the key claims of the manuscript as supported by the data presented.

      The body donors' age is 74 ± 11.6 years (mean ± standard deviation), and only one type of bone tissue (left anterior midshaft tibia) was sampled. Different structures and locations of the sampled bone tissue, as well as metabolic changes and bone degeneration caused by aging, may lead to significant discrepancies in the multi-omics data. Moreover, most of the dead found at crime scenes are in the prime of life, and in addition to the tibia, other skeletal remains commonly found at scenes include the skull, ribs, upper limb bones, and teeth. Therefore, the conclusions obtained from research based on these limited bone samples cannot meet the actual needs of estimating the PMI of skeletal remains. As the authors mention in the discussion, because of the difficulty in acquiring human remains with definite post-mortem intervals, this study is still a proof of concept. If possible, the authors could focus on a larger sample set of different bone remains from younger age groups in future studies.

      The reviewer is describing exactly the purpose of this manuscript. As they highlight, this paper is not intended to be an applicable method for PMI estimation at this stage, as we are aware of the differences that may exist between multiple skeletal elements and the omics results (at least for proteomics data, on which we have published several papers). However, this is the proof of concept to demonstrate the potential that multiple omics combined together may have in addressing the PMI. We are committed to increasing our sample size in order to develop a forensic technique for PMI estimation, which would then need to be validated on multiple skeletal elements.

      The tibia is frequently recovered from scenes involving incomplete human remains subjected to long PMIs; our previous studies have also demonstrated that the midshaft tibia may be an ideal candidate for proteomics analyses, owing to its small intra-individual variability in comparison with other bones. Therefore, the anterior midshaft tibia was selected as the sample for this pilot study. We do agree with the reviewer that such samples may not be representative of the whole bone proteome, metabolome, and lipidome composition (particularly with regard to cortical and trabecular parts); however, this could be addressed in future studies on the topic.

      We do agree with the reviewer about the possible confounding factor related to the relatively high variability in age at death, which was indeed due to the difficulty in acquiring human bodies with a known PMI.

      Although in-life physiological and/or pathological conditions (e.g., osteoporosis) might be responsible for the variability seen in several metabolites and proteins among baseline samples and between baseline and long-PMI samples, we believe the biological phenomena underlying PMI are strong enough to overcome such limitations in the design of the experiment. This is also supported by the small inter-individual variability observed among the fresh/baseline samples.

      It is suggested that metadata on factors that may influence PMI, such as temperature, humidity, UV exposure, and deposition context (which is already recorded), should be recorded and statistically analyzed, so as to further optimize the "ForensOMICS" classification model by accounting for these possible environmental covariates. In addition, according to the No Free Lunch theorem, PLS-DA is very unlikely to be the optimal solution for all of the above-mentioned PMI classification tasks based on multi-omics data under different environmental conditions. It is recommended to develop and compare several different classification models to improve the generalization performance of the "ForensOMICS" approach.

      We agree with the reviewer that these factors are crucial in the decomposition process. In our opinion, however, at this stage it is not appropriate to include these metadata in the statistical analysis as covariates by applying additional classification models, due to the small sample size available. Additionally, the main focus of the paper is exclusively on PMI-driven modifications. Environmental data have been added for reference in Supplementary File 2 and will be taken into account in future works when a bigger sample size will be evaluated.

      Due to the limited sample size and the discrete time points, the omics data obtained in the paper could only be used to build a classification model rather than a regression model. Since such a model does not give a specific predicted PMI with MSE and RMSE indicating its performance, and the current "ForensOMICS" approach failed to distinguish between samples with late PMIs (219-834 days), the "ForensOMICS" approach is still some distance from application in actual forensic practice.

      Thank you for your comments. We agree, and have stressed throughout the manuscript, that this is far from being applicable to forensic practice. The proof-of-concept nature of the study represents a mandatory step towards building a regression model that can in the future be challenged with the highly rigorous standards required in the forensic setting (i.e., the Daubert criteria). We appreciate the reviewer's understanding of our choice to model the data using classification rather than regression.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors define regulatory networks across 77 tissue contexts using software they have previously published (PECA2, Duren et al. 2020). Each regulatory network is a set of nodes (transcription factors (TF), target genes (TG), and regulatory elements (RE)) and edges (regulatory scores connecting the nodes). For each context, the authors define context-specific REs, as those that do not overlap REs from any of the other 76 contexts, and context-specific regulatory networks as the collection of TFs, TGs, and REs connected to at least one context-specific RE. This approach essentially creates annotations that are aggregated across genes, elements, and specific contexts. For each tissue, the authors use linkage disequilibrium score regression (LDSC) to calculate enrichment for complex trait heritability within the set of all REs from the corresponding context-specific regulatory network. Heritability enrichments in context-specific regulatory network REs are compared with heritability enrichments in regions defined using other approaches.
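      The definition of context-specific REs summarized above can be illustrated with a small interval-overlap sketch. It assumes REs are half-open (chromosome, start, end) intervals and that a regulatory network is a list of hypothetical (TF, RE, TG) triplets; it uses a naive all-pairs overlap check for clarity (genome-scale data would call for sorted intervals or a tool such as bedtools), and it is not the PECA2 or SpecVar code.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]   # (chromosome, start, end), half-open
Edge = Tuple[str, Region, str]  # (TF, RE, target gene); hypothetical layout

def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def context_specific_res(res_by_context: Dict[str, List[Region]], context: str) -> List[Region]:
    """REs of `context` that overlap no RE from any other context."""
    others = [r for ctx, regions in res_by_context.items() if ctx != context for r in regions]
    return [r for r in res_by_context[context] if not any(overlaps(r, o) for o in others)]

def context_specific_network(edges: List[Edge], specific: List[Region]) -> List[Edge]:
    """Keep TF-RE-TG edges whose RE is context-specific."""
    keep = set(specific)
    return [e for e in edges if e[1] in keep]

# Toy example with hypothetical coordinates and gene names.
res = {
    "liver": [("chr1", 100, 200), ("chr1", 500, 600)],
    "brain": [("chr1", 150, 250)],
    "heart": [("chr2", 100, 200)],
}
liver_specific = context_specific_res(res, "liver")
print(liver_specific)  # [('chr1', 500, 600)]

edges = [("TF_A", ("chr1", 100, 200), "GENE_X"), ("TF_B", ("chr1", 500, 600), "GENE_Y")]
print(context_specific_network(edges, liver_specific))
```

      Keeping whole TF-RE-TG triplets is a simplification of "TFs, TGs, and REs connected to at least one context-specific RE"; the resulting RE set is what would then be passed to LDSC as an annotation.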

      We thank the reviewer for the pertinent and precise summary of our paper.

      Reviewer #2 (Public Review):

      In this manuscript the authors develop a method, SpecVar, to perform heritability estimation from regulatory networks derived from gene expression and chromatin accessibility data. They apply this approach to public datasets available in ENCODE and Roadmap Epigenomics consortia as well as GWAS phenotype associations in UK Biobank. It promises to be a powerful method to interpret mechanisms from genetic associations. Below are some strengths and weaknesses of the paper.

      Strengths

      • The method performs heritability enrichment on two major genomic data types: gene expression and chromatin accessibility.

      • This method leverages gene regulatory networks to perform the heritability estimation, which may better capture complex disease architecture.

      • The authors perform an extensive comparison to other LDSC-based approaches using different tissue datasets.

      Weaknesses

      (1) This approach may represent a modest advance over existing LDSC methods when looking at other complex traits.

      (2) The authors only compare with LDSC using different functional annotations as input, which may not be appropriate. A more broad comparison with other heritability methods would be helpful.

      (3) The method seems to be applied to "paired" data, but this is still bulk profiles not paired single-cell RNA/ATAC data.

      The authors successfully applied a regulatory network approach to improving the heritability estimation of complex traits by using both gene expression and chromatin accessibility data. While the results could be further strengthened by comparing them to other network and non-network-based methods, it provides important insight into a few traits beyond the standard LDSC model with different functional annotations.

      Given that this method is based on the widely used LDSC approach it should be broadly applied in the field. However, the authors should consider adapting this to single-cell data as well as admixed human population genetic data.

      We thank the reviewer for the positive comments on our work, and specifically for pointing out that SpecVar is a powerful method to interpret mechanisms from genetic associations. We appreciate that the reviewer’s “Strengths” summary captures our major contributions: building an atlas of regulatory networks by integrating paired gene expression and chromatin accessibility data, leveraging regulatory networks to perform heritability enrichment, and identifying relevant tissues and estimating relevance correlation. We also thank the reviewer for pointing out weaknesses that helped us further strengthen our results. To address the comments, we (1) performed ablation studies and added more description to clarify the novelty of our method; (2) conducted an extensive comparison with another network-based method, CoCoNet, and a non-network-based method, RolyPoly; and (3) discussed the promising direction of identifying relevant contexts at the cell-type level by leveraging single-cell multi-omics profiles, as well as application to admixed populations.

      Reviewer #3 (Public Review):

      Identifying the critical tissues and cell types in which genetic variants exert their effects on complex traits is an important question that has attracted increasing attention. Feng et al propose a new method, SpecVar, to first construct context-specific regulatory networks by integrating tissue-specific chromatin states and gene expression data, and then run stratified LD score regression (LDSC) to test if the constructed regulatory network in tissue is significantly associated with the trait, measured by a statistic called trait relevance score in this study. They apply their method to 6 traits for which there exists prior evidence on the most relevant tissues in the literature, and then further apply to 206 traits in the UK Biobank. They find that compared to LDSC using other sources of information to define context-specific annotations, their method can "improve heritability enrichment", "accurately detect relevant tissues", helps to "interpret SNPs" identified from GWAS, and "better reveals shared heritability and regulations of phenotypes" between traits.

      We thank the reviewer for the summary and appreciation of our efforts to address the important question: identifying the critical tissues and cell types in which genetic variants exert their effects on complex traits.

      However, I think it requires more work to understand where exactly the benefits come from and the statistical properties of their proposed test statistic (e.g., how to perform hypothesis tests with their relevance score and whether the false positive rate is under control). In addition, it's not clear to me what they can conclude about the shared heritability (which means genetic correlation) by comparing their relevance score correlation across tissues to the phenotypic correlation between traits.

      We thank the reviewer for the advice to do more work to enhance the statistical rigor of SpecVar. In the revision, we have added significance tests for the heritability enrichment and for our proposed R score. We also clarified that SpecVar can use common relevant contexts and shared SNP-associated regulatory networks as a potential explanation for the correlation between traits.

      They show that SpecVar gives much higher heritability enrichment than the other methods in the trait-relevant tissues (Fig. 2). The fold enrichment from SpecVar is extremely high, e.g., more than 600x in the right lobe of the liver for LDL. First, I think a standard error should be given so that the significance of the differences can be assessed. Second, it is very rare (hence suspicious) to observe such a huge enrichment. Since SpecVar is based on LDSC, the same methodology that other methods in comparison depend on, the differences to the other methods must come from the set of SNPs annotated for each tissue. I think it is important to understand the difference between the SpecVar annotated SNPs and those from other methods. For example, is the extra heritability enrichment mainly from the SpecVar-specific annotation or from the intersection narrowed down by SpecVar?

      The reviewer has pinpointed a question about one important advantage of our method, namely improved heritability enrichment. We addressed this question first by providing standard errors, p values, and q values for the heritability enrichment. Second, we conducted an ablation analysis to study the source of the extra heritability enrichment. This question greatly helped us to clarify the main contribution of our method.

      They propose to use the relevance score (R score) to prioritise trait-relevant tissues. In Fig. 3, they show tissue-trait pairs with the highest R scores, and from there they prioritise several tissues for each trait (Table 1). I can see that some tissues have an outstanding R score; however, it is not clear to me where they draw the line to declare a positive result. The threshold doesn't even seem to be consistent across traits. For example, for LDL, only the right lobe of the liver is identified although other tissues have R scores greater than 100, whereas for EA, Ammon's horn and the adrenal gland are identified although their R scores are apparently smaller than 100. It seems to me they use some subjective criteria to pick the results. This leads to a serious question about how to apply their R score in a hypothesis test: how can the uncertainty of the R score be measured? What significance threshold should be used? Is the false positive rate under control? (Without knowing these statistical properties, readers won't be able to use this method with confidence in their own research.)

      We thank the reviewer for raising the question of hypothesis testing for the R score. In our revision, we used a block jackknife strategy to estimate standard errors, p values, and q values. We added the new results to the main text, and they greatly enhance the statistical rigor of our method.

      A related comment: to investigate false positive associations, they should show the results for all tissues tested, to see whether SpecVar tends to give higher R scores even in tissues that are not relevant to the trait. It would also be useful to include some negative control traits, such as height for brain tissues.

      We agree that negative controls are important, and the six phenotypes in our manuscript serve as negative controls for each other. For example, LDL is relevant to liver tissue but not to brain tissue, whereas educational attainment is relevant to brain tissue but not to liver tissue.
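      The block jackknife mentioned here can be sketched generically: split the ordered data (for LDSC, SNPs along the genome; LDSC's default is 200 contiguous blocks) into blocks, recompute the statistic with each block left out, and use the spread of those leave-one-out estimates as the standard error. The sketch below uses a plain mean as a stand-in statistic and a normal approximation for the two-sided p-value; both are illustrative assumptions, not SpecVar's exact test, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

def block_jackknife_se(data: Sequence[float], n_blocks: int,
                       stat: Callable[[List[float]], float]) -> float:
    """Delete-one-block jackknife standard error of `stat` over contiguous blocks."""
    n = len(data)
    bounds = [round(b * n / n_blocks) for b in range(n_blocks + 1)]
    # Recompute the statistic with each contiguous block left out.
    loo = [stat(list(data[:bounds[b]]) + list(data[bounds[b + 1]:]))
           for b in range(n_blocks)]
    mean_loo = sum(loo) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((x - mean_loo) ** 2 for x in loo)
    return math.sqrt(var)

def two_sided_p(estimate: float, se: float, null: float = 0.0) -> float:
    """Normal-approximation two-sided p-value for z = (estimate - null) / se."""
    z = (estimate - null) / se
    return math.erfc(abs(z) / math.sqrt(2.0))

def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
se = block_jackknife_se(data, n_blocks=5, stat=mean)
print(se)  # about 0.7071, i.e. sqrt(0.5), the usual standard error of the mean
print(two_sided_p(mean(data), se))
```

      With one observation per block the jackknife reproduces the textbook standard error of the mean, which is a convenient sanity check; with unequal block sizes this unweighted formula is only an approximation.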

      Fig. 3 shows that tissues prioritised by LDSC-SAP and LDSC-SEG seem to make less sense than those from SpecVar. However, some of the results are not consistent with the LDSC-SEG paper (Finucane et al 2018). For example, LDL was significantly associated with the liver in Finucane et al (Fig. 2), but not in this study. How to explain the difference? (Question 3)

      We checked the results in Figure 3 and found that even though the liver was not ranked among the top 5 tissues, it has a significant P-value for LDL in our implementation. There are indeed some differences in heritability enrichment and P-value between the LDSC-SEG paper and our implementation. These differences arise from the different sets of tissues used in the two applications (77 tissues in our paper and 53 tissues in the LDSC-SEG paper).

      The authors highlight an example where SpecVar facilitates the interpretation of GWAS signals near FOXC2. They find GWAS-significant SNPs located in a CNCC-specific RE downstream of FOXC2 and reason that these SNPs affect brain shape by regulating the expression of FOXC2. I think more work can be done to consolidate this conclusion, for example, testing whether the GWAS signals colocalize with the eQTL for FOXC2 in the brain. Also, note that the top GWAS signal is actually to the left of the CNCC-specific RE (Fig. 4b). A deeper investigation is warranted.

      We agree that more work should be done to consolidate the regulation of FOXC2. In our revision, we used HiChIP loops in the brain to support the SNP-associated regulation of FOXC2. We also thank the reviewer for suggesting eQTL colocalization; we conducted eQTL colocalization analysis on the SNP-associated regulation revealed by our method to show that our method can facilitate the fine mapping of GWAS signals. Lastly, brain shape is a complex trait and may be relevant to multiple tissues. Hence it is reasonable to suspect that the top GWAS signal may be active in regulatory elements of other relevant tissues.

      They show that SpecVar's relevance score correlation across tissues can better approximate phenotypic correlation between traits. However, the estimation of the phenotypic correlation between traits is neither very interesting nor a thing difficult to do (it can be directly estimated from GWAS summary statistics). A more interesting question is to which extent the observed phenotypic correlation is due to common genetic factors acting in the shared tissues/cell types/pathways/regulatory networks between traits. Note that in their Abstract, they use words "depict shared heritability and regulations" but I don't seem to see results supporting that.

      We are sorry that we didn’t make it clear how SpecVar “depict shared heritability and regulations”. We added more results and one example in the UKBB application to show SpecVar can use common relevant contexts and shared SNP-associated regulatory networks as potential explanation for the correlation between traits.

      Lines 396-402: "For example, ... heritability could select most relevant tissues ... but failed to get correct tissues for other phenotypes ... P-value could obtain correct tissues for CP ... but failed to get correct tissues for ... SpecVar could prioritize correct relevant tissues for all the six phenotypes." Honestly, I find it hard to judge which tissues are "correct" or "incorrect" for a trait in real life. It would be more straightforward to compare methods using simulations where we know which tissues are causal.

      We thank the reviewer for pinpointing the improper use of “correct”. It is difficult to find phenotypes with gold-standard relevant tissues, so we used six relatively well-studied phenotypes with prior knowledge of possible relevant tissues in our paper. We have revised the “correct” statements in our revision.

    1. Author Response

      Reviewer #1 (Public Review):

      Trudel and colleagues aimed to uncover the neural mechanisms of estimating the reliability of the information from social agents and non-social objects. By combining functional MRI with a behavioural experiment and computational modelling, they demonstrated that learning from social sources is more accurate and robust compared with that from non-social sources. Furthermore, dmPFC and pTPJ were found to track the estimated reliability of the social agents (as opposed to the non-social objects). The strength of this study is to devise a task consisting of the two experimental conditions that were matched in their statistical properties and only differed in their framing (social vs. non-social). The novel experimental task allows researchers to directly compare the learning from social and non-social sources, which is a prominent contribution of the present study to social decision neuroscience.

      Thank you so much for your positive feedback about our work. We are delighted that you found that our manuscript provided a prominent contribution to social decision neuroscience. We really appreciate your time to review our work and your valuable comments that have significantly helped us to improve our manuscript further.

      One of the major weaknesses is the lack of a clear description about the conceptual novelty. Learning about the reliability/expertise of social and non-social agents has been of considerable concern in social neuroscience (e.g., Boorman et al., Neuron 2013; and Wittmann et al., Neuron 2016). The authors could do a better job in clarifying the novelty of the study beyond the previous literature.

      We understand the reviewer’s comment and have made changes to the manuscript that, first, highlight more strongly the novelty of the current study. Crucially, second, we have also supplemented the data analyses with a new model-based analysis of the differences in behaviour in the social and non-social conditions which we hope makes clearer, at a theoretical level, why participants behave differently in the two conditions.

      There has long been interest in investigating whether ‘social’ cognitive processes are special or unique compared to ‘non-social’ cognitive processes and, if they are, what makes them so. Differences between conditions could arise during the input stage (e.g. the type of visual input that is processed by the social and non-social systems), at the algorithm stage (e.g. the type of computational principles that underpin social versus non-social processes) or, even if identical algorithms are used, social and non-social processes might depend on distinct anatomical brain areas or neurons within brain areas. Here, we conducted multiple analyses (in Figures 2, 3, and 4 of the revised manuscript and in Figure 2 – figure supplement 1, Figure 3 – figure supplement 1, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4) that not only demonstrated basic similarities in mechanism generalised across social and non-social contexts, but also demonstrated important quantitative differences that were linked to activity in specific brain regions associated with the social condition. The additional analyses (Figure 4 – figure supplement 3, Figure 4 – figure supplement 4) show that differences are not simply a consequence of differences in the visual stimuli that are inputs to the two systems 1, nor does the type of algorithm differ between conditions. Instead, our results suggest that the precise manner in which an algorithm is implemented differs when learning about social or non-social information and that this is linked to differences in neuroanatomical substrates.

      The previous studies mentioned by the reviewer are, indeed, relevant ones and were, of course, part of the inspiration for the current study. However, there are crucial differences between them and the current study. In the case of the previous studies by Wittmann, the aim was a very different one: to understand how one’s own beliefs, for example about one’s performance, and beliefs about others, for example about their performance levels, are combined. Here, however, we were instead interested in the similarities and differences between social and non-social learning. It is true that the question resembles the one addressed by Boorman and colleagues in 2013, who looked at how people learned about the advice offered by people or computer algorithms, but the difference in the framing of that study perhaps contributed to the authors’ finding of little difference in learning. By contrast, in the present study we found evidence that people were predisposed to perceive stability in social performance and to be uncertain about non-social performance. By accumulating evidence across multiple analyses, we show that there are quantitative differences in how we learn about social versus non-social information, and that these differences can be linked to the way in which learning algorithms are implemented neurally. We therefore contend that our findings extend our previous understanding of how, in relation to other learning processes, ‘social’ learning has both shared and special features.

      We would like to emphasize the way in which we have extended several of the analyses throughout the revision. The theoretical Bayesian framework has made it possible to simulate key differences in behaviour between the social and non-social conditions. We explain in our point-by-point reply below how we have integrated a substantial number of new analyses. We have also more carefully related our findings to previous studies in the Introduction and Discussion.

      Introduction, page 4:

      [...] Therefore, by comparing information sampling from social versus non-social sources, we address a long-standing question in cognitive neuroscience, the degree to which any neural process is specialized for, or particularly linked to, social as opposed to non-social cognition 2–9. Given their similarities, it is expected that both types of learning will depend on common neural mechanisms. However, given the importance and ubiquity of social learning, it may also be that the neural mechanisms that support learning from social advice are at least partially specialized and distinct from those concerned with learning that is guided by non-social sources. It is less clear, however, at which level information is processed differently when it has a social or non-social origin. It has recently been argued that differences between social and non-social learning can be investigated on different levels of Marr’s information processing theory: differences could emerge at an input level (in terms of the stimuli that might drive social and non-social learning), at an algorithmic level or at a neural implementation level 7. It might be that, at the algorithmic level, associative learning mechanisms are similar across social and non-social learning 1. Other theories have argued that differences might emerge because goal-directed actions are attributed to social agents, which allows very different inferences to be made about hidden traits or beliefs 10. Such inferences might fundamentally alter learning about social agents compared to non-social cues.

      Discussion, page 15:

      […] One potential explanation for the assumption of stable performance for social but not non-social predictors might be that participants attribute intentions and motivations to social agents. Even if the social and non-social evidence are the same, the belief that a social actor might have a goal may affect the inferences made from the same piece of information 10. Social advisors first learnt about the target’s distribution and accordingly gave advice on where to find the target. If the social agents are credited with goal-directed behaviour, then it might be assumed that the goals remain relatively constant; this might lead participants to assume stability in the performances of social advisors. However, such goal-directed intentions might not be attributed to non-social cues, thereby making judgments inherently more uncertain and changeable across time. Such an account, focussing on differences in attribution in social settings, aligns with a recent suggestion that any attempt to identify similarities or differences between social and non-social processes can occur at any one of a number of the levels in Marr’s information processing theory 7. Here we found that the same algorithm was able to explain social and non-social learning (a qualitatively similar computational model could explain both). However, the extent to which the algorithm was recruited when learning about social compared to non-social information differed. We observed a greater impact of uncertainty on judgments about social compared to non-social information. We have shown evidence for a degree of specialization when assessing social advisors as opposed to non-social cues. At the neural level we focused on two brain areas, dmPFC and pTPJ, that have not only been shown to carry signals associated with belief inferences about others but, in addition, recent combined fMRI-TMS studies have demonstrated the causal importance of these activity patterns for the inference process […]

      Another weakness is the lack of justifications of the behavioural data analyses. It is difficult for me to understand why 'performance matching' is suitable for an index of learning accuracy. I understand the optimal participant would adjust the interval size with respect to the estimated reliability of the advisor (i.e., angular error); however, I am wondering if the optimal strategy for participants is to exactly match the interval size with the angular error. Furthermore, the definitions of 'confidence adjustment across trials' and 'learning index' look arbitrary.

      First, having read the reviewer’s comments, we realise that our choice of the term ‘performance matching’ may not have been ideal as it indeed might not be the case that the participant intended to directly match their interval sizes with their estimates of advisor/predictor error. Like the reviewer, our assumption is simply that the interval sizes should change as the estimated reliability of the advisor changes and, therefore, that the intervals that the participants set should provide information about the estimates that they hold and the manner in which they evolve. On re-reading the manuscript we realised that we had not used the term ‘performance matching’ consistently or in many places in the manuscript. In the revised manuscript we have simply removed it altogether and referred to the participants’ ‘interval setting’.

      Most of the initial analyses in Figure 2a-c aim to better understand the raw behaviour before applying any computational model to the data. We were interested in how participants make confidence judgments (decision-making per se), but also how they adapt their decisions with additional information (changes or learning in decision making). In the revised manuscript we have made clear that these are used as simple behavioural measures and that they are complemented later by analyses derived from more formal computational models.

      In what we now refer to as the ‘interval setting’ analysis (Figure 2a), we tested whether participants select their interval settings differently in the social compared to non-social condition. We observe that participants set their intervals closer to the true angular error of the advisor/predictor in the social compared to the non-social condition. This observation could arise in two ways. First, it could be due to quantitative differences in learning despite general, qualitative similarity: mechanisms are similar but participants differ quantitatively in the way that they learn about non-social information and social information. Second, it could, however, reflect fundamentally different strategies. We tested basic performance differences by comparing the mean reward between conditions. There was no difference in reward between conditions (mean reward: paired t-test social vs. non-social, t(23)= 0.8, p=0.4, 95% CI= [-0.007 0.016]), suggesting that interval setting differences might not simply reflect better or worse performance in social or non-social contexts but instead might reflect quantitative differences in the processes guiding interval setting in the two cases.
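      The paired comparisons reported here (e.g. t(23) for 24 participants) follow the usual paired t-test. The sketch below is a minimal illustration, not the analysis code used for the manuscript: the function name `paired_t` and the example numbers are hypothetical, and the 95% CI would additionally need a t-distribution quantile (for example from SciPy), which the Python standard library does not provide.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired t-test of matched samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equal-length with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]   # one difference per participant
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)        # standard error of the mean difference
    return mean(diffs) / se, n - 1          # t statistic, degrees of freedom


# Hypothetical per-participant mean rewards (social, non-social).
social = [1.0, 2.0, 3.0, 4.0]
non_social = [0.0, 1.0, 1.0, 2.0]
t, df = paired_t(social, non_social)
print(f"t({df}) = {t:.2f}")  # t(3) = 5.20
```

      With n = 24 participants, df is 23, matching the t(23) values reported in the text.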

      In the next set of analyses, in which we compared raw data, applied a computational model, and provided a theoretical account for the differences between conditions, we suggest that there are simple quantitative differences in how information is processed in social and non-social conditions, but that these have the important impact of making long-term representations (representations built up over a longer series of trials) more important in the social condition. This, in turn, has implications for the neural activity patterns associated with social and non-social learning. We therefore agree with the reviewer that one manner of interval setting is not necessarily more optimal than another. However, the differences that do exist in behaviour are important because they reveal something about social and non-social learning and its neural substrates. We have adjusted the wording and interpretation in the revised manuscript.

      Next, we analysed interval setting with two additional, related analyses: interval setting adjustment across trials and derivation of a learning index. We tested the degree to which participants adjusted their interval setting across trials and according to the prediction error (learning index, Figure 2f); the latter analysis is very similar to a trial-wise learning rate calculated in previous studies11. In contrast to many other studies, the intervals set by participants provide information about the estimates that they hold in a simple and direct way and enable calculation of a trial-wise learning index; we therefore decided to call it a ‘learning index’ instead of a ‘learning rate’, as it is not estimated via a model applied to the data but is instead calculated directly from the data. Arguably the directness of the approach, and its lack of dependence on a specific computational model, is a strength of the analysis.
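      To make the idea concrete, a learning index of this kind can be computed directly from the intervals, with no fitted model. The sketch below assumes, as in a delta-rule learning rate, that the index is the change in interval between consecutive encounters of the same predictor divided by the prediction error (observed angular error minus the interval set). The manuscript's exact definition may differ, and the function name is hypothetical.

```python
def learning_indices(intervals, errors):
    """Trial-wise learning index for one predictor.

    intervals[k]: interval set on the k-th encounter with the predictor.
    errors[k]: angular error observed on that encounter (same units).
    Assumed definition: (next interval - current interval) / prediction error.
    """
    indices = []
    for k in range(len(intervals) - 1):
        prediction_error = errors[k] - intervals[k]
        if prediction_error == 0:
            indices.append(None)  # undefined: no error to learn from on this trial
            continue
        indices.append((intervals[k + 1] - intervals[k]) / prediction_error)
    return indices


# Hypothetical encounters with a single predictor.
print(learning_indices([10, 14, 13], [20, 10, 5]))  # [0.4, 0.25]
```

      Under this definition, an index of 1 means the next interval moved all the way to the observed error, and 0 means it did not change.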

      Subsequently in the manuscript, a new analysis (illustrated in new Figure 3) employs Bayesian models that can simulate the differences in the social and non-social conditions and demonstrate that a number of behavioural observations can arise simply as a result of differences in noise in each trial-wise Bayesian update (Figure 3 and specifically 3d; Figure 3 – figure supplement 1b-c). In summary, the descriptive analyses in Figure 2a-c aid an intuitive understanding of the differences in behaviour in the social and non-social conditions. We have then repeated these analyses with Bayesian models incorporating different noise levels and showed that in such a way, the differences in behaviour between social and non-social conditions can be mimicked (please see next section and manuscript for details).

      We adjusted the wording in a number of sections in the revised manuscript, including Figure 2 and Figure 4 (figures and legends).

      Main text, page 5:

      The confidence interval could be changed continuously to make it wider or narrower, by pressing buttons repeatedly (one button press resulted in a change of one step in the confidence interval). In this way participants provided what we refer to as an ‘interval setting’.

      We also adjusted the following section in Main text, page 6:

      Confidence in the performance of social and non-social advisors

      We compared trial-by-trial interval setting in relation to the social and non-social advisors/predictors. When setting the interval, the participant’s aim was to minimize it while ensuring it still encompassed the final target position; points were won when it encompassed the target position but were greater when it was narrower. A given participant’s interval setting should, therefore, change in proportion to the participant’s expectations about the predictor’s angular error and their uncertainty about those expectations. Even though, on average, social and non-social sources did not differ in the precision with which they predicted the target (Figure 2 – figure supplement 1), participants gave interval settings that differed in their relationships to the true performances of the social advisors compared to the non-social predictors. The interval setting was closer to the angular error in the social compared to the non-social sessions (Figure 2a, paired t-test: social vs. non-social, t(23)= -2.57, p= 0.017, 95% confidence interval (CI)= [-0.36 -0.4]). Differences in interval setting might be due to generally lower performance in the non-social compared to the social condition, or potentially due to fundamentally different learning processes utilised in either condition. We compared the mean reward amounts obtained by participants in the social and non-social conditions to determine whether there were overall performance differences. There was, however, no difference in the reward received by participants in the two conditions (mean reward: paired t-test social vs. non-social, t(23)= 0.8, p=0.4, 95% CI= [-0.007 0.016]), suggesting that interval setting differences might not simply reflect better or worse performance.

      Discussion, page 14:

      Here, participants did not match their confidence to the likely accuracy of their own performance, but instead to the performance of another social or non-social advisor. Participants used different strategies when setting intervals to express their confidence in the performances of social advisors as opposed to non-social advisors. A possible explanation might be that participants have a better insight into the abilities of social cues – typically other agents – than non-social cues – typically inanimate objects.

      As the authors assumed simple Bayesian learning for the estimation of reliability in this study, the degree/speed of the learning should be examined with reference to the distance between the posterior and prior belief in the optimal Bayesian inference.

      We thank the reviewer for this suggestion. We agree with the reviewer that further analyses that aim to disentangle the underlying mechanisms that might differ between both social and non-social conditions might provide additional theoretical contributions. We show additional model simulations and analyses that aim to disentangle the differences in more detail. These new results allowed clearer interpretations to be made.

      In the current study, we showed that judgments made about non-social predictors were changed more strongly as a function of the subjective uncertainty: participants set a larger interval, indicating lower confidence, when they were more uncertain about the non-social cue’s accuracy to predict the target. In response to the reviewer’s comments, the new analyses were aimed at understanding under which conditions such a negative uncertainty effect might emerge.

      Prior expectations of performance

      First, we compared whether participants had different prior expectations in the social condition compared to the non-social condition. One way to compare prior expectations is by comparing the first interval set for each advisor/predictor. This is a direct readout of the initial prior expectation with which participants approach our two conditions. In this way, we test whether the prior beliefs before observing any social or non-social information differ between conditions. Even though this does not test the impact of prior expectations on subsequent belief updates, it does test whether participants have generally different expectations about the performance of social advisors or non-social predictors. There was no difference in this measure between social or non-social cues (Figure below; paired t-test social vs. non-social, t(23)= 0.01, p=0.98, 95% CI= [-0.067 0.68]).

      Figure. Confidence interval for the first encounter of each predictor in social and non-social conditions. There was no initial bias in predicting the performance of social or non-social predictors.

      Learning across time

      We have now seen that participants do not have an initial bias when predicting performances in social or non-social conditions. This suggests that differences between conditions might emerge across time when encountering predictors multiple times. We tested whether inherent differences in how beliefs are updated according to new observations might result in different impacts of uncertainty on interval setting between social and non-social conditions. More specifically, we tested whether the integration of new evidence differed between social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues while past observations might be weighted more strongly for social cues. This approach was inspired by the reviewer’s comments about potential differences in the speed of learning as well as the reduction of uncertainty with increasing predictor encounters. Similar ideas were tested in previous studies, when comparing the learning rate (i.e. the speed of learning) in environments of different volatilities 12,13. In these studies, a smaller learning rate was prevalent in stable environments, in which reward rates change more slowly over time, while higher learning rates often reflect learning in volatile environments, so that recent observations have a stronger impact on behaviour. Even though most studies derived these learning rates with reinforcement learning models, similar ideas can be translated into a Bayesian model. For example, an established way of changing the speed of learning in a Bayesian model is to introduce noise during the update process14. This noise is equivalent to adding in some of the initial prior distribution, which makes the Bayesian updates more flexible in adapting to changing environments. It widens the belief distribution and thereby makes it more uncertain.

      Recent information has more weight on the belief update within a Bayesian model when beliefs are uncertain. This increases the speed of learning. In other words, a wide distribution (after adding noise) allows for quick integration of new information. By contrast, a narrow distribution does not integrate new observations as strongly and instead relies more heavily on previous information; this corresponds to a small learning rate. We would therefore expect a steep decline of uncertainty to be related to a smaller learning index, while a slower decline of uncertainty is related to a larger learning index. We hypothesized that participants reduce their uncertainty more quickly when observing social information, thereby anchoring more strongly on previous beliefs instead of integrating new observations flexibly. Conversely, we hypothesized a less steep decline of uncertainty when observing non-social information, indicating that new information can be flexibly integrated during the belief update (new Figure 3a).
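      The claim that a wider belief gives more weight to new information can be illustrated with the simplest conjugate case. The sketch below uses a Gaussian belief and Gaussian observation noise purely as a stand-in; it is not the grid-based model used in the study, and all names and numbers are hypothetical. The returned ‘gain’ plays the role of a learning rate.

```python
def gaussian_update(prior_mean, prior_var, obs, obs_var):
    """Update a Gaussian belief with one Gaussian observation.

    Returns (posterior mean, posterior variance, gain), where gain is the
    fraction of the prediction error absorbed into the belief.
    """
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var, gain


# Same observation, wide versus narrow prior belief.
_, _, wide_gain = gaussian_update(0.0, 9.0, 10.0, 1.0)
_, _, narrow_gain = gaussian_update(0.0, 1.0, 10.0, 1.0)
print(wide_gain, narrow_gain)  # 0.9 0.5
```

      Because the posterior variance always shrinks, repeated updates without added noise lower the gain over time; injecting noise keeps the variance, and therefore the gain, higher.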

      We modified the original Bayesian model (Figure 2d, Figure 2 – figure supplement 2) by adding a uniform distribution (equivalent to our prior distribution) to each belief update; we refer to this as noise addition to the Bayesian model14,21. We varied the amount of noise δ within [0, 1], where δ = 0 corresponds to the original Bayesian model and δ = 1 represents a very noisy Bayesian model. The uniform distribution was selected to match the first prior belief before any observation was made (equation 2). This δ range resulted in a continuous increase of subjective uncertainty around the belief about the angular error (Figure 3b-c). The modified posterior distribution, denoted p′(σ|x), was derived at each trial as follows:

      We applied each noisy Bayesian model to participants’ choices within the social and non-social conditions.
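      A grid-based version of this noisy update can be sketched as follows. The manuscript’s own equations are not reproduced here, so several details are assumptions: the hypothesis grid `SIGMAS`, the half-normal likelihood of an observed absolute angular error, and the mixing rule p′ = (1 − δ)·posterior + δ·uniform are illustrative choices, and all names are hypothetical.

```python
import math

# Hypothetical grid of candidate angular-error scales (degrees) and a flat prior.
SIGMAS = [float(s) for s in range(5, 181, 5)]
UNIFORM = [1.0 / len(SIGMAS)] * len(SIGMAS)


def likelihood(x, sigma):
    """Assumed half-normal density of an observed absolute angular error x."""
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-x * x / (2.0 * sigma * sigma))


def noisy_update(belief, x, delta):
    """Bayesian update on the grid, then mix in the uniform prior with weight delta."""
    posterior = [b * likelihood(x, s) for b, s in zip(belief, SIGMAS)]
    total = sum(posterior)
    posterior = [p / total for p in posterior]
    return [(1.0 - delta) * p + delta * u for p, u in zip(posterior, UNIFORM)]


def uncertainty(belief):
    """Standard deviation of the belief over the grid."""
    mean = sum(b * s for b, s in zip(belief, SIGMAS))
    return math.sqrt(sum(b * (s - mean) ** 2 for b, s in zip(belief, SIGMAS)))


# Ten encounters with a predictor whose observed error is always 30 degrees.
for delta in (0.0, 0.2):
    belief = list(UNIFORM)
    for _ in range(10):
        belief = noisy_update(belief, 30.0, delta)
    print(delta, round(uncertainty(belief), 1))
```

      With δ = 0 the belief narrows around 30 degrees; with δ = 0.2 a share of the flat prior is re-injected on every trial, so the belief stays wider and new observations keep more influence.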

      The addition of a uniform distribution changed two key features of the belief distribution. First, the width of the distribution remains larger with additional observations, thereby making it possible to integrate new observations more flexibly. To show this more clearly, we extracted the model-derived uncertainty estimate across multiple encounters of the same predictor for the original model and the fully noisy Bayesian model (Figure 3 – figure supplement 1). The model-derived ‘uncertainty estimate’ of a noisy Bayesian model decays more slowly than the ‘uncertainty estimate’ of the original Bayesian model (upper panel). Second, the model-derived ‘accuracy estimate’ reflects more recent observations in a noisy Bayesian model compared to the ‘accuracy estimate’ derived from the original Bayesian model, which integrates past observations more strongly (lower panel). Hence, as mentioned above, a rapid decay of uncertainty implies a small learning index; in other words, stronger integration of past compared to recent observations.

      In the following analyses, we tested whether an increasingly noisy Bayesian model mimics behaviour that is observed in the non-social compared to the social condition. For example, we tested whether an increasingly noisy Bayesian model also exhibits a strongly negative ‘predictor uncertainty’ effect on interval setting (Figure 2e). In this way, we can test whether differences in noise in the updating process of a Bayesian model might reproduce important qualitative differences in learning-related behaviour seen in the social and non-social conditions.

      We used these modified Bayesian models to simulate trial-wise interval setting for each participant according to the observations they made when selecting a particular advisor or non-social cue. We simulated interval setting at each trial and examined whether an increase in noise produced model behaviours that resembled participant behaviour patterns observed in the non-social condition as opposed to the social condition. At each trial, we used the accuracy estimate (Methods, equation 6), which represents a subjective belief about a single angular error, to derive an interval setting for the selected predictor. To do so, we first derived the point estimate of the belief distribution at each trial (Methods, equation 6) and multiplied it by the size of one interval step on the circle. The step size was derived by dividing the circle size by the maximum number of possible steps. For example, assume that the belief about the angular error on the current trial is 50 degrees (Methods, equation 6) and that we want to transform this number into an interval for the current predictor. To obtain the size of one interval step, the circle size (360 degrees) is divided by the maximum number of interval steps (40 steps, i.e. 20 steps on each side), which gives nine degrees per interval step. Next, the accuracy estimate in radians (0.87) is multiplied by the step size in radians (0.1571), resulting in an interval of 0.137 radians, or 7.85 degrees. The final interval size would therefore be 7.85 degrees.
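      The worked example above reduces to a few lines of arithmetic. The sketch below simply reproduces the steps as described (degree-to-radian conversion of the accuracy estimate and of the step size, multiplication, and conversion back to degrees); the function and parameter names are hypothetical.

```python
import math


def accuracy_to_interval(accuracy_deg, circle_deg=360.0, max_steps=40):
    """Convert an accuracy estimate (degrees) into an interval size (degrees)."""
    step_deg = circle_deg / max_steps  # 360 / 40 = 9 degrees per step
    interval_rad = math.radians(accuracy_deg) * math.radians(step_deg)
    return math.degrees(interval_rad)


print(round(accuracy_to_interval(50.0), 2))  # 7.85
```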

      Simulating Bayesian choices in that way, we repeated the behavioural analyses (Figure 2b,e,f) to test whether intervals derived from noisier Bayesian models mimic intervals set by participants in the non-social condition: greater changes in interval setting across trials (Figure 3 – figure supplement 1b), a negative ‘predictor uncertainty’ effect on interval setting (Figure 3 – figure supplement 1c), and a higher learning index (Figure 3d).

      First, we repeated the most crucial analysis, the linear regression analysis (Figure 2e), and hypothesized that intervals that were simulated from noisy Bayesian models would also show a greater negative ‘predictor uncertainty’ effect on interval setting. This was indeed the case: irrespective of social or non-social conditions, the addition of noise (increased weighting of the uniform distribution in each belief update) led to an increasingly negative ‘predictor uncertainty’ effect on confidence judgment (new Figure 3d). In Figure 3d, we show the regression weights (y-axis) for the ‘predictor uncertainty’ on confidence judgment with increasing noise (x-axis). This result is highly consistent with the idea that in the non-social condition the manner in which task estimates are updated is more uncertain and noisier. By contrast, social estimates appear relatively more stable, also according to this new Bayesian simulation analysis.

      This new finding extends the results and suggests a formal computational account of the behavioural differences between social and non-social conditions. Increasing the noise of the belief update mimics behaviour that is observed in the non-social condition: an increasingly negative effect of ‘predictor uncertainty’ on confidence judgment. Notably, there was no difference in the impact that the noise had in the social and non-social conditions. This was expected because the Bayesian simulations are blind to the framing of the conditions. However, it means that the observed effects do not depend on the precise sequence of choices that participants made in these conditions. It therefore suggests that an increase in the Bayesian noise leads to an increasingly negative impact of ‘predictor uncertainty’ on confidence judgments irrespective of the condition. Hence, we can conclude that different degrees of uncertainty within the belief update are a reasonable explanation for the differences observed between social and non-social conditions.

      Next, we used these simulated confidence intervals and repeated the descriptive behavioural analyses to test whether interval settings that were derived from noisier Bayesian models mimic behavioural patterns observed in non-social compared to social conditions. For example, more noise in the belief update should lead to more flexible integration of new information and hence should potentially lead to a greater change of confidence judgments across predictor encounters (Figure 2b). Further, a greater reliance on recent information should lead to prediction errors being weighted more strongly in the next confidence judgment; hence, it should result in a higher learning index in the non-social condition, which we hypothesize to be perceived as more uncertain (Figure 2f). We used the simulated confidence intervals from Bayesian models on a continuum of noise integration (i.e. different weighting of the uniform distribution in the belief update) and again derived both absolute confidence change and learning indices (Figure 3 – figure supplement 1b-c).
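      The ‘absolute confidence change’ measure can likewise be read directly off the intervals. The sketch assumes it is the mean absolute difference between intervals set on consecutive encounters of the same predictor; the exact definition (and any normalisation) in the manuscript may differ, and the function name is hypothetical.

```python
def absolute_confidence_change(intervals):
    """Mean absolute change in interval across consecutive encounters of one predictor."""
    if len(intervals) < 2:
        raise ValueError("need at least two encounters")
    changes = [abs(b - a) for a, b in zip(intervals, intervals[1:])]
    return sum(changes) / len(changes)


# Hypothetical intervals set on four encounters with one predictor.
print(absolute_confidence_change([10, 14, 13, 13]))
```

      Under this definition, a noisier updating process that weights recent errors more heavily would be expected to produce larger values.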

      ‘Absolute confidence change’ and ‘learning index’ increase with increasing noise weight, thereby mimicking the difference between social and non-social conditions. Further, these analyses demonstrate the tight relationship between descriptive analyses and model-based analyses. They show that noise in the Bayesian updating process is a conceptual explanation that can account for both the differences in learning and the difference in uncertainty processing that exist between social and non-social conditions. The key insight conveyed by the Bayesian simulations is that a wider, more uncertain belief distribution changes more quickly. Correspondingly, in the non-social condition, participants express more uncertainty in their confidence estimate when they set the interval, and they also change their beliefs more quickly, as expressed in a higher learning index. Therefore, noisy Bayesian updating can account for key differences between social and non-social conditions.

      We thank the reviewer for making this point, as we believe that these additional analyses allow theoretical inferences to be made in a more direct manner; we think that it has significantly contributed towards a deeper understanding of the mechanisms involved in the social and non-social conditions. Further, it provides a novel account of how we make judgments when being presented with social and non-social information.

      We made substantial changes to the main text, figures and supplementary material to include these changes:

      Main text, pages 10-11, new section:

      The impact of noise in belief updating in social and non-social conditions

      So far, we have shown that, in comparison to non-social predictors, participants changed their interval settings about social advisors less drastically across time, relied on observations made further in the past, and were less impacted by their subjective uncertainty when they did so (Figure 2). Using Bayesian simulation analyses, we investigated whether a common mechanism might underlie these behavioural differences. We tested whether the integration of new evidence differed between social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues while past observations might be weighted more strongly for social cues. Similar ideas were tested in previous studies, when comparing the learning rate (i.e. the speed of learning) in environments of different volatilities12,13. We tested these ideas using established ways of changing the speed of learning during Bayesian updates14,21. We hypothesized that participants reduce their uncertainty more quickly when observing social information. Conversely, we hypothesized a less steep decline of uncertainty when observing non-social information, indicating that new information can be flexibly integrated during the belief update (Figure 5a).

      We manipulated the amount of uncertainty in the Bayesian model by adding a uniform distribution to each belief update (Figure 3b-c) (equations 10 and 11). Consequently, the distribution’s width increases and is more strongly impacted by recent observations (see example in Figure 3 – figure supplement 1). We used these modified Bayesian models to simulate trial-wise interval setting for each participant according to the observations they made by selecting a particular advisor in the social condition or another predictor in the non-social condition. We simulated confidence intervals at each trial. We then used these to examine whether an increase in noise led to simulated behaviour that resembled the behavioural patterns observed in the non-social condition and differed from those observed in the social condition.

      First, we repeated the linear regression analysis and hypothesized that interval settings that were simulated from noisy Bayesian models would also show a greater negative ‘predictor uncertainty’ effect on interval setting, resembling the effect we had observed in the non-social condition (Figure 2e). This was indeed the case when using the noisy Bayesian model: irrespective of social or non-social condition, the addition of noise (increasing weight of the uniform distribution in each belief update) led to an increasingly negative ‘predictor uncertainty’ effect on confidence judgment (new Figure 3d). The absence of a difference between the social and non-social conditions in the simulations suggests that an increase in the Bayesian noise is sufficient to induce a negative impact of ‘predictor uncertainty’ on interval setting. Hence, we can conclude that different degrees of noise in the updating process are sufficient to cause the differences observed between social and non-social conditions. Next, we used these simulated interval settings and repeated the descriptive behavioural analyses (Figure 2b,f). An increase in noise led to greater changes of confidence across time and a higher learning index (Figure 3 – figure supplement 1b-c). In summary, the Bayesian simulations offer a conceptual explanation that can account for both the differences in learning and the difference in uncertainty processing that exist between social and non-social conditions. The key insight conveyed by the Bayesian simulations is that a wider, more uncertain belief distribution changes more quickly. Correspondingly, in the non-social condition, participants express more uncertainty in their confidence estimate when they set the interval, and they also change their beliefs more quickly. Therefore, noisy Bayesian updating can account for key differences between social and non-social conditions.

      Methods, page 23 new section:

      Extension of Bayesian model with varying amounts of noise

      We modified the original Bayesian model (Figure 2d, Figure 2 – figure supplement 2) to test whether the integration of new evidence differed between social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues while past observations might be weighted more strongly for social cues. [...] To obtain the size of one interval step, the circle size (360 degrees) is divided by the maximum number of interval steps (40 steps, i.e. 20 steps on each side), which gives nine degrees per interval step. Next, the accuracy estimate in radians (0.87) is multiplied by the step size in radians (0.1571), resulting in an interval of 0.137 radians, or 7.85 degrees. The final interval size would therefore be 7.85 degrees.

      We repeated the behavioural analyses (Figure 2b,e,f) to test whether confidence intervals derived from noisier Bayesian models mimic the behavioural patterns observed in the non-social condition: greater changes in confidence across trials (Figure 3 – figure supplement 1b), a greater negative ‘predictor uncertainty’ effect on confidence judgments (Figure 3 – figure supplement 1c) and a greater learning index (Figure 3d).

      Discussion, page 14: […] It may be because we make just such assumptions that past observations are used to predict performance levels that people are likely to exhibit next 15,16. An alternative explanation might be that participants experience a steeper decline of subjective uncertainty in their beliefs about the accuracy of social advice, resulting in a narrower prior distribution, during the next encounter with the same advisor. We used a series of simulations to investigate how uncertainty about beliefs changed from trial to trial and showed that belief updates about non-social cues were consistent with a noisier update process that diminished the impact of experiences over the longer term. From a Bayesian perspective, greater certainty about the value of advice means that contradictory evidence will need to be stronger to alter one’s beliefs. In the absence of such evidence, a Bayesian agent is more likely to repeat previous judgments. Just as in a confirmation bias 17, such a perspective suggests that once we are more certain about others’ features, for example, their character traits, we are less likely to change our opinions about them.

      Reviewer #2 (Public Review):

      Humans learn about the world both directly, by interacting with it, and indirectly, by gathering information from others. There has been a longstanding debate about the extent to which social learning relies on specialized mechanisms that are distinct from those that support learning through direct interaction with the environment. In this work, the authors approach this question using an elegant within-subjects design that enables direct comparisons between how participants use information from social and non-social sources. Although the information presented in both conditions had the same underlying structure, participants tracked the performance of the social cue more accurately and changed their estimates less as a function of prediction error. Further, univariate activity in two regions, dmPFC and pTPJ, tracked participants' confidence judgments more closely in the social than in the non-social condition, and multivariate patterns of activation in these regions contained information about the identity of the social cues.

      Overall, the experimental approach and model used in this paper are very promising. However, after reading the paper, I found myself wanting additional insight into what these condition differences mean, and how to place this work in the context of prior literature on this debate. In addition, some additional analyses would be useful to support the key claims of the paper.

      We thank the reviewer for their very supportive comments. We have addressed their points below and have highlighted changes in our manuscript that we made in response to the reviewer’s comments.

      (1) The framing should be reworked to place this work in the context of prior computational work on social learning. Some potentially relevant examples:

      • Shafto, Goodman & Frank (2012) provide a computational account of the domain-specific inductive biases that support social learning. In brief, what makes social learning special is that we have an intuitive theory of how other people's unobservable mental states lead to their observable actions, and we use this intuitive theory to actively interpret social information. (There is also a wealth of behavioral evidence in children to support this account; for a review, see Gweon, 2021).

      • Heyes (2012) provides a leaner account, arguing that social and non-social learning are supported by a common associative learning mechanism, and what distinguishes social from non-social learning is the input mechanism. Social learning becomes distinctively "social" to the extent that organisms are biased or attuned to social information.

      I highlight these papers because they go a step beyond asking whether there is any difference between mechanisms that support social and non-social learning: they also provide concrete proposals about what that difference might be, and what might be shared. I would like to see this work move in a similar direction.

      References

      (In the interest of transparency: I am not an author on these papers.)

      Gweon, H. (2021). Inferential social learning: how humans learn from others and help others learn. PsyArXiv. https://doi.org/10.31234/osf.io/8n34t

      Heyes, C. (2012). What's social about social learning? Journal of Comparative Psychology, 126(2), 193.

      Shafto, P., Goodman, N. D., & Frank, M. C. (2012). Learning from others: The consequences of psychological reasoning for human learning. Perspectives on Psychological Science, 7(4), 341-351.

      Thank you for this suggestion to expand our framing. We have now made substantial changes to the Discussion and Introduction to include additional background literature on the differences between social and non-social learning, including the references suggested by the reviewer. We further related our findings to other discussions in the literature arguing that differences between social and non-social learning might occur at the level of algorithms (the computations involved in social and non-social learning) and/or implementation (the neural mechanisms). Here, we describe behaviour with the same algorithm (a Bayesian model), but the weighting of uncertainty in decision-making differs between social and non-social contexts. This might be explained by ideas put forward by Shafto and colleagues (2012), who suggest that differences between social and non-social learning might be due to the attribution of goal-directed intention to social agents, but not to non-social cues. Such an attribution might lead participants to assume that advisors' performance will be relatively stable, because their goal-directed intentions should be relatively stable. We also show differences at the implementational level between social and non-social learning in TPJ and dmPFC.

      Below we list the changes we have made to the Introduction and Discussion. Further, we would also like to emphasize the substantial extension of the Bayesian modelling which we think clarifies the theoretical framework used to explain the mechanisms involved in social and non-social learning (see our answer to the next comments below).

      Introduction, page 4:

      [...]

      Therefore, by comparing information sampling from social versus non-social sources, we address a long-standing question in cognitive neuroscience, the degree to which any neural process is specialized for, or particularly linked to, social as opposed to non-social cognition 2–9. Given their similarities, it is expected that both types of learning will depend on common neural mechanisms. However, given the importance and ubiquity of social learning, it may also be that the neural mechanisms that support learning from social advice are at least partially specialized and distinct from those concerned with learning that is guided by non-social sources.

      However, it is less clear at which level information is processed differently when it has a social or non-social origin. It has recently been argued that differences between social and non-social learning can be investigated at different levels of Marr’s information processing theory: differences could emerge at an input level (in terms of the stimuli that might drive social and non-social learning), at an algorithmic level or at a neural implementation level 7. It might be that, at the algorithmic level, associative learning mechanisms are similar across social and non-social learning 1. Other theories have argued that differences might emerge because goal-directed actions are attributed to social agents, which allows for very different inferences to be made about hidden traits or beliefs 10. Such inferences might fundamentally alter learning about social agents compared to non-social cues.

      Discussion, page 15:

      […] One potential explanation for the assumption of stable performance for social but not non-social predictors might be that participants attribute intentions and motivations to social agents. Even if the social and non-social evidence are the same, the belief that a social actor might have a goal may affect the inferences made from the same piece of information 10. Social advisors first learnt about the target’s distribution and accordingly gave advice on where to find the target. If the social agents are credited with goal-directed behaviour, then it might be assumed that the goals remain relatively constant; this might lead participants to assume stability in the performances of social advisors. However, such goal-directed intentions might not be attributed to non-social cues, thereby making judgments inherently more uncertain and changeable across time. Such an account, focussing on differences in attribution in social settings, aligns with a recent suggestion that similarities or differences between social and non-social processes can be sought at any of several levels of Marr’s information processing theory 7. Here we found that the same algorithm was able to explain social and non-social learning (a qualitatively similar computational model could explain both). However, the extent to which the algorithm was recruited when learning about social compared to non-social information differed. We observed a greater impact of uncertainty on judgments about social compared to non-social information. We have shown evidence for a degree of specialization when assessing social advisors as opposed to non-social cues. At the neural level we focused on two brain areas, dmPFC and pTPJ, that have not only been shown to carry signals associated with belief inferences about others but, in addition, recent combined fMRI-TMS studies have demonstrated the causal importance of these activity patterns for the inference process […]

      (2) The results imply that dmPFC and pTPJ differentiate between learning from social and non-social sources. However, more work needs to be done to rule out simpler, deflationary accounts. In particular, the condition differences observed in dmPFC and pTPJ might reflect low-level differences between the two conditions. For example, the social task could simply have been more engaging to participants, or the social predictors may have been more visually distinct from one another than the fruits.

      We understand the reviewer’s concern regarding low-level distinctions between the social and non-social conditions that could confound the differences in neural activation observed between conditions in pTPJ and dmPFC. From the reviewer’s comments, we understand that there might be two potential confounds: first, low-level differences, such that stimuli within one condition might be more distinct from each other than stimuli within the other condition. In that case, the greater visual distinctiveness of stimuli in one condition might by itself lead to learning differences between conditions. Second, stimuli in one condition might be more engaging and potentially lead to attentional differences between conditions. We used a combination of univariate and multivariate analyses to address both concerns.

      Analysis 1: Univariate analysis to inspect potential unaccounted variance between social and non-social condition

      First, we used the existing univariate analysis (exploratory MRI whole-brain analysis, see Methods) to test for neural activation that covaried with attentional differences (or any other unaccounted neural difference) between conditions. If there were neural differences between conditions that are not accounted for by the parametric regressors included in the fMRI-GLM, then these differences should be captured by the constant of the GLM. For example, if there were attentional differences between conditions, then we would expect to see neural differences between conditions in areas such as the inferior parietal lobe (or other areas commonly engaged during attentional processes).

      Importantly, inspection of the GLM constant should capture any unaccounted differences, whether they are due to attention or to alternative processes that might differ between conditions. When inspecting cluster-corrected differences in the constant of the fMRI-GLM during the setting of the confidence judgment, we found no cluster-significant activation that differed between social and non-social conditions (Figure 4 – figure supplement 4a; results were family-wise-error cluster-corrected at p<0.05 using a cluster-defining threshold of z>2.3). For transparency, we show the sub-threshold activation map across the whole brain (z > 2) for the constant contrasted between social and non-social conditions (i.e. constant, contrast: social – non-social).
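The logic of using the GLM constant as a catch-all for unmodelled condition differences can be shown with a toy single-voxel simulation. This is a simplified, hypothetical illustration (ordinary least squares, no haemodynamic convolution, no temporal autocorrelation, no group-level stage), not the fMRI pipeline used in the study; all names and numbers are made up.

```python
import numpy as np


def glm_constant(y, parametric):
    """OLS fit of y on a constant plus mean-centred parametric regressors.

    Returns the fitted constant (intercept). Because the parametric regressors
    are mean-centred, the constant equals the average response in the data.
    """
    parametric = np.asarray(parametric, dtype=float).reshape(len(y), -1)
    centred = parametric - parametric.mean(axis=0)
    X = np.column_stack([np.ones(len(y)), centred])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]


# Hypothetical simulation: both conditions share the same parametric effect of
# confidence, but the non-social condition carries an unmodelled offset of 1.0.
rng = np.random.default_rng(0)
n_trials = 500
conf_social = rng.uniform(0, 1, n_trials)
conf_nonsocial = rng.uniform(0, 1, n_trials)
y_social = 2.0 + 0.5 * conf_social + rng.normal(0, 0.5, n_trials)
y_nonsocial = 3.0 + 0.5 * conf_nonsocial + rng.normal(0, 0.5, n_trials)

difference = glm_constant(y_nonsocial, conf_nonsocial) - glm_constant(y_social, conf_social)
print(round(difference, 2))
```

Because the shared confidence effect is absorbed by the parametric regressor, the difference between the two constants recovers the unmodelled offset of roughly 1.0.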

      For transparency, we additionally used an ROI approach to test for differences in activation patterns that correlated with the constant during the confidence phase; we used the same ROI approach as in the paper to avoid any biased test selection. We compared activation patterns between social and non-social conditions in the same ROIs as before: dmPFC (MNI coordinate [x/y/z: 2,44,36] 16) and bilateral pTPJ (70% probability anatomical mask; for reference see manuscript, page 23); we additionally compared activation patterns between conditions in bilateral IPLD (50% probability anatomical mask, 20). We did not find significantly different activation patterns between social and non-social conditions in any of these areas: dmPFC (confidence constant; paired t-test social vs non-social: t(23) = 0.06, p=0.96, [-36.7, 38.75]), bilateral TPJ (confidence constant; paired t-test social vs non-social: t(23) = -0.06, p=0.95, [-31, 29]), bilateral IPLD (confidence constant; paired t-test social vs non-social: t(23) = -0.58, p=0.57, [-30.3, 17.1]).
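The ROI statistics above are paired-sample t-tests across 24 participants (df = 23), each reported with an interval for the mean paired difference. A minimal sketch of how such a test and interval can be computed with SciPy is shown below; the data are simulated and hypothetical, the function name is illustrative, and the default 95% confidence level is an assumption.

```python
import numpy as np
from scipy import stats


def paired_t_test(a, b, confidence=0.95):
    """Paired-sample t-test of a - b with a confidence interval for the mean difference."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = diff.size
    mean_diff = diff.mean()
    se = diff.std(ddof=1) / np.sqrt(n)           # standard error of the mean difference
    t_stat = mean_diff / se
    df = n - 1
    p_value = 2 * stats.t.sf(abs(t_stat), df)     # two-sided p-value
    t_crit = stats.t.ppf(0.5 + confidence / 2, df)
    return t_stat, df, p_value, (mean_diff - t_crit * se, mean_diff + t_crit * se)


# Hypothetical ROI estimates (e.g. GLM constants) for 24 participants
rng = np.random.default_rng(1)
social = rng.normal(100, 50, 24)
non_social = social + rng.normal(0, 40, 24)
t_stat, df, p_value, ci = paired_t_test(social, non_social)
print(f"t({df}) = {t_stat:.2f}, p = {p_value:.2f}, CI [{ci[0]:.1f}, {ci[1]:.1f}]")
```

The t statistic and p-value agree with `scipy.stats.ttest_rel` on the same data; the manual version only makes the interval computation explicit.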

      There were no meaningful activation differences between conditions either in areas commonly linked to attention (e.g. IPL) or in the brain areas that were the focus of the study (dmPFC and pTPJ). Activation in dmPFC and pTPJ covaried with parametric effects such as the confidence set on the current and previous trial, and did not appear to reflect low-level differences such as attention. Hence, these results suggest that differences in activation between conditions were better captured by parametric regressors such as the trial-wise interval setting (i.e. confidence) and are unlikely to be confounded by low-level processes that can be detected with univariate neural analyses.

      Analysis 2: RSA to test visual distinctiveness between social and non-social conditions

      We addressed the reviewer’s other comment directly by testing whether differences between conditions might arise from a different degree of visual distinctiveness in one stimulus set compared to the other. We used RSA to inspect potential differences in early visual processing, which should be affected by greater stimulus similarity within one condition. In other words, we tested whether the visual distinctiveness of one stimulus set differed from that of the other stimulus set. To do so, we compared the Exemplar Discriminability Index (EDI) between conditions in early visual areas. We compared the dissimilarity of neural activation related to the presentation of an identical stimulus across trials (diagonal of the RSA matrix) with the dissimilarity of neural activation between different stimuli across trials (off-diagonal of the RSA matrix). If stimuli within one stimulus set are very similar, then the difference between the diagonal and off-diagonal should be very small and less likely to be significant (i.e. similar diagonal and off-diagonal values). In contrast, if stimuli within one set are very distinct from each other, then the difference between the diagonal and off-diagonal should be large and likely to result in a significant EDI (i.e. different diagonal and off-diagonal values) (see Figure 4g for a schematic illustration). Hence, a difference in visual distinctiveness between social and non-social conditions should result in different EDI values for the two conditions; visual distinctiveness can therefore be tested by comparing EDI values between conditions in early visual areas. We used a Harvard-cortical ROI mask based on bilateral V1. Negative EDI values indicate that the same exemplars are represented more similarly in the V1 neural pattern than different exemplars. This analysis showed that there was no significant difference in EDI between conditions (Figure 4 – figure supplement 4b; EDI paired sample t-test: t(23) = -0.16, p=0.87, 95% CI [-6.7, 5.7]).
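The EDI comparison described above can be sketched as follows. The sketch assumes correlation distance (1 − Pearson r) between exemplar patterns estimated from two independent sets of trials, and uses the sign convention stated in the text (diagonal minus off-diagonal dissimilarity, so negative values mean the same exemplars are represented more similarly). The simulated patterns and function names are hypothetical, and ROI extraction and the across-participant test are not shown.

```python
import numpy as np


def cross_run_rdm(patterns_a, patterns_b):
    """Correlation-distance (1 - Pearson r) matrix between exemplar patterns.

    patterns_a, patterns_b -- arrays of shape (n_exemplars, n_voxels) from two
    independent sets of trials; row i in both arrays is the same exemplar.
    """
    k = patterns_a.shape[0]
    r = np.corrcoef(patterns_a, patterns_b)[:k, k:]   # rows of A vs rows of B
    return 1.0 - r


def exemplar_discriminability_index(rdm):
    """Mean same-exemplar (diagonal) dissimilarity minus mean different-exemplar
    (off-diagonal) dissimilarity. Negative values: same exemplars are more similar."""
    rdm = np.asarray(rdm, dtype=float)
    same = np.diag(rdm)
    different = rdm[~np.eye(rdm.shape[0], dtype=bool)]
    return same.mean() - different.mean()


# Hypothetical example: 4 exemplars x 200 voxels, second run = first run + noise
rng = np.random.default_rng(3)
run_a = rng.normal(size=(4, 200))
run_b = run_a + 0.5 * rng.normal(size=(4, 200))
rdm = cross_run_rdm(run_a, run_b)
edi = exemplar_discriminability_index(rdm)
print(round(edi, 2))
```

Computing one EDI per participant and condition would then allow a paired comparison between the social and non-social stimulus sets.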

      We have further replicated results in V1 with a whole-brain searchlight analysis, averaging across both social and non-social conditions.

      In summary, by using a combination of univariate and multivariate analyses, we tested whether neural activation differed when participants were presented with facial or fruit stimuli and whether such differences might confound the observed learning differences between conditions. We did not find meaningful neural differences that were not accounted for by the regressors included in the GLM. Further, we did not find differences in visual distinctiveness between the stimulus sets. Hence, these control analyses suggest that differences between social and non-social conditions are unlikely to arise from differences in low-level processes and are instead more likely to emerge when learning about social or non-social information.

      Moreover, we also examined behaviourally whether participants differed in the way they approached the social and non-social conditions. We tested whether there were initial biases prior to learning, i.e. before participants received any information from either social or non-social sources. Specifically, we tested whether participants had different prior expectations about the performance of social compared to non-social predictors by comparing the confidence judgments on the first trial of each predictor. Participants set confidence intervals very similarly in social and non-social conditions (Figure below). Hence, differences between conditions did not seem to arise from low-level differences in the stimulus sets or from prior differences in expectations about the performance of social compared to non-social predictors. Instead, differences between conditions became apparent when participants updated their beliefs about social advisors or non-social cues and, as a consequence, in the way confidence judgments were set across time.

      Figure. Confidence interval for the first encounter of each predictor in social and non-social conditions. There was no initial bias in predicting the performance of social or non-social predictors.

      Main text page 13:

      […]

      Additional control analyses show that neural differences between social and non-social conditions were not due to the visually different sets of stimuli used in the experiment but instead represent fundamental differences in processing social compared to non-social information (Figure 4 – figure supplement 4). These results are shown both in the ROI-based RSA analysis and in the whole-brain searchlight analysis. In summary, the univariate and multivariate analyses together demonstrate that dmPFC and pTPJ represent beliefs about social advisors that develop over a longer timescale and encode the identities of the social advisors.

      References

      1. Heyes, C. (2012). What’s social about social learning? Journal of Comparative Psychology 126, 193–202. 10.1037/a0025180.
      2. Chang, S.W.C., and Dal Monte, O. (2018). Shining Light on Social Learning Circuits. Trends in Cognitive Sciences 22, 673–675. 10.1016/j.tics.2018.05.002.
      3. Diaconescu, A.O., Mathys, C., Weber, L.A.E., Kasper, L., Mauer, J., and Stephan, K.E. (2017). Hierarchical prediction errors in midbrain and septum during social learning. Soc Cogn Affect Neurosci 12, 618–634. 10.1093/scan/nsw171.
      4. Frith, C., and Frith, U. (2010). Learning from Others: Introduction to the Special Review Series on Social Neuroscience. Neuron 65, 739–743. 10.1016/j.neuron.2010.03.015.
      5. Frith, C.D., and Frith, U. (2012). Mechanisms of Social Cognition. Annu. Rev. Psychol. 63, 287–313. 10.1146/annurev-psych-120710-100449.
      6. Grabenhorst, F., and Schultz, W. (2021). Functions of primate amygdala neurons in economic decisions and social decision simulation. Behavioural Brain Research 409, 113318. 10.1016/j.bbr.2021.113318.
      7. Lockwood, P.L., Apps, M.A.J., and Chang, S.W.C. (2020). Is There a ‘Social’ Brain? Implementations and Algorithms. Trends in Cognitive Sciences, S1364661320301686. 10.1016/j.tics.2020.06.011.
      8. Soutschek, A., Ruff, C.C., Strombach, T., Kalenscher, T., and Tobler, P.N. (2016). Brain stimulation reveals crucial role of overcoming self-centeredness in self-control. Sci. Adv. 2, e1600992. 10.1126/sciadv.1600992.
      9. Wittmann, M.K., Lockwood, P.L., and Rushworth, M.F.S. (2018). Neural Mechanisms of Social Cognition in Primates. Annu. Rev. Neurosci. 41, 99–118. 10.1146/annurev-neuro-080317-061450.
      10. Shafto, P., Goodman, N.D., and Frank, M.C. (2012). Learning From Others: The Consequences of Psychological Reasoning for Human Learning. Perspect Psychol Sci 7, 341–351. 10.1177/1745691612448481.
      11. McGuire, J.T., Nassar, M.R., Gold, J.I., and Kable, J.W. (2014). Functionally Dissociable Influences on Learning Rate in a Dynamic Environment. Neuron 84, 870–881. 10.1016/j.neuron.2014.10.013.
      12. Behrens, T.E.J., Woolrich, M.W., Walton, M.E., and Rushworth, M.F.S. (2007). Learning the value of information in an uncertain world. Nature Neuroscience 10, 1214–1221. 10.1038/nn1954.
      13. Meder, D., Kolling, N., Verhagen, L., Wittmann, M.K., Scholl, J., Madsen, K.H., Hulme, O.J., Behrens, T.E.J., and Rushworth, M.F.S. (2017). Simultaneous representation of a spectrum of dynamically changing value estimates during decision making. Nat Commun 8, 1942. 10.1038/s41467-017-02169-w.
      14. Allenmark, F., Müller, H.J., and Shi, Z. (2018). Inter-trial effects in visual pop-out search: Factorial comparison of Bayesian updating models. PLoS Comput Biol 14, e1006328. 10.1371/journal.pcbi.1006328.
      15. Wittmann, M., Trudel, N., Trier, H.A., Klein-Flügge, M., Sel, A., Verhagen, L., and Rushworth, M.F.S. (2021). Causal manipulation of self-other mergence in the dorsomedial prefrontal cortex. Neuron.
      16. Wittmann, M.K., Kolling, N., Faber, N.S., Scholl, J., Nelissen, N., and Rushworth, M.F.S. (2016). Self-Other Mergence in the Frontal Cortex during Cooperation and Competition. Neuron 91, 482–493. 10.1016/j.neuron.2016.06.022.
      17. Kappes, A., Harvey, A.H., Lohrenz, T., Montague, P.R., and Sharot, T. (2020). Confirmation bias in the utilization of others’ opinion strength. Nat Neurosci 23, 130–137. 10.1038/s41593-019-0549-2.
      18. Trudel, N., Scholl, J., Klein-Flügge, M.C., Fouragnan, E., Tankelevitch, L., Wittmann, M.K., and Rushworth, M.F.S. (2021). Polarity of uncertainty representation during exploration and exploitation in ventromedial prefrontal cortex. Nat Hum Behav. 10.1038/s41562-020-0929-3.
      19. Yu, Z., Guindani, M., Grieco, S.F., Chen, L., Holmes, T.C., and Xu, X. (2022). Beyond t test and ANOVA: applications of mixed-effects models for more rigorous statistical analysis in neuroscience research. Neuron 110, 21–35. 10.1016/j.neuron.2021.10.030.
      20. Mars, R.B., Jbabdi, S., Sallet, J., O’Reilly, J.X., Croxson, P.L., Olivier, E., Noonan, M.P., Bergmann, C., Mitchell, A.S., Baxter, M.G., et al. (2011). Diffusion-Weighted Imaging Tractography-Based Parcellation of the Human Parietal Cortex and Comparison with Human and Macaque Resting-State Functional Connectivity. Journal of Neuroscience 31, 4087–4100. 10.1523/JNEUROSCI.5102-10.2011.
      21. Yu, A.J., and Cohen, J.D. Sequential effects: Superstition or rational behavior? 8.
      22. Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., and Kriegeskorte, N. (2014). A Toolbox for Representational Similarity Analysis. PLoS Comput Biol 10, e1003553. 10.1371/journal.pcbi.1003553.
      23. Lockwood, P.L., Wittmann, M.K., Nili, H., Matsumoto-Ryan, M., Abdurahman, A., Cutler, J., Husain, M., and Apps, M.A.J. (2022). Distinct neural representations for prosocial and self-benefiting effort. Current Biology 32, 4172-4185.e7. 10.1016/j.cub.2022.08.010.
    1. Author Response

      Reviewer #1 (Public Review):

      The authors performed simultaneous extracellular recordings in brain regions (CA1, prefrontal cortex (PFC), olfactory bulb (OB)) that are key to odor-guided decision making to delineate the oscillatory and cell population dynamics that guide decision making based on learned associations. They used complementary analyses to assess the coordination between CA1 and medial PFC (mPFC), using coherence and phase-locking analysis as well as generalized linear models and Bayesian decoding methods.

      One of the strengths of this work is the comparison of beta and respiratory (RR) LFP coherence in several behavioral states to rule out confounds due to sniffing or preparatory motor behavior (e.g., coherence was assessed during decision making with and without an odor present, during reward consumption). These controls allowed the authors to identify a specific enhancement of beta compared to RR coherence during decision making.

      The analyses of task-responsive putative interneuron and pyramidal cells suggest that accurate decision-making is associated with a stronger modulation of beta phase-locking in interneurons. Additional cross-correlation analyses between cell types across regions showed that cells, particularly interneurons, are temporally coordinated in the beta range. Their analyses did not identify a mechanism for this coordination, but the temporal lags between PFC and CA1 cells raise the possibility of top-down interactions mediated by a third brain region.

      The authors used the cellular activity to determine that the animal's upcoming behavior could be predicted from the ensemble activity during decision-making a few hundred milliseconds before the behavioral choice, but decoding accuracy diminished soon after the decision-making period. Interestingly, decoding accuracy increased after decision-making when using the spatially active cell ensembles. As indicated by the authors, these results suggest that different cell ensembles are engaged during decision-making and during the execution of the decision. It is possible that this change in ensemble dynamics before and after decision-making relates to the familiarity of the animals with the task, which makes it likely to involve procedural components (e.g., Jog et al., 1999). As pointed out by the authors in the discussion, several results have implications for the formation of associative memories and provide clues for future experiments. Thus, future work looking at the ensemble dynamics and at the occurrence of CA1 ripples in the early stages of task learning compared to when the animals are very familiar with the task (as in the current study), will provide a better understanding of the shifts that develop during the formation and consolidation of the association.

      One of the considerations in interpreting the results is that the odor sampling and decision-making periods overlap, making it difficult to disentangle the neural dynamics that are driven by the recall of the association (cued retrieval) from those that relate to the upcoming turning behavior after odor port disengagement. However, the authors' analyses of odor and choice selectivity in correct and incorrect trials demonstrate a preferential association between spike activity and choice selection in this task.

      Overall, the results advance our understanding of odor-guided decision-making mechanisms in CA1 and PFC at the LFP and cell population level. This work will be of significance to further research on the cellular basis of memory-guided decision-making, and to future work characterizing the interactions between CA1 and PFC during learning.

      We thank the Reviewer for their detailed evaluation summarizing and highlighting the strengths of the study. In addition to beta and respiratory rhythm (RR) modulation of CA1-PFC activity and the relationship between spiking activity and choice selection, the Reviewer also highlighted the temporal coordination of CA1 interneurons and change in ensemble dynamics during the decision-making period at the odor-port vs. during the execution of the decision on the maze, which is further emphasized as a novel result in the revised manuscript.

      Reviewer #2 (Public Review):

      Symanski et al. investigated the communication between the medial prefrontal cortex (mPFC), the hippocampal CA1 region, and the olfactory bulb (OB) while rats underwent an odor-cued decision-making task. By recording local field potentials and spiking activity in the three regions, they found that all regions became synchronized at the beta band and respiratory rhythms during cue sampling/decision-making. Although the strength of inter-region synchrony was not predictive of correct choices, both CA1 and mPFC neurons showed stronger phase-locked firings to beta oscillations for correct than incorrect choices. Moreover, a subset of putative pyramidal and interneurons in both regions were selective for task variables, and as ensembles, they formed activity patterns differentiating choices. Also, their firings were temporally coordinated in a direction that the mPFC interneurons led CA1 interneurons and pyramidal neurons. Based on these findings, the authors propose that cue-evoked beta oscillations modulate the activity of interneurons to coordinate ensemble activity in CA1-mPFC networks supporting decision-making.

      Strength:

      The findings uncovered a new style of mPFC-hippocampal communication through odor-evoked beta oscillations, which contrasts with theta oscillations and sharp-wave/ripples reported during memory-guided spatial navigation tasks. The overall quality of the work is outstanding. The data collection and analysis were meticulously conducted with appropriate controls and statistical tests.

      Weakness:

      The initial analysis of LFP activity (Figure 2d) revealed strong coherence in the beta band in all region pairs; however, the subsequent analysis focuses on mPFC-CA1 interaction. To justify this approach, it is essential to establish that the mPFC-CA1 beta synchrony reflects their direct communication rather than a by-product of common inputs from the OB.

      The authors used cross-correlograms to reveal the directionality of the mPFC-CA1 interaction. To strengthen the authors' view that beta oscillations help coordinate neural activity, it is worth investigating whether the same temporal relationship is also detectable within each cycle of beta oscillations. Specifically, mPFC interneurons may fire at earlier phases, followed by firings of CA1 interneurons and pyramidal neurons at later phases.
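For readers less familiar with the cross-correlogram analysis mentioned here, the sketch below counts target spikes at each lag around every reference spike and reports the peak lag, whose sign indicates which cell tends to fire first. The window, bin size, function name and toy spike trains are hypothetical, and any normalisation or surrogate (e.g. shuffled or jittered) baseline used in the study is not reproduced.

```python
import numpy as np


def cross_correlogram(ref_spikes, target_spikes, max_lag=0.1, bin_size=0.005):
    """Counts of target spikes at each lag relative to every reference spike.

    Times are in seconds. Positive lags mean the target cell fired after the
    reference cell.
    """
    ref = np.asarray(ref_spikes, dtype=float)
    target = np.sort(np.asarray(target_spikes, dtype=float))
    n_bins = int(round(2 * max_lag / bin_size))
    edges = np.linspace(-max_lag, max_lag, n_bins + 1)
    counts = np.zeros(n_bins, dtype=int)
    for t in ref:
        # Only consider target spikes inside the +/- max_lag window around t
        lo, hi = np.searchsorted(target, [t - max_lag, t + max_lag])
        counts += np.histogram(target[lo:hi] - t, bins=edges)[0]
    centres = (edges[:-1] + edges[1:]) / 2
    return centres, counts


# Hypothetical example: the target cell fires 21 ms after each reference spike
ref = np.arange(0.0, 10.0, 0.25)
target = ref + 0.021
centres, counts = cross_correlogram(ref, target)
print(f"peak lag: {centres[np.argmax(counts)] * 1000:.1f} ms")
```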

      We thank the Reviewer for their positive evaluation and constructive comments. We have addressed the weaknesses noted in the revised manuscript. In particular, we have added analyses and text that emphasize the change in beta synchrony in the OB-CA1-PFC network during the task, and added analyses that examine phase locking of pyramidal cells and interneurons to beta rhythms in the mPFC, CA1 and OB.
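      Phase locking of single units to the beta rhythm is commonly quantified by band-pass filtering the LFP, extracting the instantaneous phase with the Hilbert transform, and taking the mean resultant length of the spike phases. The sketch below illustrates this generic approach with SciPy; the 20-30 Hz band follows the beta range described in these reviews, while the filter order, the function name `beta_phase_locking`, and the synthetic data are assumptions and do not reproduce the authors' pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def beta_phase_locking(lfp, fs, spike_times, band=(20.0, 30.0), order=3):
    """Return (mean resultant length, preferred phase in rad) of spikes relative to the band-limited LFP."""
    # Zero-phase band-pass filter so filtering does not shift the phase estimate.
    b, a = butter(order, band, btype="bandpass", fs=fs)
    beta = filtfilt(b, a, lfp)
    # Instantaneous phase from the analytic signal (0 rad at peaks of the filtered LFP).
    phase = np.angle(hilbert(beta))
    # Convert spike times (s) to sample indices and drop spikes outside the recording.
    idx = np.round(np.asarray(spike_times) * fs).astype(int)
    idx = idx[(idx >= 0) & (idx < len(lfp))]
    resultant = np.exp(1j * phase[idx]).mean()
    return np.abs(resultant), np.angle(resultant)

# Hypothetical synthetic example: a 25 Hz "beta" LFP sampled at 1 kHz for 10 s.
fs = 1000.0
t = np.arange(0, 10, 1 / fs)
lfp = np.sin(2 * np.pi * 25 * t)
locked_spikes = 0.01 + 0.04 * np.arange(25, 225)  # one spike at each LFP peak, 1-9 s
rng = np.random.default_rng(1)
random_spikes = rng.uniform(1, 9, 200)
mrl_locked, pref_locked = beta_phase_locking(lfp, fs, locked_spikes)
mrl_random, _ = beta_phase_locking(lfp, fs, random_spikes)
print(mrl_locked, pref_locked, mrl_random)
```

Spikes aligned to LFP peaks give a mean resultant length near 1 and a preferred phase near 0, whereas randomly timed spikes give a value near 0.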

      Reviewer #3 (Public Review):

      Symanski et al. describe a set of interesting results derived from analyzing electrophysiological recordings performed in rats well trained on an associative memory task on a spatial maze (a T maze), in which animals learned to associate a given odor delivered in an initial maze region (upon a nose poke) with a subsequent spatial choice (a left or a right turn) to receive a reward. The authors have obtained LFPs from the OB, PFC, and CA1 from 8 animals subjected to this task, along with single-unit activity from the PFC and CA1. The authors describe that, during odor sampling, there is prominent LFP activity in the beta range (20-30 Hz) as well as prominent activity of the respiration-entrained LFP rhythm (RR, 7-8 Hz). The authors convincingly show that beta activity - but not RR - is specific to odor sampling (RR also shows up during other immobility periods within the task and when animals breathed clean air). They further show that not only beta power but also inter-regional beta coherence significantly enhances during the odor sampling period. In addition, the authors find a higher beta phase modulation of spiking in a subset of neurons associated with subsequent correct decisions. Since the authors also prove - based on behavioral analysis - that the odor-sampling period corresponds to the decision-making period in this task, they propose a role for beta coordination of hippocampal-prefrontal networks in sensory-cued decision making. The paper also brings along a set of complementary findings looking at the single unit and ensemble activity in both regions (CA1 and PFC), which are capable of predicting future spatial choices.

      I consider the investigated topic relevant to modern neuroscience and likely to interest a broad audience. Nevertheless, while there is much to like about this paper (e.g., carefully done experiments, advanced computational data analyses, well-written text, and well-crafted figures), I caught some issues that called my attention upon a careful reading, which I list below:

      A) The paper is written in a way clearly centered on rhythmic brain activity (cf. title, abstract, introduction, and discussion). Yet, out of 7 main figures, only 2 of them show data related to oscillations (while 1 figure shows behavioral data and 4 figures show spiking analysis not related to brain rhythms). Therefore, the presentation of the results seems unbalanced and disconnected from the main story.

      B) Somewhat related to the point above, in a strict sense, the title is not well justified ("Rhythmic coordination of hippocampal-prefrontal ensembles (...)") since there is no analysis relating assembly activity with either beta or RR (their results show beta or RR modulating a subset of single units), nor there is a combined ensemble analysis of PFC and hippocampal units (i.e., interregional cell assemblies). Why not try to relate ensemble activity to the observed oscillations?

      C) The main result of increased interregional beta coherence specifically during odor sampling is very interesting and seems quite solid. Though I hate being the one raising questions about the level of advancement, I cannot avoid noticing that similar increases in beta coherence in odor-sampling-based tasks have been observed before (e.g., increased OB-HPC beta coherence during odor sampling has been shown in Martin et al 2007 and between LEC and HPC in Igarashi et al 2014), which is to say that there is overlap between this core finding and previous research. But that said, in times where the reproducibility of our scientific endeavor has been put into question, this particular reviewer favors the publication of similar findings by independent labs, especially given this neatly collected dataset. It is recommended to highlight which results constitute novel insights here and which results provide support for previously published results.

      D) It called to my attention that many of the spiking results were obtained for a small percentage of neurons. For instance, how can the authors be confident that the choice-selective neurons are actually coding for the choice as opposed to being randomly detected by statistical chance? As a case in point, the authors mention that 1309 units were recorded in CA1, and from these 42 cells were choice selective. If the authors have employed a typical alpha of 5% for detecting such neurons, chance alone would predict ~60 neurons being false positives. I apologize if I am missing something, but could the authors clarify? On a related note, even though most findings hold true for a small percentage of neurons, the writing also tends to generalize the findings to the whole population (e.g., "Beta phase modulation of CA1 and PFC neuronal activity during this period was linked to accurate decisions, suggesting that this temporal modulation influences sensory-cued decision making.").
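      The reviewer's chance-level argument can be made concrete with a binomial calculation. Assuming each of the 1309 units is tested independently at alpha = 0.05, the expected number of false positives is 1309 × 0.05 ≈ 65, and a one-sided binomial tail probability shows how likely it is to see at least 42 "selective" cells by chance alone. This is an illustrative check under the independence assumption (the function name `chance_selectivity` is hypothetical), not the test used in the manuscript.

```python
from scipy.stats import binom

def chance_selectivity(n_cells, n_selective, alpha=0.05):
    """Expected false positives and P(X >= n_selective) if no cell were truly selective."""
    expected = n_cells * alpha
    # binom.sf(k - 1, n, p) gives P(X >= k) for a Binomial(n, p) variable.
    p_at_least = binom.sf(n_selective - 1, n_cells, alpha)
    return expected, p_at_least

expected, p = chance_selectivity(1309, 42)
print(f"expected by chance: {expected:.1f}; P(>= 42 by chance) = {p:.3f}")
```

A tail probability close to 1 means that 42 selective cells out of 1309 would be unremarkable under the null, which is why the choice of denominator (all recorded units versus units active during odor sampling) matters.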

      We thank the Reviewer for their detailed comments and feedback. We have addressed the issues raised by the Reviewer, which has significantly strengthened the manuscript.

      A) We have added several new analyses of rhythmic modulation of spiking activity, and elevated some of the Supplementary Figures related to oscillations to the main figures (Figures 2, 5). In addition, since several of our analyses provide novel results for spiking and ensemble dynamics before and after the decision-making period, as noted by Reviewer 1, we have emphasized these results as a novel advance in the revised manuscript, including the title and abstract.

      B) We agree that our analysis focuses on rhythmic coordination by beta and RR oscillations, phase modulation of single cell spiking activity in CA1 and PFC for accurate odor-cued decision making, and ensemble dynamics during decision making and execution of decisions. While relating ensemble activity to the observed oscillations is a long-term goal, we are limited by the size of simultaneously recorded ensembles within single sessions, since measures of ensemble dynamics per trial are required for such analyses. This is now noted in the Discussion section. We therefore focus our analyses separately on single cell modulation by rhythms and dynamics of ensemble activity during decision making.

      We have also retitled the manuscript to indicate this: “Rhythmic coordination and ensemble dynamics in the hippocampal-prefrontal network for odor-place associative memory and decision making”, to more accurately reflect our results.

      C) We appreciate the Reviewer’s favorable view on independent confirmation of previous results on beta coherence using our strong dataset. We have referenced previous results on OB-HPC, LEC-HPC and striatal beta coherence in the manuscript (e.g., Kay and Beshel 2010; Igarashi et al. 2014; Rangel et al. 2016; Leventhal et al., 2012).

      In addition, we also highlight the novelty of our results in the manuscript, as noted by Reviewers 1 and 2. Our findings in these specific circuits, namely the PFC-CA1 network, during odor-cued decision making are novel. Our results show that beta phase modulation of a sub-population of phase-coherent CA1 and PFC neurons is linked to accurate decision making, elucidate selectivity and ensemble dynamics in these regions during decision making, and show that independent ensembles are recruited during odor-sampling vs. the execution of decisions on the spatial maze. These results are emphasized in the revised manuscript.

      D) We apologize for the misunderstanding regarding the number of neurons. We had initially reported the total number of neurons recorded across run and sleep sessions, including those with very few spikes during the task. In determining task-responsive and task-unresponsive neurons (Figure 3), the task-unresponsive set also includes a very large fraction of neurons that did not have sufficient spikes during the odor-sampling or decision-making period (e.g. using a criterion of the number of spikes equal to the number of trials; similar numbers are seen with other criteria such as an absolute minimum number of spikes). These neurons should be more accurately denoted as “Odor Period Inactive”. Therefore, a more accurate estimate of task-responsive neurons in CA1 and PFC indicating their task engagement is now shown in Figure 3, starting with neurons that had sufficient spikes for this analysis. Using this metric, large fractions of neurons are task responsive and selective, similar to previously reported fractions in other studies (Igarashi, et al., 2014). We have added this description and numbers in the text (page 11 lines 230-241) and Methods (page 37 lines 795-797).

      We have also toned down the interpretation by avoiding generalizing to the whole population, and note that beta phase modulation of phase-locked neurons is related to behavior accuracy. Here, in particular, our results suggest a key role of CA1 interneurons in beta-mediated interactions.

    1. Author Response

      Reviewer #2 (Public Review):

      Reinforcement learning (RL) theory is important because it provides a broad, mathematically proven framework for linking behavioral states to behavioral actions, and has the potential for linking realistic biological network dynamics to behavior. The most detailed neurophysiological modeling uses biophysical compartmental models with the theoretical framework of Hodgkin-Huxley and Rall to describe the dynamics of real neurons, but those models are extremely difficult to link to behavioral output. RL provides a theoretical framework that could help bridge across the still-underexplored chasm between behavioral modeling and neurophysiological detail.

      On the positive side, this paper uses a network of interacting neurons in regions CA3 and CA1 (as used in previous models by McNaughton and Morris, 1987; Hasselmo and Schnell, 1994; Treves and Rolls, 1994; Mehta, Quirk and Wilson, 2000; Hasselmo, Bodelon and Wyble, 2002) to address how a simple representation of biological network dynamics could generate the successor representation used in RL. The successor representation is an interesting theory of hippocampal function, as it contrasts with a previous idea of model-based planning. Previous neuroscience data support the idea that animals use a model-based representation (a cognitive map made up of place cells or grid cells) to read out potential future paths to plan their behavior in the environment. For example, Johnson and Redish, 2007 showed activity spreading into alternating arms of a T-maze before a decision is made (i.e. a model-based exploration of possible actions, NOT a successor representation), and Pfeiffer and Foster, 2013 showed that replay in 2 dimensions corresponds to future goal-directed activity. Models such as Erdem and Hasselmo, 2012 and Fenton and Kubie, 2012 showed how forward planning of possible trajectories could guide performance of behavioral tasks. In contrast, the successor representation proposes that model-based activity is too computationally expensive and that, instead of reading out various possible model-based future paths when making a decision, a simulated agent could learn a look-up table indicating the probability of future behavioral states accessible from a given state. In previous work, the successor representation accounted for certain aspects of experimental neuroscience data such as place cells responding to the insertion of barriers as seen by Alvernhe et al. and the backward expansion of place fields seen by Mehta et al.
The current paper is admirable for addressing the potential role of neural replay in training of successor representations and its relationship to other neural and behavioral data such as the papers by Cheng and Frank 2008 and by Wu et al. 2017.
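      For readers less familiar with the successor representation (SR), its standard tabular form can be learned with a temporal-difference (TD) rule: each row M[s] is nudged toward the one-hot vector for s plus the discounted row of the next state. The sketch below is this textbook discrete-time TD(0) version, with a hypothetical function name and parameters; it is not the spiking, STDP-based model under review, and it takes no side in the model-based versus SR debate raised here. For a known transition matrix P, the fixed point is (I - γP)^{-1}, which the example compares against.

```python
import numpy as np

def td_successor(trajectory, n_states, gamma=0.9, lr=0.1):
    """Learn a tabular successor matrix M from a sequence of visited states via TD(0)."""
    M = np.zeros((n_states, n_states))
    identity = np.eye(n_states)
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        # TD target: occupancy of the current state plus the discounted SR row of the next state.
        td_error = identity[s] + gamma * M[s_next] - M[s]
        M[s] += lr * td_error
    return M

# Hypothetical environment: a deterministic 5-state ring (s -> s + 1 mod 5).
n, gamma = 5, 0.9
trajectory = [i % n for i in range(20000)]
M = td_successor(trajectory, n, gamma)
P = np.roll(np.eye(n), 1, axis=1)  # P[s, (s + 1) % n] = 1
M_exact = np.linalg.inv(np.eye(n) - gamma * P)
print(np.abs(M - M_exact).max())
```

Each row of M holds the expected discounted future occupancy of every state, so a value function can be read out as M @ r for a reward vector r.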

      However, a lot of this same data could still be interpreted as indicating that animals use a model-based representation as described above. There's nothing in this paper that rules out a model-based interpretation of the results discussed above. In fact, the cited paper by Momennejad et al. 2017 shows that humans extensively use model-based mechanisms along with some use of a successor representation in addition to the model-based mechanism. The description in the article under review needs to avoid treating successor representations as if they are already the ground truth.

      To do this, throughout the paper, the authors need to repeatedly address the fact that the Successor Representation is just a theory and not proven experimental fact. And they need to point out repeatedly, in all sections, that the successor representations hypothesis can be contrasted with the theory that model-based neural activity could instead guide behavior and could be the correct account for all of the data that they address (i.e. such as the dark-avoidance behavior). They should cite the previous examples of neural data that looks like model-based planning such as Johnson and Redish, 2007 in the T-maze and Pfeiffer and Foster, 2013 in open fields, and cite models such as Hasselmo and Eichenbaum, 2005; Erdem and Hasselmo, 2012 and Fenton and Kubie, 2012 that showed how forward replay or planning of possible trajectories could guide performance of behavioral tasks.

      We thank the reviewer for the valuable feedback. We have adapted the manuscript throughout to discuss the important point that the SR is not the ground truth (e.g. the final paragraphs in the sections “Bias-variance trade-off” and “Leveraging replays to learn novel trajectories”). We also discussed more extensively the model-based literature and the suggested citations in the manuscript.

      The title and text repeatedly refers to a "spiking" model. They show spikes in Figure 2 and extensively discuss the influence of spiking on STDP, but they ought to more explicitly discuss the interaction of their spike generation mechanisms (using a Poisson process) and the authors should compare their model to the model of George, DeCothi, Stachenfeld and Barry which addresses many of the same questions but using theta phase precession to obtain the correct spike timing in STDP.

      Yes, that's a great suggestion. We have extended our discussion section. In particular, we added:

      In our work, we did not include theta modulation, but phase precession and theta sequences could be yet another type of activity within the TD(λ) framework. Interestingly, other groups have recently investigated related ideas. A recent study (George et al., 2022) incorporated theta sweeps into behavioural activity, showing that it approximately learns the SR. Moreover, theta sequences allow for fast learning, playing a similar role to replays (or any other fast temporal-code sequences) in our work. By simulating the temporally compressed and precise theta sequences, their model also reconciles the learning over behavioral timescales with STDP. In contrast, our framework reconciles both timescales by relying purely on rate coding during behaviour. Finally, their method allows the SR to be learned within continuous space. It would be interesting to investigate whether these methods co-exist in the hippocampus and other brain areas. Furthermore, Fang et al. (2022) recently showed how the SR can be learned using recurrent neural networks with biologically plausible plasticity.

      The introduction and start of the Results section should have more citations to neuroscience data. The introduction currently cites only three experimental citations (O'Keefe and Dostrovsky, 1971; O'Keefe and Nadel, 1978 and Mehta et al. 2000) and then gives repeated citations of previous theory papers as if those papers define the experimental data that is relevant to this study. The article should review actual neuroscience literature, instead of acting as if a few theory papers in the last five years are more important sources of data than decades worth of experimental work. The start of the results section makes a statement about the role of hippocampus and only cites Stachenfeld et al. 2017 as if it were an experimental paper. The introduction, start of results and discussion need to be modified to address actual experimental data instead of just prior modeling papers. They need to add at least a paragraph to the introduction discussing real experimental data. There are numerous original research papers that should be cited for the role of hippocampus in behavior so that the reader doesn't get the impression all of this work started with the paper by Stachenfeld et al. 2017. For example, the introduction should supplement the citations to O'Keefe and Mehta with other experimental papers including those that they cite later in the paper. They should also cite other seminal work of Morris et al. 1982 in Morris water maze and Olton, 1979 in 8-arm radial maze and work by Wood, Dudchenko, Robitsek and Eichenbaum on neural activity during spatial alternation. At the start of the Results, instead of only citing Stachenfeld (which should have reduced emphasis when speaking about experiments), they should again cite O'Keefe and Nadel, 1978 for the very comprehensive review of the literature up to that time, plus the work of Morris and Eichenbaum and Aggleton and other experimental work.

      We thank the reviewer for the suggested citations. We have added many citations in order to discuss the experimental literature more thoroughly.

      This article is admirable for addressing how to utilize a continuous representation of space and time, which Kenji Doya also addressed in his NeurIPS article in 1995 and Neural Computation 2000 (which should be cited). To emphasize the significance of this continuous representation, they could note that reinforcement learning (RL) theory models still tend to use a discretized grid-like map of the world and discrete representation of time that does not correspond to the probabilistic nature of place cell response properties (Fenton and Muller) and the continuous nature of the response of time cells (Kraus et al. 2013).

      We thank the reviewer for this important comment and this is indeed one of the main strengths of the proposed framework. We have now emphasised this point, by adding the following paragraph to the Discussion:

      “Importantly, the discount parameter also depends on the time spent in each state. This eliminates the need for time discretization, which does not reflect the continuous nature of the response of time cells (Kraus et al. 2013).”
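      One way to make the discount depend on the time spent in a state, without discretizing time, is to assume exponential discounting with a time constant τ, so that a dwell of Δt seconds contributes a factor exp(-Δt/τ). The sketch below assumes this form (the function names and the value of τ are hypothetical, not taken from the manuscript) and shows the property that makes discretization unnecessary: splitting a dwell into any number of pieces yields the same total discount.

```python
import math

def dwell_discount(dwell_time, tau):
    """Discount factor after dwell_time seconds in a state, assuming exponential decay with time constant tau."""
    return math.exp(-dwell_time / tau)

def discount_along_path(dwell_times, tau):
    """Cumulative discount after visiting a sequence of states with the given dwell times (s)."""
    factor = 1.0
    for dt in dwell_times:
        factor *= dwell_discount(dt, tau)
    return factor

tau = 2.0  # hypothetical discounting time constant in seconds
# Splitting one 1.5 s dwell into three 0.5 s pieces gives the same discount.
whole = discount_along_path([1.5], tau)
split = discount_along_path([0.5, 0.5, 0.5], tau)
print(whole, split)
```

Because exponential factors multiply, the result is independent of how finely the trajectory is sampled, which is exactly what a fixed per-step discount would not guarantee.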

      I think the authors of this article need to be clear about the shortcomings of RL. They should devote some space in the discussion to noting neuroscience data that has not been addressed yet. They could note that most components of their RL framework are still implemented as algorithms rather than neural models. They could note that most RL models usually don't have neurons of any kind in them and that their own model only uses neurons to represent state and successor representations, without representing actions or action selection processes. They could note that the agents in most RL models commonly learn about barriers by needing to bang into the barrier in every location, rather than learning to look at it from a distance. The ultimate goal of research such as this should be to link cellular level neurophysiological data to experimental data on behavior. To the extent possible, they should focus on how they link neurophysiological data at the cellular level to spatial behavior and the unit responses of place cells in behaving animals, rather than basing the validity of their work on the assumption that the successor representation is correct.

      We thank the reviewer for this suggestion, we have now extended the Discussion to include a paragraph on the “Limitations of the Reinforcement Learning framework” which we reproduce here:

      We have already outlined some of the perks of using reinforcement learning for modelling behaviour, including providing clear computational and algorithmic frameworks. However, there are several intrinsic limitations to this framework. For example, it needs to be noted that RL agents that only use spatial data do not provide complete descriptions of behavior, which likely arises from integrating information across multiple sensory inputs. Whereas an animal would be able to smell and see a reward from a certain distance, an agent exploring the environment would only be able to discover it when randomly visiting the exact reward location. Furthermore, the framework rests on fairly strict mathematical assumptions: typically the state space needs to be Markovian, time and space need to be discretized (which we manage to evade in this particular framework) and the discounting needs to follow an exponential decay. These assumptions are overly simplistic and it is not clear how often they are actually met. Reinforcement Learning is also a sample-intensive technique, whereas we know that some animals, including humans, are capable of much faster or even one-shot learning.

      Regarding the specific limitations of our model, we can note that even though we have provided a neural implementation of the SR, and of the value function as its read-out (see Figure 5-figure supplement S2), the whole action selection process is still computed only at the algorithmic level. It may be interesting to extend the neural implementation to the policy selection mechanism in the future.

    1. Author Response

      Reviewer #2 (Public Review):

      Regulation of NAD and its intermediary metabolites is of critical importance in axon degeneration and neurodegenerative disease. Mounting evidence supports a scenario in which low NAD and high NMN trigger axon degeneration by competitive allosteric inhibition/activation of SARM1. Strategies to increase NAD levels and/or lower NMN levels provide neuroprotection in a variety of contexts. NAD metabolism is a partially conserved process, however, there are key differences in pathway routes and dynamics between model organisms used for NAD research (yeast, worm, fly, zebrafish, mouse/mammalian systems). Drosophila is a key model organism for axon degenerative research based on its ease of use and range of available genetic tools, in addition, the effector of axon degeneration - SARM1 - was first identified in the fly. As Drosophila has some key differences in the NAD synthesis pathways to mammalian systems it is important to test and develop tools to enable exploration of these pathways on the fly. Llobet Rosell and colleagues have developed clear and demonstrable tools in Drosophila for exploring NAD-related axon degenerative pathways by modulating the use of NMN via the addition of NMN consuming and NMN generating enzymes. They utilize Drosophila genetics to adequately support the claims made in the manuscript. Importantly, the authors well-demonstrate that consuming NMN through an alternate route to NaMN provides neuroprotection and that the neuroprotective components of low NMN are upstream of SARM1. These should be useful tools for neuroscientists in the future to use Drosophila for neurodegenerative research.

      Strengths:

      • Clear demonstration that low NMN provides neuroprotection using novel, stable, enzymatic depletion of NMN (to NaMN).

      • Development of a novel Drosophila tool (NMN-D transgenics) to explore NMN metabolism in vivo, including a stabilized version to permit chronic NMN depletion.

      • Metabolomic profiles across the pathway to show all pathway changes (not just isolated NMN or NAD assays).

      • Neurodegenerative assays that include both histological outcomes (axon degeneration) and circuitry/functional outcomes. Data from both series of experiments all support each other.

      • Assessment of other known potent axon degenerative genes via genetics in combination with the tools developed.

      • Staging of the molecular processes by strategic ablation of the inhibitory ARM domain on SARM1 (dSarm delta-ARM). These experiments suggest that low NAD AND high NMN (i.e. ratio between the two) is the critical factor that drives axon degeneration. Once NAD is low, axon degeneration cannot be recovered by further lowering of NMN. The dSarm delta-ARM and dnmnat sgRNAs experiments support a hypothesis in which (high) NMN triggers, but does not execute, axon degeneration.

      We appreciate the reviewer's recognition of the quality of our research.

      Weaknesses:

      • The authors use murine NAMPT (mNAMPT) to increase NMN. The degeneration assays support the hypotheses made, yet mNAMPT doesn't actually increase NMN. Thus it is unclear in this setting whether mNAMPT promotes axon degeneration by an NMN-related mechanism or through another route. It is also unclear as to why the murine form was chosen versus a human or other orthologues, or changing the metabolism of the intrinsic pathway (NR and NRK).

      Why mNAMPT:

      We decided to use mouse NAMPT (mNAMPT) because it was readily available from Giuseppe Orsomando (Amici et al., 2017), and because we did not have access to human NAMPT (hNAMPT).

      We agree with the observation that under physiological conditions, the expression of mNAMPT does not change NMN. However, we argue that after injury, once dNmnat is degraded, the additional NMN synthesis provided by mNAMPT expression (in addition to dNrk) leads to a faster NMN accumulation. This is supported by the observation that NMNAT2 is more labile than NAMPT in mammals (Gilley and Coleman, 2010; Stefano et al., 2015).

      • The authors use metabolic profiling to look at the individual metabolites during axon degenerative events and treatments; however, it is unclear if any of these proteins or genes change as a consequence. This is likely not important for understanding the findings; however, it might be helpful in explaining the mNAMPT data.

      We agree with the idea to test whether there is a change induced at the mRNA or protein level when the metabolic flux is altered. To do this, first, we measured the relative expression levels of axon death and NAD+ synthesis genes (Figure 2 – figure supplement 1B). Then, we measured potential changes upon mNAMPT expression (Figure 4 – figure supplement 1). Importantly, while the Gal4-driven expression resulted in an increase of relative mNAMPT transcript abundance from 30 to 12’000, the change observed in the other genes was not notable. Notably, compared to Actin–Gal4, dnrk is 2-fold lower in UAS-mNAMPT and Actin > mNAMPT backgrounds (control vs. experiment, respectively). Thus, overall, there appears to be no change in mRNAs of either axon death or NAD+ synthesis genes.

      In the results, we changed the text accordingly:

      "We then tested the effect of mNAMPT on the NAD+ metabolic flux in vivo. Surprisingly, NAM, NMN, and NAD+ levels remained unchanged under physiological conditions (Figure 4C). However, we noticed 3-fold higher NR and a moderate but significant elevation of ADPR and cADPR levels upon mNAMPT overexpression (Figure 4C). We also asked whether mNAMPT impacts on NAD+ homeostasis thereby altering the expression of axon death or NAD+ synthesis genes. Besides the expected significant increase in the Gal4-mediated expression of mNAMPT, we did not observe any notable changes at the mRNA level (Figure 4 – figure supplement 1)."

      • The authors repeatedly introduce a novel PncC antibody. However, no details on this, its generation, or its testing are found within the manuscript as presented. The antibody detects with several bands. The authors speculate that this could be a degradation product but nothing substantial is shown.

      In Materials and methods, we added a new section:

      "PncC antibody generation

      Rabbit anti-PncC antibodies were generated by Lubioscience under a proprietary protocol. The immunogen used was purified from Escherichia coli, strain K12, corresponding to the full protein sequence of NMN-D. The amino acid sequence is the following: MTDSELMQLSEQVGQALKARGATVTTAESCTGGWVAKVITDIAGSSAWFERGFVTYSNEAKAQMIGVREETLAQHGAVSEPVVVEMAIGALKAARADYAVSISGIAGPDGGSEEKPVGVWFAFATARGEGITRRECFSGDRDAVRRQATAYALQTLWQQFLQNT"

      We also updated the results referencing it.

      "We found that both wild-type and enzymatically dead NMN-D enzymes are equally expressed in S2 cells, as detected by newly generated PncC antibodies (Materials & Methods, Figure 1–figure supplement 2). Notably, we observed two immunoreactivities per lane, with the lower band being a potential degradation product."

      In addition, we now provide evidence why we believe that the upper band is NMN-D, while the lower one is a degradation product. In the figure attached below, the samples of the first five lanes were denatured at 70 °C, while the samples of the last two lanes were denatured at 95 °C (each for 10 min, respectively). The resulting Western blot shows that at 70 °C, there is more unspecific background, but no lower degradation product, while at 95 °C, the background is drastically reduced; however, there is a lower degradation product appearing. NMN-D is indicated by an asterisk. We feel that it is important to show this data here in the rebuttal. But we feel that it would add confusion to the readers in the manuscript.

      • Olfactory receptor neuron degeneration assays are shown in Fig1 but no data is presented with it to support the images.

      We agree that a quantification would support our observation. However, it is difficult to precisely quantify individual axons in the ORN injury assay, for two main reasons:

      1. Severed axons are often bundled, thus the exact number cannot be scored.

      2. Due to the removal of the cell body, the axonal GFP intensity decreases over time because mCD8::GFP is no longer synthesized, which adds another level of difficulty. Nevertheless, we added numbers to each example in Figure 1D and E, where we quantified the % of brains in which severed preserved axons were observed, similar to Figure 2 in (MacDonald et al., 2006).

      In the results section, we changed the text as indicated below:

      "We extended the ORN injury assay and found preservation at 10, 30, and 50 dpa (Figure 1E). While quantifying the precise number of axons is technically not feasible, severed preserved axons were observed in all 10, 30, and 50 dpa brains, albeit fewer at later time points (MacDonald et al., 2006). Thus, high levels of NMN-D confer robust protection of severed axons for multiple neuron types for the entire lifespan of Drosophila."

      In the Figure 1 legend, we changed the text accordingly:

      "D Low NMN results in severed axons of olfactory receptor neurons that remain morphologically preserved at 7 dpa. Examples of control and 7 dpa (arrows, site of unilateral ablation). Lower right, % of brains with severed preserved axon fibers. E Low NMN results in severed axons that remain morphologically preserved for 50 days. Representative pictures of 10, 30, and 50 dpa, from a total of 10 brains imaged for each condition (arrows, site of unilateral ablation). Lower right, % of brains with severed preserved axon fibers."

    1. Author Response

      Reviewer #1 (Public Review):

      Alexander Komkov et al. developed a novel software/algorithm (iROAR) to utilise naturally occurring non-functional clonotypes as a control repertoire to correct for amplification bias associated with multiplex PCR based technologies commonly used in TCR/BCR repertoire analysis. No new data were generated in this study, which utilises only publicly available datasets. The authors first determine the over-amplification rate (OAR) as a metric, which is found to be close to 1 under little or no amplification bias; this was validated by calculating the OAR for repertoires determined using 5'-RACE, a method known to have little to no amplification bias. This was a great control to have and is essential for validating the OAR measurement. In contrast, multiplex PCR based protocols such as VMPlex and VJMplex had significant deviations in the distribution of OAR.

      Strengths: The authors used publicly available datasets that utilise both biased (multiplex PCR based) and low biased (5'-RACE) methods to determine TCR/BCR repertoires. In addition, the authors generated in silico biased 5'-RACE datasets. These comparisons are critical in determining the effect of bias correction.

      Weaknesses: The analysis of TCR/BCR repertoires is largely limited to the number of clonotypes. The algorithm could be adopted more widely if its effect on other repertoire analyses were determined or discussed. For example, does iROAR affect measures of diversity? The identification of rare but unique clonotypes? The ability to detect true clonal expansions? Additionally, documentation for the software is lacking and largely inaccessible to non-specialists.

      By default, iROAR does not affect diversity and does not remove any clones; this statement has been added to the manuscript. At present, analyzing the potential effect on the detection of true clonal expansions is infeasible owing to the lack of appropriate data with sufficient sequencing coverage. We have also written a more detailed description of the iROAR software.

      Reviewer #2 (Public Review):

      In this paper, Komkov et al. describe a novel approach for computational correction of PCR amplification bias in adaptive immune receptor repertoire (AIRR) sequencing data (AIRR-seq). Their correction algorithm is based on using out-of-frame rearrangements to approximate gene-specific amplification bias. Gene-specific relative frequencies among out-of-frame rearrangements are not altered by clonal expansion except to the extent that out-of-frame rearrangements are passengers in clones expanding as a consequence of the specificity of the functional rearrangement. Due to independence between the two rearrangements, it can be reasonably assumed that the effects of clonal expansion are uniform in their impact on the observed V- and J-gene frequencies among out-of-frame rearrangements. Komkov et al. further assume that gene-specific relative frequencies among unique, out-of-frame rearrangements approximate recombination frequencies and that the extent to which gene-specific relative frequencies among all out-of-frame rearrangements deviate from those among unique, out-of-frame rearrangements provides an estimate of gene-specific PCR amplification bias. The ratio of V- or J-gene relative frequencies among all out-of-frame rearrangements to the corresponding relative frequency among unique out-of-frame rearrangements provides this estimate and can be used as a correction factor during data processing. It also serves as the basis for a repertoire-level metric of the overall extent of amplification bias in a repertoire.
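      The gene-level ratio described above can be illustrated with a minimal Python sketch. This is not the iROAR implementation. It assumes a hypothetical input format in which each unique out-of-frame clonotype is given as a (gene, read_count) pair. For each gene, it divides the gene's share of all out-of-frame reads by its share of unique out-of-frame clonotypes. Dividing in-frame read counts by that ratio is one plausible way to use it as a correction factor. The function names gene_oar and correct_counts are illustrative only.

```python
from collections import Counter


def gene_oar(out_of_frame):
    """Per-gene over-amplification ratio from out-of-frame clonotypes.

    out_of_frame: list of (gene, read_count) pairs, one per unique
    out-of-frame clonotype (hypothetical input format).
    """
    # Number of unique out-of-frame clonotypes per gene.
    unique_per_gene = Counter(gene for gene, _ in out_of_frame)
    # Total out-of-frame reads per gene.
    reads_per_gene = Counter()
    for gene, reads in out_of_frame:
        reads_per_gene[gene] += reads

    n_unique = sum(unique_per_gene.values())
    n_reads = sum(reads_per_gene.values())

    # Share of all reads divided by share of unique clonotypes;
    # a value near 1 suggests little amplification bias for that gene.
    return {
        gene: (reads_per_gene[gene] / n_reads) / (unique_per_gene[gene] / n_unique)
        for gene in unique_per_gene
    }


def correct_counts(in_frame, oar):
    """Divide each in-frame clonotype's read count by its gene's ratio.

    Genes without an estimate are left unchanged (factor 1.0).
    """
    return [(gene, reads / oar.get(gene, 1.0)) for gene, reads in in_frame]


if __name__ == "__main__":
    toy = [("TRBV5-1", 10), ("TRBV5-1", 10), ("TRBV20-1", 2), ("TRBV20-1", 2)]
    print(gene_oar(toy))
```

      In this toy input, each gene accounts for half of the unique clonotypes. TRBV5-1 carries 20 of the 24 reads and TRBV20-1 carries 4, so their ratios come out near 1.67 and 0.33. In-frame TRBV5-1 counts would therefore be scaled down, and TRBV20-1 counts would be scaled up.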

      This is a very nice and, to the best of my knowledge, novel idea. The proposed correction factor and metric have potential utility in all studies conducting AIRR-seq that use a PCR amplification step. While the proposed approach may not have superior or even equal performance when compared to biological spike-ins, it still has great potential utility given the time and financial costs and required expertise of using biological spike-ins and because it can be applied to data sets that have already been generated. Incorporation of this approach into AIRR-seq data processing has the potential to increase the accuracy of downstream analyses. It also has the potential to enhance the comparability of results across studies and to reduce the effects of different sequencing protocols for data re-use when data are integrated across studies.

      Enthusiasm is dampened by the fact that the proposed method is not directly compared to the gold standard of biological spike-ins.

      During manuscript revision, we designed and performed an additional wet-lab experiment to directly compare the iROAR approach with biological spike-ins.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript describes the generation and characterization of a mouse knockout model of Cep78, which codes for a centrosomal protein previously implicated in cone-rod dystrophy (CRD) and hearing loss in humans. Previous work in cultured mammalian cells (including patient fibroblasts) also indicated roles for CEP78 in primary cilium assembly and length control, but so far no animal models for CEP78 were described. Here, the authors first use CRISPR/Cas9 to knock out Cep78 in the mouse and convincingly demonstrate loss of CEP78 protein in lysates of retina and testis of Cep78-/- animals. Next, by careful phenotypic analysis, the authors demonstrate significant defects in photoreceptor structure and function in these mutant animals, which become more severe over a 9 (or 18) month period. Specifically, TEM analysis demonstrates ultrastructural defects of the connecting cilium and photoreceptor outer segments in the Cep78 mutants, which is in line with previously reported roles for CEP78 in CRD and in regulating primary cilia assembly in humans. In addition to a CRD-like phenotype, the authors also convincingly show that male Cep78-/- animals are infertile and exhibit severe defects in spermatogenesis, sperm flagella structure and manchette formation (MMAF phenotype). Furthermore, the authors provide evidence for an MMAF phenotype from a male individual carrying a previously reported CEP78 c.1629-2A>G mutation, substantiating that CEP78 is required for sperm development and function in mammals and supporting previously published work (Ascari et al. 2020).

      Finally, to identify the underlying molecular mechanism by which CEP78 loss causes MMAF, the authors perform some biochemical analyses, which suggest that CEP78 physically interacts with IFT20 and TTC21A (an ortholog of Chlamydomonas IFT139) and might regulate their stability. The authors conclude that CEP78 directly binds IFT20 and TTC21A in a trimeric complex and that disruption of this complex underlies the MMAF phenotype observed in Cep78-/- male mice. However, this conclusion is not fully justified by the data provided, and the mechanism by which CEP78 affects spermatogenesis therefore remains to be clarified.

      Specific strengths and weaknesses of the manuscript are listed below.

      Strengths:

      Overall, the phenotypic characterisation of the Cep78-/- animals appears convincing and provides new evidence supporting that CEP78 plays an important role in the development and function of photoreceptors and sperm cells in vertebrates.

      Weaknesses:

      1) The immunoprecipitation experiments of mouse testis extracts that were used for the mass spectrometry analysis in Table S4 were performed with an antibody against endogenous CEP78 (although antibody details are missing). One caveat with this approach is that the antibody might block binding of CEP78 to some of its interactors, e.g. if the epitope recognized by the antibody is located within one or more interactor binding sites in CEP78. This could explain why the authors did not identify some of the previously identified CEP78 interactors in their IP analysis, such as CEP76 and the EDD-DYRK2-DDB1-VprBP complex (Hossain et al. 2017) as well as CEP350 (Goncalves et al. 2021).

      We thank Reviewer #1 (Public Review) for agreeing with us that Cep78 plays an important role in the development of photoreceptors and sperm cells. We also appreciate Reviewer #1 (Public Review) for pointing out the weaknesses, which helped us improve our study.

      For the immunoprecipitation experiments on mouse testis extracts, the Cep78 antibody was raised against residues 457-741 (NP_932136.2). Cep78 was reported to bind the EDD-DYRK2-DDB1-VprBP complex. Residues 1-520 are responsible for the interaction with VprBP, and deletion of residues 450-497 did not affect it, indicating that Cep78 residues 1-450 are important for binding VprBP (Hossain et al. 2017). Because our antibody was generated against residues 457-741, it is not expected to block the binding of residues 1-450 to VprBP. Nevertheless, VprBP was not detected in our IP-MS experiment. The C-terminal region of Cep78 (residues 395-722), which overlaps with the antigenic region of our antibody (residues 457-741), was reported to interact with Cep350 (Goncalves et al. 2021). We still identified Cep350 in our IP-MS, so our polyclonal anti-Cep78 antibody did not appear to block interactions mediated by this region. Thus, immunoprecipitation with our Cep78 antibody identified some of the previously known interactors, and the interaction with VprBP may not have been blocked by the antibody.

      The detailed antibody information has now been added to Supplementary Table S7 in our revised supplementary materials.

      2) Figure 7A-D and page 18-25: based on IPs performed on cell or tissue lysates the authors conclude that CEP78 directly binds IFT20 and TTC21A in a "trimeric complex". However, this conclusion is not justified by the data provided, nor by the previous studies that the authors are referring to (Liu et al. 2019 and Zhang et al. 2016). The reported interactions might just as well be indirect. Indeed, IFT20 is a known component of the IFT-B2 complex (Taschner et al., 2016) whereas TTC21A (IFT139) is part of the IFT-A complex, which suggests that they may interact indirectly. In addition, the IPs shown in Figure 7A-D are lacking negative controls that do not coIP with CEP78/IFT20/TTC21A. It is important to include such controls, especially since IFT20 and CEP78 are rich in coiled coils that tend to interact non-specifically with other proteins.

      We thank Reviewer #1 (Public Review) for the comment on the protein interactions among Cep78, Ift20, and Ttc21a. As the reviewer pointed out, IFT20 is a known component of the IFT-B2 complex (Taschner et al., 2016), whereas TTC21A (IFT139) is part of the IFT-A complex. However, both IFT20 and TTC21A are located at peripheral areas of IFT-B and IFT-A (PMID: 32456460) and are not core components of either complex. It is therefore still possible that these two proteins interact with each other. Indeed, Liu et al. have revealed an interaction between Ift20 and Ttc21a in human sperm (PMID: 30929735). Additionally, to mediate trafficking of ciliary axonemal components, the IFT machinery is recruited to the distal appendages (PMID: 30601682). These are adjacent to the distal end of the (mother) centriole wall, where Cep78 was reported to be located (PMID: 35543806). Cep78 may therefore interact with Ift20 and Ttc21a at the centriole during ciliogenesis.

      To rule out nonspecific interactions between Cep78 and Ttc21a or Ift20, we added Gapdh (Figure 7D) and Ap80-NB-HA (Supplementary Figures S7A-C) as negative controls in co-IP, as the reviewer suggested. These controls showed that the interactions between Cep78 and Ttc21a or Ift20 are specific. To examine whether Cep78, Ift20 and Ttc21a form a complex, we fractionated testicular protein complexes using size exclusion chromatography. Cep78, Ift20 and Ttc21a co-fractionated at sizes between 158 kDa and 670 kDa (Figure 7E), supporting the formation of a trimeric complex. Our immunofluorescence analysis by SIM also showed co-localization between Cep78 and Ift20 or Ttc21a (Figure 7F). All these data support the interaction among Cep78, Ttc21a and Ift20. In the revised manuscript, we rephrased “direct interaction” as “interaction” (page 18, line 393).

      3) In Figure 7D, the input blots show similar levels of TTC21A and IFT20 in control and Cep78-/- mouse testicular tissue. This is in contrast to panels E-G in the same figure where TTC21A and IFT20 levels look reduced in the mutant. Please explain this discrepancy.

      Thank you for pointing this out. Deletion of Cep78 caused down-regulation of the Ttc21a and Ift20 proteins. To better reveal the change in the interaction between Ttc21a and Ift20, we needed to normalize the interaction against expression levels. To achieve this, we increased the amount of total Cep78-/- testicular protein so that Ttc21a and Ift20 levels in the input were similar between Cep78+/- and Cep78-/- testes. We loaded three times as much Cep78-/- testicular protein as Cep78+/- protein. Under these conditions, Ttc21a and Ift20 were present at similar levels in the input from both genotypes, and the interaction between Ttc21a and Ift20 was down-regulated after Cep78 deletion. Consistent with this, GAPDH, used as a loading control for the input, showed more Cep78-/- than Cep78+/- testicular protein. To avoid confusion, we have added “The amount of Cep78-/- testicular proteins used was 3 times that of Cep78+/- proteins” to the legend of Figure 7D in the revised manuscript.

      4) The efficiency of the siRNA knockdown shown in 7J-M was only assessed by qPCR (Figure S4), but this does not necessarily mean the corresponding proteins were depleted. Western blot analysis needs to be performed to show depletion at the protein level. Furthermore, it would be desirable with rescue experiments to validate the specificity of the siRNAs used.

      We thank the reviewer for the suggestion. To validate the specificity of the siRNAs, we performed rescue experiments using rescue plasmids in which the siRNA target sequences were synonymously mutated (Supplementary Table S6). The efficiency of siRNA knockdown and the effects of rescue were evaluated by both qPCR (Supplementary Figures S4.A-C) and Western blot (Figures 7.J-K, Supplementary Figures S4.D-E, H-I). The siRNAs significantly reduced the expression of Cep78, Ift20, and Ttc21a at both the mRNA (Supplementary Figures S4.A-C) and protein levels (Figures 7.J-K, Supplementary Figures S4.D-E, H-I). Under siRNA treatment, the rescue plasmids restored the expression of Cep78, Ift20, and Ttc21a at both the mRNA (Supplementary Figures S4.A-C) and protein levels (Figures 7.J-K, Supplementary Figures S4.D-E, H-I) compared with the control groups.

      In the rescue experiments, we further evaluated whether the effects of the Cep78, Ift20 and Ttc21a siRNAs on cilia and centriole lengths are specific. The suppression of cilia and centriole lengths by the Cep78, Ift20 and Ttc21a siRNAs was rescued by overexpression of the rescue plasmids Cep78syn-HA, Ift20syn-Flag and Ttc21asyn-Flag (Figures 7.N-S).

      5) Figure 7I: the resolution of the IFM is not very high and certainly not sufficient to demonstrate that CEP78, IFT20 and TTC21A co-localize to the same region on the centrosome, which one would have expected if they directly interact.

      We thank the reviewer for the constructive comments. To better demonstrate the co-localization of CEP78, IFT20 and TTC21A at the centrosome, we overexpressed Cep78-Halo, Ift20-mCherry and Ttc21a-mEmerald in NIH3T3 cells by lentiviral transduction. We then acquired super-resolution images by SIM (N-SIM, Nikon, Tokyo, Japan). The SIM results showed that Ift20 and Ttc21a co-localized with Cep78 (Figure 7F). Cep78 was previously reported to localize at the centriole (Goncalves et al., 2021). The co-localization of Cep78, Ift20 and Ttc21a suggests that Cep78 may play important roles in regulating Ift20 and Ttc21a at the centriole. Our interaction analysis revealed that Cep78 interacted with Ift20 and Ttc21a (Figure 7A-C, Supplementary Figure S7) and formed a complex with them (Figure 7E). Loss of Cep78 down-regulated the expression of Ift20 and Ttc21a and the interaction between them (Figures 7D, G-M).

      6) It is not really clear what information the authors seek to obtain from the global proteomic analysis of elongating spermatids shown in Figure 3N, O and Tables S2 and S3. Also, in Table S2, why are the numbers for CEP78 in columns P, Q and R so high when Cep78 is knocked out in these spermatid lysates? Please clarify.

      We thank the reviewer for the comments. Our global proteomic analysis showed that the majority of differentially expressed proteins were down-regulated (Figure 3N). Many of these are centrosome- and cilia-related proteins that are important for sperm flagella and acrosome structures (Figure 3O). These data provide insight into the downstream molecular events underlying the sperm flagella and acrosome defects after Cep78 deletion.

      As for the quantification of CEP78 in the TMT-based proteomics analysis, the Cep78-/- to Cep78+/- ratio is relatively high because of ratio compression, a well-known phenomenon in TMT-based proteomics (PMID: 25337643). The actual difference in protein expression is usually larger than the ratio calculated from TMT signals. Indeed, our Western blot analysis showed no CEP78 protein in Cep78-/- testis. Although TMT labelling has the disadvantage of ratio compression (PMID: 32040177, PMID: 23969891), it is a widely used quantitative proteomics method and has been shown to identify key pathways and proteins (PMID: 30683861, 33980814).

      7) Figure 1F and Figure 4K: the data needs to be quantified.

      We thank the reviewer for this suggestion. For Figure 4K, we stained Cep78+/- and Cep78-/- spermatids with anti-Centrin 1 to measure centriole length. The quantification (Figure 4L) shows significantly increased centriole lengths in Cep78-/- spermatids.

      For Figure 1F, we quantified the immunofluorescence intensity of cone arrestin in light-adapted retinas of Cep78+/- and Cep78-/- mice at 3 months. The immunofluorescence intensity of cone arrestin was lower in Cep78-/- mice.

      8) Figure 2A: It is difficult to see a difference in connecting cilium length in control and Cep78-/- mutant retinas based on the images shown here.

      Thank you for your suggestion. We have stained retinal cryosections from Cep78+/- and Cep78-/- mice with anti-Nphp1 to visualize the connecting cilium, and the data are provided in the revised Figure 2A-B.

      Reviewer #2 (Public Review):

      In this report, the authors describe the generation and characteristics of Cep78 mutant mice. Consistent with the phenotype observed in patients carrying mutations in CEP78, Cep78 knockout mice show degeneration of photoreceptor cells as well as sperm defects. The authors further show that the CEP78 protein can interact with IFT20 and TTC21A. Loss of CEP78 reduces the protein levels of IFT20 and TTC21A and causes their mislocalization, offering mechanistic insight into the sperm defects. Overall, the manuscript is well written and easy to follow, and the phenotyping is thorough. However, the background section needs improvement. In addition, some of the conclusions are not sufficiently supported by the data, warranting further analysis and/or additional experiments. The Cep78 KO mouse model established by the authors will be useful for further elucidating the disease mechanism in humans and for developing potential therapies.

      My comments are the following:

      1) Introduction. The statement that "CRD usually exists with combination of immotile cilia defects in other systems" is not correct. CRD due to ciliopathy can have cilia-related syndromic defects in other systems but it is a relatively small portion of all CRDs and the most frequently mutated genes are not cilia-related genes, such as ABCA4, GUCY2D, CRX.

      We thank the reviewer for the comments. We agree that only a small portion of CRDs are due to cilia defects and may have cilia-related syndromic defects in other systems. We corrected this statement on Page 4, Lines 77-79 of the revised manuscript, which now reads: “A small portion of CRDs are due to retina cilia defects, and they may have cilia-related syndromic defects in other systems[1].”

      2) Introduction: Page 4 CNGB1 encodes channel protein and not a cilia gene. It should be removed since it does not fit.

      We thank the reviewer for the comment. As suggested, we removed the sentence “mutations in CNGB1 cause CRD and anosmia [3]” at Page 4, Line 81 of the revised manuscript.

      3) Page 5: given the previous report of CEP78 patients with retinal degeneration, hearing loss, and reduced fertility, the statement "we report CEP78 as a NEW causative gene for a distinct syndrome...TWO phenotypes....." is not accurate.

      We thank the reviewer for the comments. We have removed “NEW” from the statement on Page 5, Line 104 of the revised manuscript. The revised sentence is “In this study, based on results of a male patient carrying CEP78 mutation and Cep78 gene knockout mice, we report CEP78 as a causative gene for CRD and male sterility.”

      4) Figure 1F, the OS of the cone seems shorter, which might be the reason for weaker arrestin staining in the mutant compared to the heterozygous. Also, it would be better to quantify the staining to substantiate the statement.

      Thanks for this suggestion. For Figure 1F, we have quantified the immunofluorescence intensity of cone arrestin in Cep78+/- and Cep78-/- light-adapted retinas at 3 months. The immunofluorescence intensity of cone arrestin was significantly lower in Cep78-/- mice.

      5) Figure 1K, panel with lower magnification would be useful to get a better sense of the overall structure defect of the retina. Is the defect observed in the cone as well?

      We thank the reviewer for the comment. As suggested, we have provided lower-magnification TEM images of the overall retinal structure, showing disruption of most outer segments in the Cep78-/- retina. It is difficult to distinguish whether a disordered outer segment belongs to a cone or a rod cell. The images are now provided as Figure 1L in the revised manuscript.

      We observed the abnormality of photopic b-wave amplitudes (Figure 1B, E) and decreased intensity of cone arrestin in light-adapted retinas (Figure 1F, G) in Cep78-/- mice, which indicate that the function of cone cells is damaged.

      6) Figure 2A: NPHP1 or other markers that specifically label the CC would be more useful for quantifying CC length. A notation for the red arrows in Figure 2 is also needed. In addition, the shape of the CC in the mutant seems to differ significantly from the control; it looks disorganized and swollen.

      We thank the reviewer for the suggestion. As suggested, we stained retinal cryosections from Cep78+/- and Cep78-/- mice with anti-Nphp1 to visualize the connecting cilium and quantified CC length. Connecting cilia were shorter in Cep78-/- mice. These data are shown in Figure 2A-B.

      In addition, TEM showed that the upper parts of the connecting cilia were swollen and contained disorganized microtubules (Figure 2E-G). The red arrows in Figure 2E-G indicate the swollen upper part of the connecting cilia and the disorganized microtubules of Cep78-/- photoreceptors. We added this description to the figure legend.

      7) The evidence provided does not demonstrate direct interaction among CEP78/IFT20/TTC21A.

      Thanks for the comment. To further validate the interaction between Cep78 and Ttc21a or Ift20, we performed reciprocal co-IP between Cep78 and Ttc21a or Ift20 by overexpression (Figure 7A-C). We also added Gapdh (Figure 7D) and Ap80-NB-HA (Supplementary Figures S7A-C) as negative controls in co-IP to exclude non-specific interactions. In addition, we provided evidence that Cep78, Ift20 and Ttc21a form a complex: size exclusion chromatography showed that they co-fractionated in a testicular protein complex of 158 kDa to 670 kDa (Figure 7E). Super-resolution SIM imaging also showed co-localization between Cep78 and Ttc21a or Ift20. Based on these data, we think that Cep78 interacts with Ttc21a and Ift20 and that the three proteins form a complex. We rephrased “direct interaction” as “interaction” in the manuscript.

      Reviewer #3 (Public Review):

      The authors aimed to bring a deeper understanding of CEP78 function in the development of cone-rod dystrophy and to demonstrate a previously unreported role of CEP78 in male infertility.

      It is important to note that the authors 're-examined' an already published human mutation, a 10 bp deletion in the CEP78 gene (Qing Fu et al., 10.1136/jmedgenet-2016-104166). This should be seen as an advantage, since revisiting an older study allowed the authors to note phenotypes that were not reported in the first place, namely impairment of photoreceptor and flagellar structure and function. The authors have generated a new Cep78 knockout mouse model, which allowed in-depth studies of Cep78 function and the identification of its interacting partners.

      The authors are skilled in classical histology techniques for tissue analysis, immunostaining, and light and confocal microscopy. They also employed high-end technologies such as a spectral domain optical coherence tomography system, electron microscopy, and scanning electron microscopy. They performed functional studies such as electroretinography (ERG) to assess the visual function of Cep78-/- mice, and quantitative mass spectrometry (MS) on elongating spermatids.

      The authors used elegant co-immunoprecipitation techniques to demonstrate trimer complex formation.

      Throughout the manuscript, images are clear and support the intended information and claims. Additionally, quantifications were provided where possible. Sample numbers were sufficient, in most cases n=6 (for mouse specimens).

      The authors could provide more details in the Materials and Methods section on how some experiments were conducted. Here are a few examples. (i) The authors performed quantitative mass spectrometry (MS) on elongating spermatid lysates but did not describe how the elongating spermatids were isolated. (ii) For co-IPs, the authors should state how many cells (6-well plate, 10 cm dish, etc.) were transfected and used. Furthermore, the authors could more clearly articulate which findings are novel and which confirm earlier findings.

      The authors clearly demonstrate and present sufficient evidence to show CEP78/Cep78 importance for proper photoreceptor and flagellar function. Furthermore, they succeed in identifying trimer complex proteins which help to explain the mechanism of Cep78 function.

      This study provides a detailed characterization of the human and mouse phenotypes resulting from CEP78/Cep78 deletion and of a possible mechanism causing them. CEP78 had already been associated with cone-rod dystrophy, and this study provides a deeper understanding of the underlying mechanism. Importantly, the authors have generated a new knockout mouse model that can be used for further studies or for testing putative treatments.

      The association of CEP78/Cep78 deletion with male infertility has not been previously reported and brings additional value to this study. We know from numerous studies that testes express many genes, some unique to testes and some co-expressed in multiple tissues. However, very few of these genes are well studied and have clinical significance. Studies like this, combining patient and animal model research, make it possible to identify and assign functions to poorly characterized or unstudied genes. The resulting data can then be used in basic research, patient diagnostics and treatment choices.

      We would like to thank Reviewer #3 (Public Review) for positive comments on our work.

      In response to the reviewer's suggestion to provide more details in the Materials and Methods, we added a description of the STA-PUT method for spermatid purification at Page 34, Lines 729-741 of the revised manuscript. We also stated the number of cells used for co-IPs, “10 cm dish HEK293T were transfected (Vazyme, Nanjing, China) with 5 μg plasmid for each experimental group.”, at Page 36, Lines 783-784.

      We also highlighted our new discoveries and ensured that all previously published findings are accompanied by references. We added “We further explored whether c.1629-2A>G mutation in this previously reported patient would disturb CEP78 protein expression and male fertility. A blood sample was collected from this patient and an unaffected control for protein extraction.” at Page 17, Line 335. We also added the following text to the first paragraph of the Discussion (Page 21, Lines 447-456 of the revised manuscript): “The major findings of our study are as follows: using Cep78-/- mice, we found CEP78 to be the causal gene of CRD with male infertility and multiple morphological abnormalities of the sperm flagella. A male patient carrying CEP78 c.1629-2A>G mutation, whom we previously reported to have CRD [8], was found to have male infertility and MMAF in this study. Cep78 formed a trimer with IFT20 and TTC21A (Figure 8), which are essential for sperm flagella formation [16, 18]. Cep78 played an important role in the interaction and stability of the trimer proteins, which regulate flagella formation and centriole length in spermiogenesis.”

    1. Author Response

      Reviewer #1 (Public Review):

      The idea that a passive living being can improve the wind dispersal of its seeds by passively changing their drag is enticing. The manuscript shows that high wind events in Scotland are inversely correlated with the ambient humidity. The dandelion pappus morphs with the ambient humidity, being more open in dry conditions, which is associated with stronger wind events. This passive morphing of the shape of the pappi thus leads to a dispersal of the seeds further away from their origin.

      The analysis and discussion in the paper is focused on "distance", i.e., how far the pappus will fly. Could the notion of time be relevant too? In wet conditions, perhaps it's better for a seed to hit the ground quickly and start germinating, whereas if it's dry, staying up in the air for longer to travel farther might be a better strategy.

      This is an interesting point; however, we think that flight time is likely to be less relevant to the dispersal outcomes. This is because seeds mostly remain attached to the parent plant in wet conditions so will not fly at all and therefore will not begin germination. When they do disperse, flight time will generally be only a few seconds for the majority of seeds whether they are wet or dry, and the timescale of wet weather is generally much longer (typically hours).

    1. Author Response

      Reviewer #1 (Public Review):

      This excellent manuscript challenged the premise that NF-kappaB and its upstream kinase IKKbeta play a role in muscle atrophy following tenotomy. Two animal models were used - one leading to enhanced muscle-specific NF-kappaB activation and the other a muscle-specific deletion. In both models, there was no significant relationship to observed muscle changes following tenotomy. Overall this work is significant in that it challenges the existing dogma that NF-kappaB plays a crucial role in muscle atrophy.

      Surprisingly the authors noted that there were basal differences observed in the phenotypes of their models that were sex-dependent. They note that male mice lose more muscle mass after tenotomy and specifically type 2b fiber loss.

      Overall this is an outstanding study that challenges the notion that NF-kappaB inhibitors are likely to improve muscle outcomes following injuries such as rotator cuff tears. Its main weakness is that there were no pharmacological arms of investigation; this fails to definitively exclude the hypothesis that inhibition may exert some effect in healing, perhaps in surrounding non-muscle matrix tissue that in turn may assist in healing.

      Thank you for your careful and thoughtful review. We agree that the finding that NFkb is not driving tenotomy-induced atrophy is both surprising and interesting. We look forward to further uncovering the atrophic mechanisms responsible. We also agree that an investigation using pharmacological NFkb inhibitors will improve our understanding of the full scope of the role of NFkb in the tenotomy pathology. As you and another reviewer note, this work has only blocked NFkb signaling in the mature muscle fiber and thus cannot assess the role of NFkb in satellite cell, fibroblast, immune cell activation, etc., in the healing response. However, we avoided using these inhibitors in this study because their systemic effects could obscure the role of NFkb in the muscle fiber. While we believe that a pharmacological investigation is beyond the scope of this study, it will make an excellent follow-on investigation.

      Reviewer #2 (Public Review):

      The primary strength of this paper is a rigorous approach to 'negative' data. Did the authors definitively prove that NF-kB has no role in the tenotomy-induced atrophy? Probably not entirely, since there are limitations of the mouse model and the knockdown mice. There cannot be complete elimination of load since mice heal with some scar tissue, and the knockdown is not complete elimination. However, even with these limitations, this presents important findings that tenotomy, which induces mechanical unloading of the muscle-tendon unit, provides a unique biomechanical environment for the muscle to undergo atrophy, which warrants a more in-depth look given that these injuries are unique and extremely common. It must be mentioned that the results are entirely supported by their data and that even though the model is not 'perfect' it truly supports that NF-kB has a limited role in atrophy. The sex-mediated differences based on autophagy are a secondary hypothesis and are interesting but possibly less clinically relevant based on the differences shown.

      We appreciate your thoughts on the “negative” data in this study. A manuscript in which the data refute your hypothesis and that of the field is difficult to write. There is a higher burden of validation and closer scrutiny of limitations. We agree that the model does have some limitations, but overall it strongly supports a limited role for NFkb in tenotomy-induced muscle atrophy.

      The important next step for this group and others is to evaluate the 'how and why' of tenotomy atrophy if not through NF-kB. Given how ubiquitous the NF-kB pathway is, could the muscle have many redundant processes that circumvent it, such that the authors saw no difference? Could it be differences in axial vs appendicular muscle? Or should there be a closer look at the mechanosensors in the muscle cells to determine if there are other key drivers of atrophy? Regardless, this paper shows that tenotomy-induced muscle atrophy is unique and supports the conclusion that muscle has many ways to atrophy based on the injury it undergoes.

      We agree that the major next step for this work is to investigate the mechanism(s) responsible for tenotomy-induced atrophy. Autophagy in particular needs a more thorough investigation using autophagic inhibitors in naive wildtype mice to investigate its role in the sex-specificity of tenotomy-induced atrophy. The question of axial vs. appendicular muscle is intriguing. There could also be an upper vs. lower body difference that is worth exploring in future work.

      Reviewer #3 (Public Review):

      The authors provided thorough analyses of muscle morphology, biochemistry, and function, which is a major strength of the study. However, there are some key confounding variables authors failed to address. For example, the difference in the estrous cycle in female animals was not controlled. The study could have been significantly improved by controlling sex hormone levels or at least testing differences in response to injury.

      We appreciate your careful and insightful review of our work. We designed this study to assess the role of myofiber NFkb in tenotomy-induced atrophy, which led us to a rigorous assessment of morphology, biochemistry and function, which we agree is the strength of the study. We also agree that a major limitation of this study is that the secondary observations of sex-specificity and autophagic signaling are not as well controlled or supported. This is because these observations were made at the end of the study when the histological analyses were completed by the blinded rater. The sex-specificity in the basophilic puncta that the rater observed prompted us to reconsider the sex-specificity in our other data and to stain for autophagic vesicles. As you suggest, rigorously assessing sex-specificity would require controlling for the estrous cycle and analyzing sex hormones, which would mean initiating another study planned with these variables in mind. We think this is beyond the scope of the current question of the role of NFkb in tenotomy-induced atrophy, but it should be undertaken as a follow-on study that also eliminates the confounding variables of genetic manipulation and tamoxifen treatment.

      However, since we still need to report the sex specificity we observed while ensuring that our findings are not misconstrued, we reviewed the language in the manuscript to emphasize that these are retrospective observations that require further investigation. We have also added discussion of these variables and their potential influence on the results to the Discussion.

      Discussion: “Additionally, it is important to note that estrous cycle was not controlled in these mice and sex hormone levels weren’t measured in this study. These preliminary observations, though intriguing, will require more rigorous follow up evaluations to define the interaction between sex, tenotomy, and autophagy in naïve wildtype mice.”

      Furthermore, more data are needed to link NFkB signaling and autophagy to make any kind of conclusions. Overall, in the current form of the manuscript, the presented data seem underdeveloped, and the addition of more supporting data could significantly improve the quality of the manuscript and enhance our understanding of NFkB signaling and muscle wasting in rotator cuff injury.

      We agree that more data are needed to complete the picture of autophagy in tenotomy-induced muscle atrophy. The p62 and LC3 positive intracellular puncta in male tenotomized muscle are distinctive, but only limited conclusions can be drawn physiologically because 1) they are only present in a fraction of fibers and 2) it is impossible to tell whether they result from increased autophagic flux or altered vesicle processing. Western blot for LC3 (and now p62) indicates only small changes in total protein, but since these proteins are synthesized and degraded during active autophagy, it is possible for their levels to remain constant while flux increases. Direct measures of autophagic flux would require treating mice with an autophagosome block which would require initiation of another study. However, we agree with the reviewer that we can add some additional measures to better characterize the instantaneous state.

      We have added analysis of p62 protein expression alongside LC3, since p62 protein content in muscle can be decoupled from LC3 (PMID: 27493873). We also added expression data for genes involved in autophagy (Lc3b, Gabarapl1, Becn1, Bnip3, and Atg5). Finally, we have commented on the limitations of our data in the Discussion.

      Discussion: “Evidence for autophagy regulating tenotomy-induced atrophy has been mounting over recent years (Bialek et al., 2011; Gumucio et al., 2012; Joshi et al., 2014; Ning et al., 2015; Hirunsai & Srikuea, 2021). The evidence presented here supports this contention, but we find surprisingly small effect sizes for all markers investigated. This could be because we are not directly assessing autophagic flux and so are missing some temporal dynamics since synthesis and degradation are ongoing simultaneously.”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have generated a set of seven nanobody tools against two of the largest Drosophila proteins, which are related to vertebrate titin and essential for muscle function. The study of such gigantic proteins is a challenge. They show that each of these nanobodies recognizes its epitope with high affinity (as expected from antibodies), fails to generate a signal in immunostainings of fixed mutants for the cognate protein, does not cross-react with the other target, and generates a signal in the muscle consistent with what one would anticipate for fly titin homologs. In addition, they show that these nanobodies have better penetration and labeling efficiency than conventional antibodies in thick tissues after classical paraformaldehyde fixation. Using these nanobodies, they could deduce the organization of the epitopes in different muscle types and propose a model for Sallimus and Projectin arrangement in muscles, including in larvae, which are difficult to label with traditional antibodies due to their impermeable chitin skeleton. Finally, they could fuse the gene encoding one of the nanobodies to the open reading frame of NeonGreen and express the corresponding fusion protein in animals to use the probe in FRAP assays.

      The work is very well performed and convincing. However, given its significant redundancy in terms of biological conclusions with the companion study "Nanobodies combined with DNAPAINT super-resolution reveal a staggered titin nano-architecture in flight muscles" by the same authors, and other published papers, I recommend the authors further prove the use of their nanobodies in live assays. In particular, the authors should test whether they can use the nanobodies to induce protein degradation either permanently or conditionally.

      Thanks for this nice summary of our findings. We have now extended the analysis of the larval muscles expressing the Nanobody-NeonGreen fusion and provide a first proof-of-principle analysis of new fly strains that we generated, which contain Sls-Nano2 or Sls-Nano42 nanobodies fused to a degradation signal. These induce lethality, suggesting that the Sls protein is partially non-functional. We verified this with quantitative stainings of various Sls epitopes in these muscles, which suggest that Sls is not fully degraded but rather partially modified in the muscle fibers expressing Sls-Nano-deGrad. These will be interesting tools to study Sls function during sarcomere homeostasis.

      Reviewer #2 (Public Review):

      The data presented in this manuscript are sound but rather descriptive. The contribution - as presented - is mostly of a technical nature. The authors correctly state that anti-GFP nanobodies, while used extensively across many model organisms, have limited utility for in vivo applications when the GFP-tagged protein in question displays abnormal behavior or is non-functional. The creation of nanobodies that are uniquely specific for the protein(s) of interest is therefore a significant improvement, especially since the Sallimus- and Projectin-specific reagents reported here react with PFA-fixed material. At least one of these nanobodies, when expressed in vivo, decorates the appropriate target. The source of antigens used for the construction of the nanobody library is Drosophila-derived. The extent of homology of Drosophila Sallimus and Projectin with related proteins in other species is not discussed. Whether the nanobodies reported here would be useful in other (closely related?) species, therefore, remains to be established. For those studying muscle biology in Drosophila, the nanobodies described here will be publicly available as cDNAs. Ease of production implies a readily shared and standardized resource for the field.

      We thank this reviewer for appreciating that our Sallimus and Projectin nanobodies are useful. We have now extended the collection even further, including anti-Obscurin, α-Actinin and Zasp52 nanobodies; the latter two will also be useful for researchers studying other tissues, in particular Drosophila epithelial tissues. As is customary in the Drosophila field, all fly strains and plasmids generated here will be made easily available to the community by placing them in stock centers or shipping them to laboratories directly. As indicated, the plasmids will also be deposited at Addgene.

      Further characterization of these nanobodies by biochemical methods such as immunoblotting would be challenging, given the size of the target proteins. In view of the technical nature of this manuscript, the authors should perhaps critically discuss the distinction between bulky GFP tags versus the much smaller epitope tags and the nanobodies that recognize them, although this was covered in a recent eLife paper from the Perrimon lab. Insertion of small tags, in conjunction with nanobodies that recognize them, would be less perturbing than the much bulkier GFP tag and lend itself to genome-wide applications. Creating nanobodies uniquely specific for each protein encoded in the Drosophila genome is not realistic, and the targeted approach deployed here is obviously valuable.

      We discuss the drawbacks of relying solely on GFP nanobodies, which require GFP-tagged proteins to be available and functional. In particular, for sarcomeric proteins this is often not the case. We also cite the Perrimon paper, which was published just as we prepared this manuscript. We would like to point out to this reviewer that even tagging with a small epitope tag is considerable work in Drosophila, and that the Perrimon paper, on which this reviewer is an author, describes only two genes endogenously tagged with a nanotag (histone H2Av and Dilp2); the other genes described were expressed from a UAS source or in cell culture. We show here 22 nanobodies against 11 target epitopes.

      Nanobodies typically recognise folded epitopes and are therefore unlikely to work in immunoblotting.

      The authors apply two different approaches to characterize the newly generated Nanobodies: more or less conventional immunohistochemistry with fluorescently labeled nanobodies, and in vivo expression of nanobodies fused to the fluorescent NeonGreen protein. The superiority of nanobodies in terms of tissue penetration has been shown by others in a direct comparison of intact fluorescently labeled immunoglobulins versus nanobodies. The authors state that in vivo labeling with nanobody fusions "thus far was done only with nanobodies against GFP, mCherry or short epitope tags." There is no fundamental difference between these recognition events and what the authors report for their Sallimus- and Projectin-specific reagents. The section that starts at line 304 is thus a little bit of a 'straw man'. There is no reason to assume that a nanobody that recognizes a muscle protein would behave differently than a nanobody that would recognize that same protein (or another) when epitope- or GFP-tagged. What might be interesting is to examine the behavior of these muscle-specific nanobodies in the course of muscle contraction/relaxation: are there conformational alterations that promote dissociation of bound nanobodies? Do different nanobodies display discrete behavior in this regard? The manuscript is silent on how muscles behave in live L3 larvae. The FRAP experiment seems to suggest that not much is happening, but the text refers to the contraction of larval sarcomeres from 8.5 µm to 4.5 µm. Does the in vivo expressed nanobody remain stably bound during this contraction/relaxation cycle? What about the other nanobodies reported in this manuscript? Since the larval motion was reduced by exposure to diethyl ether, have the authors considered imaging the contractile cycle in the absence of such exposure?

      We appreciate this reviewer's expert knowledge of nanobodies. However, nanobodies have not been extensively applied in Drosophila tissues. Hence, we believe it is important to characterise their penetration in stainings and compare them carefully to antibodies, so that Drosophila readers will be aware of their advantages.

      We have now also included more data on the larval muscle morphology in the nanobody-expressing muscles. Their morphology is normal. As larvae move around extensively all the time, the binding of the nanobodies to the target must be stable; otherwise they would not be bound when we fix or anesthetize the larvae. However, we have not attempted to image them at high resolution while crawling freely. From the quantified crawling speed (about 1.5 mm per second, see Figure 9 S1), we hope this reviewer appreciates that high-resolution imaging of sarcomeres in freely crawling larvae is highly non-trivial.

      Given that the nanobodies bind well-folded epitopes with low picomolar dissociation constants, it is hard to imagine that conformational changes of the target would dissociate them. The nanobody would stabilise the recognised conformation by a ΔG of ≈60 kJ/mol, and we would not expect the chosen domains to undergo major conformational changes.
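      As an illustrative check (our own sketch, not part of the authors' analysis), the standard relation ΔG° = RT ln(Kd) connects a low picomolar dissociation constant to a binding free energy of the size quoted above. The snippet assumes 25 °C (298.15 K) and a 1 M standard state; the function names are hypothetical. It gives about -68.5 kJ/mol for 1 pM, -62.8 kJ/mol for 10 pM and -57.1 kJ/mol for 100 pM, so a magnitude of ≈60 kJ/mol corresponds to a Kd of roughly 30 pM.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def binding_free_energy_kj(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy (kJ/mol) from a dissociation constant.

    Uses dG = R*T*ln(Kd / 1 M), i.e. a 1 M standard state (an assumption);
    a negative value means the bound complex is favoured.
    """
    return R * temp_k * math.log(kd_molar) / 1000.0


def kd_from_free_energy(dg_kj: float, temp_k: float = 298.15) -> float:
    """Inverse relation: dissociation constant (molar) for a given dG in kJ/mol."""
    return math.exp(dg_kj * 1000.0 / (R * temp_k))


if __name__ == "__main__":
    for kd in (1e-12, 1e-11, 1e-10):  # 1, 10 and 100 pM
        print(f"Kd = {kd * 1e12:g} pM -> dG = {binding_free_energy_kj(kd):.1f} kJ/mol")
    print(f"dG = -60 kJ/mol -> Kd = {kd_from_free_energy(-60.0) * 1e12:.0f} pM")
```

      Because the dependence is logarithmic, each tenfold change in Kd shifts ΔG by only about 5.7 kJ/mol at this temperature, which is why "low picomolar" and "≈60 kJ/mol" describe the same regime.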

      Reviewer #3 (Public Review):

      Loreau et al. have presented a well-written manuscript reporting clever, original work taking advantage of fairly new biotechnology - the generation and use of single-chain antibodies called nanobodies. The authors demonstrate the production of multiple nanobodies to two titin homologs in Drosophila and use these nanobodies to localize these proteins in several fly muscle types and discover interesting aspects of the localization and span of these elongated proteins in the muscle sarcomere. They also demonstrate that one of these single-chain antibodies can be expressed in muscle fused to a fluorescent protein to image the localization of a segment of one of these giant proteins, called Sallimus, in muscle in a live fly. Their project is well-justified given the limitations of the usual approaches for localizing and studying the dynamics of proteins in the muscle of model organisms, such as the possibility that GFP tagging of a protein will interfere with its localization or function, and the poor penetration of large IgG or IgM antibodies into densely packed structures like the sarcomere after fixation as compared to smaller nanobodies.

      They achieved their goals consistent with the known/expected properties of nanobodies: (1) They demonstrate that at least one of their nanobodies binds with very high affinity. (2) They bind with high specificity. (3) The nanobodies show much better penetration of fixed stage 17 embryos than do conventional antibodies.

      They use their nanobodies mostly generated to the N- and C-terminal ends of Sallimus and Projectin to learn new information about how these elongated proteins span and are oriented in the sarcomere. For example, in examining larval muscles which have long sarcomeres (8.5 microns), using nanobodies to domains located near the N- and C-termini, they show definitively that the predicted 2.1 MDa protein Sallimus spans the entire I-band and extends a bit into the A-band with its N-terminus embedded in the Z-disk and C-terminus in the outer edge of the A-band. Using a similar approach they also show that the 800 kDa Projectin decorates the entire myosin thick filament except for the H-zone and M-line in a polar orientation. Their final experiment is most exciting! They were able to express in fly larval muscles a nanobody directed to near the N-terminus of Sallimus fused to NeonGreen and show that it localizes to Z-disks in living larvae, and by FRAP experiments demonstrate that the binding of this nanobody to Sallimus in vivo is very stable. This opens the door to using a similar approach to study the assembly, dynamics, and even conformational changes of a protein in a complex in a live animal in real time.

      We thank this reviewer for appreciating the quality and impact of our approach and the results we obtained.

      There are only a few minor weaknesses in their conclusions: (1) They should note that their estimate of the span of Sallimus could in fact be an underestimate, since their Nano2 nanobody is directed to Ig13/14; if all 12 Ig domains N-terminal to this epitope were unfolded, they would add 12 × 30 nm = 360 nm of length, and even if folded they would add about 50 nm of length.

      We now discuss the length contribution of the 12 Ig domains more extensively in the DNA-PAINT super-resolution paper, but not in this resource paper, as the 50 nm difference was not resolved with the confocal microscopy applied here to the larval muscle sarcomere.

      (2) They discuss how Sallimus and Projectin are the two Drosophila homologs of mammalian titin; however, they ignore the fact that Sallimus and Projectin are more similar to muscle proteins in other invertebrates. For example, in C. elegans, TTN-1 is the counterpart of Sallimus, and twitchin is the counterpart of Projectin, both in size and domain organization. The authors present definitive data to support Figure 9, their nice model for a fly larval sarcomere, but fail to point out that this model likely pertains to C. elegans and other invertebrates. In Forbes et al. (2010) it was shown that TTN-1, which can be detected by western blot as a ~2 MDa protein, spans the entire I-band and extends into the outer edge of the A-band, as shown using two polyclonal antibodies; this is very similar to what the authors have shown here, more elegantly, for Sallimus. In addition, several studies have shown that twitchin (Projectin) does not extend into the M-line; the M-line is exclusively occupied by UNC-89, the homolog of Obscurin.

      We thank this reviewer for pointing out the important C. elegans literature that we have now included in this revised manuscript. We apologise for initially omitting them. They are indeed highly relevant.

      Reviewer #4 (Public Review):

      The authors report the generation and characterisation of several nanobodies for the giant Drosophila sarcomeric proteins Sallimus and Projectin, the functional orthologs of titin. They describe an efficient pipeline that could potentially help in designing and producing nanobodies for other proteins. There are several advantages to using nanobodies in comparison to conventional antibodies, and the authors nicely demonstrate that the generated nanobodies allow precise mapping of subcellular localisation and even of the protein orientation in the case of Projectin. They also show that small nanobody molecules have superior penetration and labelling efficiencies with respect to classical antibodies. Finally, the authors select one of the nanobodies to test whether it will efficiently detect native proteins in living tissue. They confirm that Sls-Nano2-NeonGreen binds Sls in vivo in muscles of temporarily immobilized 3rd instar larvae, revealing the sarcomeric Sls pattern, and demonstrate by FRAP experiments that Sls does not exchange during a short time period.

      This work is of significant value to a large audience. It provides a clear and precise pipeline for the generation of efficient nanobodies, which are invaluable tools of modern biology.

      We thank this reviewer for expressing strong support for our manuscript and appreciating its value for a large readership.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Chou-Zheng and Hatoum-Aslan follow up on their previous studies that have characterized the collaborations between the type III-A CRISPR-Cas10 Csm complex and various cellular housekeeping nucleases. The authors have previously demonstrated that the Csm complex associates with several nucleases that are implicated in RNA degradation via pulldown and mass spectrometry analysis. They also previously showed that some of these enzymes, including PNPase, are important for CRISPR RNA (crRNA) maturation and for robust anti-phage defense. They now show that a second housekeeping enzyme, RNase R, is required for crRNA maturation. PNPase and RNase R act in concert to produce the mature crRNA. The authors also analyze the interactions between Csm5 and both housekeeping proteins. Finally, they demonstrate that PNPase and RNase R are important for robust anti-plasmid activity when using crRNAs that are complementary to low-abundance transcripts.

      This is a well-written paper with clear figures and well-described experiments and results. The experiments in Figures 1 and 2 demonstrating the importance of RNase R for crRNA maturation are excellent. The biochemistry experiments in Figure 2 are especially convincing, in which the authors were able to reconstitute the concerted activities of RNase R and PNPase for crRNA biogenesis. The experiments in Figure 5 implicating PNPase and RNase R in robust anti-plasmid activity when targeting low-abundance transcripts are also clear and convincing, and the result is intriguing. Overall, these experiments provide a new example in a growing list of co-opted host proteins that are important for crRNA biogenesis and CRISPR-mediated defense.

      Thank you for your thoughtful review of our manuscript and comments overall!

      I do have some concerns about experiments in Figures 3 and 4 analyzing interactions between PNPase or RNase R and the Csm5 subunit of the Csm complex, and I believe that some of the authors' conclusions are not fully supported by the evidence presented in these experiments. These concerns, along with a question about their model, are detailed below.

      1) The authors used the structure of S. thermophilus Csm5 to guide their design of truncations to probe potential intrinsically disordered regions (IDR1 and IDR2) that may be sites of interaction with PNPase or RNase R. Since the authors submitted their manuscript, an AlphaFold predicted structure of the S. epidermidis Csm5 has been released on the AlphaFold Protein Structure Database. In this model, the IDR2 region is predicted by AlphaFold to be a beta strand at the center of a beta sheet, rather than a disordered region. If the prediction is accurate, deletion of this strand could cause Csm5 to misfold, making it difficult to interpret what causes loss of interaction with PNPase (i.e. deletion of a specific interaction surface versus misfolding of the overall tertiary structure). In light of this, the discussion surrounding these experiments should be altered to include more caveats about the truncations, and conclusions based on this experiment should be softened.

      While this manuscript was under review, several cryo-EM structures of the Cas10-Csm complex from S. epidermidis were solved and reported (Smith et al., 2022, Structure). In the unbound complex (PDB ID 7V02), IDR2 of Csm5 does indeed overlap with a short beta strand, but it is flanked by loops/unstructured regions. In addition, of the 46 residues that we deleted in the Csm5Δ46 mutant, 20 residues are unresolved in the experimentally-determined structure, supporting the notion that this region is generally flexible. Also, it is unlikely that this and the other Csm5 deletion mutants are misfolded because they all retain the ability to associate with the complex (Fig. 4B), and we were able to readily purify the mutant with the largest deletion (Csm5Δ46) without any issues (Fig. 5). To address this concern, we added panel D in Figure 4-figure supplement 1, which highlights the IDR regions in Csm5 from the recently-published S. epidermidis Cas10-Csm complex structure, and integrated the observations mentioned above in the narrative (lines 241-247 in the marked-up revised manuscript). We also softened the conclusions based on these experiments (lines 276-278 in the marked-up revised manuscript): “Taken together, these results suggest that the IDR2 region of Csm5 likely plays a role in the recruitment and stimulation of PNPase, while the binding site for RNase R may reside elsewhere in Csm5”.

      2) The native gels testing interactions between Csm5 and RNase R show a slight change in mobility of RNase R upon the addition of Csm5. Although I agree with the authors' interpretation that this shift could be due to transient interactions between Csm5 and RNase R, it is also possible that the mobility of RNase R is affected simply by the addition of a large excess of a second protein, even without a specific interaction between the two proteins. As a result, the evidence for direct interaction with Csm5 is limited. The discussion of how RNase R is recruited by the Csm complex could include more possible explanations. For example, it is possible that the interaction between RNase R and the Csm complex is mediated by another protein (e.g. PNPase could bridge the interaction between the two) or that such an interaction could be stabilized by intermediate crRNA or target RNA binding by the Csm complex.

      Thank you for this comment. To help rule out the possibility that excess Csm5 could cause a shift of any protein nonspecifically, we included a control in the original manuscript in which the same native gel assay was performed with BSA and Csm5, and found that Csm5 fails to cause an upward shift in BSA (Figure 3-figure supplement 1). In addition, to bolster the claim of a direct interaction between Csm5 and RNase R, we performed an additional pulldown assay (Figure 3-figure supplement 2). Details are described under the essential revisions point number 3 above. Regarding the other possibilities mentioned, it is unlikely that PNPase is bridging the interaction with RNase R because when we delete PNPase from cells, we still get some maturation (Fig. 1E and Chou-Zheng and Hatoum-Aslan, eLife, 2019). Also, in the reconstituted system, RNase R can still perform some level of maturation on its own (Fig. 2D). These observations argue against the need for bridging interactions with PNPase. Furthermore, maturation occurs in the absence of target RNA, ruling out the possibility that target RNA bridging is necessary for RNase R-mediated crRNA maturation. However, we agree with the reviewer that it is possible that other components of the Cas10-Csm complex may help to recruit and stabilize the interaction with RNase R in vivo, and this possibility was already mentioned in the narrative in the original submission, although we did not explicitly state the intermediate crRNA as one such component (lines 213-215 and again in lines 413-416 in the marked-up revised manuscript). We have replaced “subunits” with “components” in line 415 to be more inclusive of this possibility. Since this is all still speculative, we opt not to elaborate further on this point in the current manuscript. Needless to say, we are actively pursuing other more quantitative assays to measure the interactions between Csm5 and PNPase/RNase R and hope to have such data available in a follow-up manuscript.

      3) On lines 367-391, the authors propose a model for how PNPase and RNase R may contribute to defense against foreign DNA through their recruitment by the Csm complex to the target transcript. However, their experiments do not test whether PNPase and RNase R must interact with the Csm complex to support anti-plasmid activity. Indeed, it may make more sense for free RNase R to be involved in defense, similar to how free activated Csm6 degrades transcripts non-specifically, rather than only cleaving transcripts in close proximity to the Csm complex. The authors could expand their discussion to mention the possibility that free RNase R or PNPase are acting in anti-plasmid defense.

      Thank you for this suggestion. The following statement has been added to the discussion (lines 393-395 in the marked-up revised manuscript): “Once recruited by the complex, PNPase and RNase R may degrade nucleic acids in the vicinity nonspecifically, similarly to Csm6.”

      Reviewer #2 (Public Review):

      This work follows up on an earlier publication that showed PNPase and RNase J2 play important roles in CRISPR RNA processing (doi: 10.7554/eLife.45393). Here, the authors show that RNase R also plays a critical role in CRISPR RNA maturation. In addition, they show that RNase R and PNPase are both recruited to the type III CRISPR complex (Cas10-Csm) via direct interactions with the Csm5 subunit, and that deletion of an intrinsically disordered region (IDR2) on Csm5 selectively inhibits recruitment of PNPase but not RNase R. The authors show unquantified stimulation of PNPase nuclease activity by Csm5. Phage challenge assays are performed to test the impact of PNPase and RNase R deletion mutations on CRISPR-Cas mediated phage defense. Contrary to expectation, over-expression of the CRISPR system in cells that contain a deletion of PNPase and/or RNase R maintains robust anti-phage immunity. The interpretation of this experiment is that RNase R and PNPase may be dispensable in an over-expression system that produces high (non-natural) concentrations of the Csm complex. They test this idea using a system that expresses the CRISPR-Cas components from a chromosomally encoded locus (strain RP62a) and challenge these cells using a plasmid conjugation assay. In this iteration, deletion of PNPase has no impact on CRISPR performance, while deletion of RNase R "exhibited a moderate" attenuation of the immune response. In contrast to either single gene deletion, the PNPase and RNase R double mutant showed a near complete loss of immunity.

      Overall, the paper provides convincing evidence that PNPase and RNase R are involved in crRNA processing, and that they are recruited to the type III complex via Cmr5. The work on RNase R is entirely new and the role of PNPase is expanded. The role of cellular RNases in CRISPR RNA biogenesis is important, though some of the results are subtle and some of the biochemistry would benefit from a more quantitative analysis.

      Thank you for your thorough assessment and comments overall.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a well-executed study using cutting-edge proteomics analysis to characterize muscle tissue from a genetically diverse mouse population. The use of only females in the study is a serious limitation that the authors acknowledge. The statistical methods, including protein quantification, QTL mapping, and trait correlation analysis are appropriate and include corrections for multiple testing. One concern is that missense variants, if they occur in peptides used to quantify proteins, could lead to false-positive signatures of low abundance (see lines 123-127). The experimental validation and deep dive into UFMylation provide some confidence in the reliability of other associations that can be mined from these data. The authors have provided a web-based tool for exploring the data.

      We thank the reviewer for these very positive comments and for reviewing the manuscript.

      We agree the quantification of peptides containing missense variants could confound quantification at the protein level. This is an important consideration when there are only a few peptides identified for a specific protein. However, in our data the average number of peptides used to quantify the 14 proteins containing missense-associated pQTLs was ~68 peptides/protein (lowest was 5 peptides for FGB and highest 703 peptides for NEB).

      In the case of EPHX1, we quantified 15 peptides (Figure R1A). We identified a peptide adjacent to R338 spanning amino acids 339-347. As such, mutation of R338C would prevent trypsin from cleavage resulting in the missense peptide not being identified and may lead to false-positive signatures of low abundance as suggested by the reviewer. To investigate this, we re-quantified EPHX1 relative protein abundance with or without the peptide spanning 339-347 for each genotype (Figure R1B). This made little difference to protein quantification and EPHX1 abundance was still significantly lower following mutation of R338C (AA genotype). In fact, quantification at the peptide-level revealed 12 out of the remaining 14 peptides were also significantly lower in AA genotype (data not shown).
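      The missed-cleavage argument above can be illustrated with a small in silico digest. This is a minimal sketch that assumes the simplified trypsin rule (cleave after K or R, but not before P) and no missed cleavages. The toy sequence and the helpers `tryptic_peptides` and `point_mutant` are hypothetical: the sequence is not EPHX1, and the helpers are not part of the authors' pipeline. Replacing the arginine with cysteine removes a cleavage site. The peptide ending at that residue and the downstream peptide are therefore replaced by one longer peptide.

```python
import re

def tryptic_peptides(seq: str, min_len: int = 1) -> list[str]:
    """In silico tryptic digest with no missed cleavages.

    Simplified trypsin rule: cleave C-terminal to K or R unless the
    next residue is P.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", seq)
    return [p for p in pieces if len(p) >= min_len]

def point_mutant(seq: str, pos: int, wt: str, mut: str) -> str:
    """Apply a substitution written like 'R338C' (pos is 1-based)."""
    if seq[pos - 1] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + mut + seq[pos:]

# Hypothetical toy sequence (NOT EPHX1); residue 12 plays the role of R338.
wt_seq = "GASPEKLLNDWRYFTEGHQLKVAPSDK"
mut_seq = point_mutant(wt_seq, 12, "R", "C")
lost = set(tryptic_peptides(wt_seq)) - set(tryptic_peptides(mut_seq))
print(sorted(lost))  # ['LLNDWR', 'YFTEGHQLK']
```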

      Although we agree this is a very important consideration, we are mindful of the length of the article and feel that including these data would not significantly improve the manuscript. We therefore request not to include these data, as they would detract from the main findings of the paper, which focus on phenotypic associations and validation of UFMylation as a regulator of muscle function.

      Figure R1. (A) Identified peptides from EPHX1 mapped onto the primary amino acid sequence, highlighting the missense mutation induced by SNP rs32746574 that was associated with EPHX1 protein levels by pQTL analysis. (B) Relative quantification of EPHX1 between the two genotypes of SNP rs32746574, with and without the peptide neighboring the missense mutation (amino acids 339-347) (**p<0.001, Student's t-test).
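      The with/without comparison in panel B can be sketched as follows. Several elements are assumptions made for illustration: the rollup (median of peptide log2 intensities per mouse), the peptide names, and the toy numbers. The response does not specify how peptide intensities were summarized. A p-value would additionally require the t distribution, for example via `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, median, variance

def protein_abundance(peptides: dict[str, float], exclude: tuple[str, ...] = ()) -> float:
    """Roll peptide log2 intensities up to one protein value (median rollup,
    an assumption), optionally dropping the named peptides."""
    return median([v for pep, v in peptides.items() if pep not in exclude])

def student_t(a: list[float], b: list[float]) -> float:
    """Two-sample Student's t statistic (pooled, equal-variance form)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

# Toy log2 intensities for three peptides in six mice (illustrative numbers only).
gg = [{"p1": 10, "p2": 11, "p339_347": 12},
      {"p1": 11, "p2": 12, "p339_347": 13},
      {"p1": 12, "p2": 13, "p339_347": 14}]
aa = [{"p1": 8, "p2": 9, "p339_347": 10},
      {"p1": 9, "p2": 10, "p339_347": 11},
      {"p1": 10, "p2": 11, "p339_347": 12}]

for excluded in [(), ("p339_347",)]:
    t = student_t([protein_abundance(m, excluded) for m in gg],
                  [protein_abundance(m, excluded) for m in aa])
    print(excluded, round(t, 3))
```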

    1. Author Response

      Reviewer #1 (Public Review):

      Current generative models of protein sequences, such as Potts models, variational autoencoders, or autoregressive models, must be trained on MSA data from scratch. Therefore, they cannot learn common substitution or coevolution patterns shared between families, and they require a substantial number of sequences, making them less suitable for small protein families (e.g., families found only in eukaryotes or viruses). MSA Transformers are promising alternatives as they can generalize across protein families, but there is no established method to generate samples from them. Here, Sgarbossa et al. propose a simple recursive sampling procedure based on iterative masking to generate novel sequences from an input MSA. The sampling method has three hyperparameters (masking frequency, sampling temperature, and the number of iterations), which are set by rigorous benchmarking. The authors compare their approach to bmDCA, and evaluate i) single-sample quality metrics, ii) sample diversity and similarity to native sequences, iii) similarity between original and generated sequence distributions, and iv) phylogeny/topology in sequence space of the generated distribution.
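      The iterative masking procedure summarized above can be sketched as below, under stated assumptions. `profile_model` is a hypothetical stand-in that predicts each masked token from the column frequencies of the unmasked rows; the actual method queries MSA Transformer. The default values of the three hyperparameters (`p_mask`, `temperature`, `n_iter`) are illustrative, not the benchmarked settings.

```python
import numpy as np

def profile_model(msa, mask, q, pseudocount=1.0):
    """Hypothetical stand-in for MSA Transformer: returns (N, L, q) token
    probabilities from each column's residue counts in the unmasked rows."""
    n, length = msa.shape
    probs = np.empty((length, q))
    for j in range(length):
        counts = np.bincount(msa[~mask[:, j], j], minlength=q) + pseudocount
        probs[j] = counts / counts.sum()
    return np.broadcast_to(probs, (n, length, q))

def iterative_masking(msa, model, q, p_mask=0.1, temperature=1.0, n_iter=200, seed=0):
    """Repeatedly mask a random fraction p_mask of tokens and resample them from
    the model's predictions at the given sampling temperature."""
    rng = np.random.default_rng(seed)
    msa = msa.copy()
    for _ in range(n_iter):
        mask = rng.random(msa.shape) < p_mask
        probs = model(msa, mask, q)
        # Temperature scaling: p_T(a) ∝ p(a) ** (1 / T), i.e. log-probs divided by T.
        scaled = probs ** (1.0 / temperature)
        scaled = scaled / scaled.sum(axis=-1, keepdims=True)
        # Inverse-CDF sampling of one token per (sequence, position).
        cdf = np.cumsum(scaled, axis=-1)
        u = rng.random(msa.shape + (1,))
        draws = np.minimum((u > cdf).sum(axis=-1), q - 1)
        msa[mask] = draws[mask]  # only masked tokens are replaced
    return msa

q = 21  # 20 amino acids + gap, integer-encoded as 0..20
natural = np.random.default_rng(1).integers(0, q, size=(50, 30))  # toy MSA, not real data
generated = iterative_masking(natural, profile_model, q, n_iter=20)
print(generated.shape, (generated != natural).mean())
```

      Lower temperatures sharpen the predicted distribution toward its most probable tokens. Higher temperatures flatten it.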

      Strengths:

      • The proposed sampling approach is simple.

      • The computational benchmarking is thorough.

      • The code is well organized and looks easy to use.

      Weaknesses:

      • There is no experimental data to back up the methodology.

      • It is not clear whether the sampling hyperparameter used is optimal for all protein sizes.

      • I am unsure that the bmDCA baseline method was trained appropriately and that the sampling method was adequate for protein design purposes (regular sampling).

      • Quality assessment of predicted structures is incomplete.

      • The proposed metrics for evaluating the diversity of generated sequences are fairly technical.

      We respond to each of these points below, in the section titled "Recommendations for the authors", since these questions were asked by the reviewer in more detail there.

      Impact assessment: The claim that MSA Transformer could be useful for protein design is supported by the computational benchmark. This work will be useful for researchers interested in applying MSA-Transformer models for protein design.

      We thank the reviewer for this encouraging assessment of our work, and for their very interesting suggestions which helped us improve our manuscript.

      Reviewer #2 (Public Review):

      The manuscript by Sgarbossa et al. proposes the use of a machine learning technique used in Language Models (LM) and adapted to protein sequences (PLM) as a means to generate synthetic sequences that retain functional properties contained in the original multiple sequence alignment (MSA) of natural sequences. This technique (or a similar one), called MSA Transformer, is also a component of the supervised learning methodology AlphaFold, which has been successful in predicting protein structures and complexes of proteins. The premise of this study is that an iterative masking approach can be used as a sampling technique to create a diverse set of sequences that still preserve important properties of the original natural sequences. For example, such samples retain homology properties, score well in terms of retaining relevant pairwise or epistatic interactions, and produce "foldable" sequences when used as input for AlphaFold and scored via its confidence metric pLDDT. In order to provide support for this claim, the authors compare against Direct Coupling Analysis (DCA), which is a global sequence modeling technique that has been shown to be successful in many aspects of the structure and function of proteins, particularly in generating and sampling sequences analogous to the input MSA. Most importantly, DCA and its generative version bmDCA have been shown to produce functional sequences experimentally. The authors then establish that sequences generated by MSA Transformer with iterative masking have, in general, better scores in terms of homology, statistical energies, and pLDDT than those from bmDCA. They also have spectral, statistical and similarity properties more akin to the natural sequences than those from bmDCA, except for the reproduction of single and pairwise statistics. The sequences from the MSA Transformer, however, better replicate the three-body statistics of the natural sequences. The authors conclude that MSA Transformer with iterative masking is a valid technique for sequence design and an important alternative to DCA, de novo physics-based methods, or supervised learning techniques.
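      The single-site, pairwise and three-body statistics discussed here are commonly compared through empirical frequencies and their connected correlations. For positions i, j, k and residues a, b, c, with f denoting frequencies computed from an MSA, the standard definitions are given below (the notation is ours, not the manuscript's):

```latex
C_{ij}(a,b) = f_{ij}(a,b) - f_i(a)\,f_j(b)

C_{ijk}(a,b,c) = f_{ijk}(a,b,c) - f_{ij}(a,b)\,f_k(c) - f_{ik}(a,c)\,f_j(b)
               - f_{jk}(b,c)\,f_i(a) + 2\,f_i(a)\,f_j(b)\,f_k(c)
```

      A Potts model fitted by bmDCA is constrained to match f_i and f_ij, and hence C_ij. By contrast, C_ijk is not fitted directly, which makes it a more independent test of a generative model.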

      Given the success of language models in machine learning and their contributions to the structure prediction of proteins and complexes, I see this study as a required follow-up to the broad body of work on amino acid coevolution spearheaded by DCA methodologies. In general, I believe this is a useful and relevant study for the community that opens up several avenues for research connecting Transformers with unsupervised protein design. Although the study provides support for this technique being potentially useful for protein design, I was not completely convinced that it will yield more transformative results than those obtained using Potts models. The differences, although consistent across the study, seem to be within "the margin of error" compared to bmDCA.

      We thank the reviewer for this positive assessment of our work, and for their cogent remarks which helped us improve our manuscript.

      We agree that in the case of large protein families, the main message is that our sequence generation method based on MSA Transformer scores at least as well as bmDCA. Given that bmDCA has been experimentally validated as a generative model, we believe that this is a valuable result. Our revised manuscript makes this point stronger, by showing that our sequence generation method based on MSA Transformer yields sequences that score similarly to those generated by bmDCA at low sampling temperature, while retaining substantially more sequence diversity.

      In addition, following the reviewer's suggestion below, we now present results for smaller protein families, whose shallow MSAs make it difficult to accurately fit Potts models. These results are presented in a new section of Results, titled "Sequence generation by the iterative masking procedure is successful for small protein families", including the new Figure 3. As mentioned there, "Fig. 3 reports all four scores discussed above in the case of these 7 small families, listed in Table S1 (recall that the families considered so far were large, see Table 1). We observe that MSA-Transformer–generated sequences have similar HMMER scores and structural scores to natural sequences. MSA-Transformer–generated sequences also generally have better HMMER scores and structural scores than those generated by bmDCA with default parameters. While low-temperature bmDCA yields better statistical energy scores (as expected), and also gives HMMER scores and structural scores comparable to natural sequences, it in fact generates sequences that are almost exact copies of natural ones (see Fig. 3, bottom row). By contrast, MSA Transformer produces sequences that are quite different from natural ones, and have very good scores." This shows that our method not only performs as well as bmDCA for large families, but also has a broader scope, as it is less limited by MSA depth than bmDCA.
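      One simple way to quantify how close generated sequences sit to natural ones is the normalized Hamming distance from each generated sequence to its closest natural sequence. This metric would flag near-copies such as those produced by low-temperature bmDCA. It is a sketch only: the function name and the integer encoding of residues are assumptions, and the exact quantity plotted in Fig. 3 may differ.

```python
import numpy as np

def min_hamming_to_natural(generated, natural):
    """Normalized Hamming distance from each generated sequence to its closest
    natural sequence; 0.0 means an exact copy. Inputs are integer-encoded
    (n_gen, L) and (n_nat, L) arrays over the same alignment columns."""
    mismatches = generated[:, None, :] != natural[None, :, :]  # (n_gen, n_nat, L)
    return mismatches.mean(axis=2).min(axis=1)

natural = np.array([[0, 1, 2, 3], [3, 2, 1, 0]])
generated = np.array([[0, 1, 2, 3], [0, 1, 2, 0]])
print(min_hamming_to_natural(generated, natural))
```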

      I also have certain comments related to the use of these 3 metrics to analyze the performance of the sampling. On the one hand, HMMER which has had a great utility for Pfam and the community in general is a score that is not necessarily reflecting the global properties of the sequences. In other words, we might be using a simpler statistical model to evaluate the performance of two other models (MSA Transformers and bmDCA) which are richer and that capture more sequence dependencies than the hidden Markov model.

      We agree with the reviewer that HMMER scores are associated with simpler statistical models, which cannot fully represent the data. We nevertheless believe that these scores remain useful to assess homology. In the framework of our study, they show that the sequences we generate are deemed "good homologs" by HMMER - similarly to natural sequences that would be extracted from a database by this widely-used tool. This said, we agree with the reviewer that one should not overinterpret HMMER scores, and we have reduced our discussion of their correlations with Hamming distances to avoid giving too much importance to this point.

      Moreover, we now present new scores that give a more complete picture of the quality of our generated sequences:

      • Regarding structure, in addition to the AlphaFold pLDDT score, we now also report the RMSD between a reference experimental structure of the relevant family (see Table 1) and the AlphaFold structure predicted for each sequence studied. The results from the RMSD analysis corroborate those obtained with pLDDT and show that predicted structures are indeed similar to the native ones. These results are now discussed in the main text. We believe that this point strengthens our conclusions and we thank the reviewer for suggesting this analysis.

      • We also performed a retrospective validation using published experimental results. For chorismate mutase, a protein family which was experimentally studied in [Russ et al 2020] using bmDCA, we now report estimated relative enrichments for our generated sequences in Figure S8, in addition to our four usual scores now shown for this family in Figure S7. In addition, for protein families PF00595 and PF13354, we now report deep mutational scanning scores for our generated sequences in Figure S9. These results strengthen our conclusion that our sequence generation method based on MSA Transformer is highly promising.

      For the case of the statistical energy score, the authors decided to use a sampling temperature T=1, but the authors note that this temperature can be reduced, as it was done in the experimental paper, to produce sequences with better energies, therefore this metric can be easily improved by modifying the temperature. The authors mentioned that they did try to reduce the temperature and that they also improved their HMMER score, however, they decided against it because the pairwise statistics were affected. However, pairwise statistics was precisely the only factor where bmDCA seemed superior to the MSA transformer, so reducing it should be an acceptable trade-off in order to optimize the other two important metrics.

      We thank both reviewers for raising this very interesting point. As mentioned above in our response to the first reviewer, we have now performed a comprehensive comparison of our MSA Transformer-generated data not only to bmDCA-generated data at sampling temperature T=1 but also at lower sampling temperatures. We considered the two temperature values chosen in [Russ et al 2020], namely T=0.33 and T=0.66. For completeness, we also considered the two values of regularization strength λ from [Russ et al 2020] for all three temperatures (T=1, 0.66 and 0.33), in the case of family PF00072, as reported in Table S5. Given the relatively small impact of λ observed there, we kept only one value of λ for each value of T in the rest of our manuscript, namely λ=0.01 for T=1 to match the parameters in [Figliuzzi et al 2018], and λ=0.001 for T=0.33 and T=0.66, as it gave slightly better scores in Table S5. Note that for our additional study of small protein families, we employed λ=0.01 throughout because it is better suited to small families. In particular, we now include results obtained for bmDCA at λ=0.001 and T=0.33 in all figures of the revised manuscript.

      Our general findings, which are discussed in the revised manuscript, are that decreasing T indeed improves the scores of bmDCA-generated sequences. However, the main improvement regards statistical energy (as expected from lowering T), while the improvements of other scores (HMMER score, and, more importantly, structural scores) are more modest. Even using T=0.33 for bmDCA, our MSA Transformer-generated sequences have similar or better scores compared to bmDCA-generated sequences, apart from statistical energy (see Figure 1 and Tables S2 and S3). Moreover, we find that decreasing T with bmDCA substantially decreases MSA diversity, while MSA Transformer-generated sequences do not suffer from such an issue (see Figure S1). In fact, at low T, bmDCA concentrates on local minima of the statistical energy landscape (see Figures 2, 5 and S5), resulting in low diversity.
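      The effect of sampling temperature on a Potts model can be illustrated with a toy Metropolis sampler targeting p(s) ∝ exp(-E(s)/T), where E is the statistical energy. This is a sketch. The random fields `h` and couplings `J` are hypothetical stand-ins for a fitted bmDCA model. The generic single-site Metropolis updates are not the authors' sampling code. Lowering T concentrates chains on low-energy states, which typically lowers both the mean energy and the mean pairwise Hamming distance (diversity) of the samples.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Statistical energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)."""
    L = len(seq)
    energy = -h[np.arange(L), seq].sum()
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J[i, j, seq[i], seq[j]]
    return float(energy)

def metropolis_sample(h, J, temperature, n_samples, n_sweeps, seed=0):
    """Independent Metropolis chains targeting p(s) ∝ exp(-E(s) / T)."""
    rng = np.random.default_rng(seed)
    L, q = h.shape
    samples = []
    for _ in range(n_samples):
        seq = rng.integers(0, q, size=L)
        energy = potts_energy(seq, h, J)
        for _ in range(n_sweeps * L):
            proposal = seq.copy()
            proposal[rng.integers(L)] = rng.integers(q)
            new_energy = potts_energy(proposal, h, J)
            # Accept with probability min(1, exp(-ΔE / T)).
            if rng.random() < np.exp(min(0.0, -(new_energy - energy) / temperature)):
                seq, energy = proposal, new_energy
        samples.append(seq)
    return np.array(samples)

def mean_pairwise_hamming(samples):
    """Average normalized Hamming distance over all pairs of samples."""
    d = (samples[:, None, :] != samples[None, :, :]).mean(axis=2)
    return float(d[np.triu_indices(len(samples), k=1)].mean())

# Toy random Potts model (not a fitted bmDCA model): L = 8 positions, q = 4 states.
rng = np.random.default_rng(1)
h = rng.normal(size=(8, 4))
J = rng.normal(scale=0.5, size=(8, 8, 4, 4))
for T in (1.0, 0.33):
    s = metropolis_sample(h, J, T, n_samples=20, n_sweeps=20)
    print(T, mean_pairwise_hamming(s), np.mean([potts_energy(x, h, J) for x in s]))
```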

      Overall, these new results confirm that our procedure for generating sequences using MSA Transformer is promising, featuring scores comparable with low-temperature bmDCA sequences and high diversity.

      Finally, the use of pLDDT could also introduce some bias: since AlphaFold itself uses transformers, I wonder whether sequences obtained with transformers could simply score better by construction.

      We thank the reviewer for raising this intriguing point. It is true that MSA Transformer has an architecture that is very similar to that of the Evoformer module of AlphaFold. However, AlphaFold couples the Evoformer module to a structural module, and is trained in a supervised way to predict protein structure, which makes it significantly different from MSA Transformer.

      Nevertheless, we agree that the AlphaFold pLDDT score does not give a complete view of structure. As mentioned above, to improve this, in addition to pLDDT, we now also report the RMSD between a reference experimental structure of the relevant family (see Table 1) and the AlphaFold structure predicted for each sequence studied. The results from the RMSD analysis corroborate those obtained with pLDDT and show that predicted structures are indeed similar to the native ones. These results are now discussed in the main text.
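      The RMSD between a predicted and a reference structure is usually computed after optimal rigid-body superposition. Below is a minimal Kabsch-algorithm sketch. It assumes the two coordinate sets are already in residue-to-residue correspondence (for example, matched Cα atoms). The response does not state how residues were matched or which atoms were used.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Usage: a rotated and translated copy should give an RMSD of ~0.
rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 3.0])
print(kabsch_rmsd(P, Q))
```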

      The authors should try to address all these concerns. My assessment is that these concerns do not detract from the relevance and timeliness of this study, but I would like to see a fairer comparison of these metrics in which bmDCA is further optimized, e.g. by lowering T, to give a more accurate comparison of the methods, even if this results in lower performance on pairwise statistics.

      We did our best to address all these points. We believe that the additions mentioned above have substantially improved our manuscript.

      My assessment is that this manuscript's main strength is in introducing a state-of-the-art technique that has already been extremely successful in the field of computer science and artificial intelligence into the field of amino acid coevolution. By adapting this technique and creating a sampling version that is compatible with other successful methodologies, this work will lead to many other studies dealing with function and the effects of sequence variation of biomolecules.

      Again, we thank the reviewer for their encouraging assessment.

    1. Author Response

      Reviewer #1 (Public Review):

      This fMRI study investigated how memories are updated after reinterpreting past events. Participants watched a movie and subsequently recalled individual scenes from that movie. Importantly, the movie ends with a twist that changes the interpretation of earlier scenes in the movie. One group of participants watched the movie with the twist at the end, one group did not get to see the twist, and a third group was already informed about this twist before watching the movie. Analyses compared the similarity of activity patterns to (encoded or recalled) events across participants within regions of the default mode network (DMN). The design allowed for multiple relevant comparisons, confirming the prediction that activity patterns in DMN regions reflect the (re)interpretation of the movie (during movie viewing and/or during recall).
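      The cross-participant pattern similarity compared in these analyses can be sketched as follows. This is a minimal illustration under stated assumptions:

      - each subject contributes one spatial pattern per scene (ROI voxels averaged over the scene's time points);
      - within a group, a subject's pattern is correlated with the mean pattern of the other subjects (leave-one-out);
      - between groups, it is correlated with the mean pattern of the other group;
      - plain Pearson r values are averaged.

      The function name and these choices are hypothetical. The authors' exact procedure (for example, any Fisher z-transformation) may differ.

```python
import numpy as np

def pattern_isc(group_a, group_b=None):
    """Mean Pearson r of each subject's scene pattern with a reference pattern.

    group_a, group_b: (n_subjects, n_voxels) arrays, one scene's time-averaged
    ROI pattern per subject. If group_b is None, the reference is the mean of
    the other subjects in group_a (leave-one-out); otherwise it is the mean
    pattern of group_b.
    """
    rs = []
    for s in range(len(group_a)):
        if group_b is None:
            reference = np.delete(group_a, s, axis=0).mean(axis=0)
        else:
            reference = group_b.mean(axis=0)
        rs.append(np.corrcoef(group_a[s], reference)[0, 1])
    return float(np.mean(rs))

base = np.arange(10.0)
group = np.stack([base, 2 * base + 1, 3 * base])  # perfectly correlated toy patterns
print(pattern_isc(group), pattern_isc(group, -group))
```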

      The study is well-designed and executed. The inclusion of multiple analyses involving distinct comparisons strengthens the evidence for the role of the DMN in memory updating.

      The following points may be relevant to consider:

      1) The cross-participant pattern analysis method used here is not standard, with such analyses typically done within participants (or across participants, but after aligning representational spaces). Considering individual variability in functional organization, the method is likely only sensitive to coarse-scale patterns (e.g., anterior vs posterior parts of an ROI). This is not necessarily a weakness but is relevant when interpreting the results.

      We agree with the reviewer that functional misalignment might have worked against us here. We designed this study as a natural successor of our previous work, in which we captured reliable and multimodal scene-specific cross-participant pattern similarity during encoding and recall in standard space. In this revised version, we provide further evidence on how scene content is captured and how it influences our results. Nonetheless, we agree with your comment and have added the following passage to the discussion to encourage readers to consider this point when interpreting the results.

      "Moreover, our current method relies on averaging spatially-coarse activity patterns across subjects (and time points within an event). Future extensions of this work may benefit from using functional alignment methods (Haxby et al 2020, Chen et al 2015) to capture more fine-grained event representations which are shared across participants."

      2) Unlike previous work, analyses are not testing for scene-specific information. Rather, each scene is treated separately to establish between-group differences, and results are averaged across scenes. This raises the question of whether the patterns reflect scene-specific information or generic group differences. For example, knowing the twist may increase overall engagement, both when viewing the movie (spoiled group) and when recalling it (spoiled group + twist group). The DMN may be particularly sensitive to such differences in overall engagement.

      You have brought up great points. We addressed them in two ways: (1) We ran a univariate analysis in each DMN ROI to look at the role of overall regional-average response magnitude in our results. We did not observe a significant effect of group or an interaction between group and condition. (2) We ran a scene-specificity analysis in a new Results section entitled “The role of scene content” (Figure 4). This section is focused on comparing interaction index (Figure 2C), as an indicator of memory updating, under different manipulations. Interaction index reflects the reversal of neural similarity during encoding and recall. Our results suggest that we don’t see the same effects if we shuffle the scene labels and recompute the pattern similarity analyses. Please see added text and figures below:

      "To test whether our reported results were mainly driven by the similarities and differences in multivariate spatial patterns of neural representations, as opposed to by univariate regional-average response magnitudes, we ran a univariate analysis in each ROI. This analysis revealed no significant effect of group (“spoiled”, “twist”, “no-twist”) or interaction between group and condition (movie, recall) (Table 1, see Methods for details).

      Next, to determine whether scene-specific neural event representations—as opposed to coarser differences in general mental state across all scenes with similar interpretations—drive our observed pISC differences, we shuffled the labels of critical scenes within each group before calculating and comparing pISC across groups. By repeating this procedure 1000 times and recalculating the interaction index at each iteration, we constructed a null distribution of interaction indices for shuffled critical scenes (light magenta distributions in Figure 4B). In 12 out of 24 DMN regions, interaction indices were statistically significant based on the shuffled-scene distribution (p < .025, FDR controlled at q < .05). All of these 12 regions were among the ROIs that showed meaningful effects in our original analysis (Figure 2C). Regions with significant scene-specific interaction effects are marked as blue dots with black borders in Figure 4B. Overall, the findings from this analysis confirm that our results are driven by changes to scene-specific representations."
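      The shuffled-scene null distribution and the FDR step described above can be sketched as follows. The manuscript's statistic is the interaction index (Figure 2C), which is not defined here. `scene_matched_similarity` is therefore a hypothetical stand-in statistic, and any callable taking the two groups' (scene × voxel) patterns can be substituted. The sketch uses a one-sided permutation p-value and the Benjamini-Hochberg procedure across regions. Both are standard choices, assumed here rather than taken from the authors' code.

```python
import numpy as np

def scene_matched_similarity(a, b):
    """Hypothetical stand-in statistic: mean Pearson r between the two groups'
    patterns for the same scene. a, b: (n_scenes, n_voxels) group-mean patterns."""
    return float(np.mean([np.corrcoef(x, y)[0, 1] for x, y in zip(a, b)]))

def shuffled_scene_null(a, b, stat=scene_matched_similarity, n_perm=1000, seed=0):
    """Null distribution: recompute `stat` after shuffling scene labels in group b."""
    rng = np.random.default_rng(seed)
    return np.array([stat(a, b[rng.permutation(len(b))]) for _ in range(n_perm)])

def perm_p_value(observed, null):
    """One-sided permutation p-value with the +1 correction (never exactly 0)."""
    return (1 + np.sum(null >= observed)) / (1 + len(null))

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected with the FDR controlled at level q."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank meeting its threshold
        reject[order[:k + 1]] = True
    return reject

# Toy data: 7 scenes x 50 voxels; group b = group a plus noise (illustrative only).
rng = np.random.default_rng(2)
a = rng.normal(size=(7, 50))
b = a + rng.normal(scale=1.0, size=a.shape)
observed = scene_matched_similarity(a, b)
null = shuffled_scene_null(a, b)
p = perm_p_value(observed, null)
print(round(observed, 3), p)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```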

      3) The study does not reveal what the DMN represents about the movie, such that its activity changes after knowing the twist. The Discussion briefly mentions that it may reflect the state of the observer, related to the belief about the identity of the doctor. This suggests a link to the theory of mind/mentalizing, but this is not made explicit. Alternatively, the DMN may be involved in the conflict (or switching) between the two interpretations.

      Great points. We have expanded the discussion of the role of the mentalizing network, and in particular of the temporo-parietal cortex. Regarding your last point, we think our whole-brain findings outside the DMN (ACC and dlPFC) might relate to it. We discuss these further in the paper.

      "We performed two targeted analyses to look for evidence of memory updating across encoding and recall: the interaction analysis (Figure 2C) and the encoding-recall analysis (Figure 3). We hypothesized that a shift in direction of pISC difference would occur when neural representations during recall in the “twist” group start to reflect the Ghost interpretation. The interaction analysis probed this shift indirectly by taking into account the effects of both encoding-encoding and recall-recall analyses. Unlike the interaction analysis, in the encoding-recall analysis, we directly compared neural event representations during encoding and recall. Interestingly, all regions exhibiting an effect across the two encoding-recall analyses, excluding left anterior temporal cortex, were present in the interaction results. Among these regions, the left angular gyrus/TPJ exhibited an effect across all three analyses. As a core hub in the mentalizing network, temporo-parietal cortex has been implicated in theory of mind through perspective-taking, rationalizing the mental state of someone else, and modeling the attentional state of others (Frith and Frith 2006, Guterstam et al. 2021, Saxe and Kanwisher 2003). The motivations behind some actions of the main character in the movie heavily depend on whether the viewer perceives them as a Doctor or a Ghost, and participants may focus on this during both encoding and recall. We speculate that neural event representations in AG/TPJ in the current experiment may be related to mentalizing about the main character’s actions. Under this interpretation, the updated event representations during recall following the twist would be more closely aligned to the “spoiled” encoding representations, as a consequence of memory updating in the “twist” group.

      In our whole-brain analysis, these regions did not have significant interaction effects, which suggests that the effects were isolated to encoding. In the whole-brain analysis, we also observed significant encoding-encoding and interaction effects in anterior cingulate cortex, as well as recall-recall and interaction effects in dlPFC. These results suggest that both the "spoiled" manipulation and the "twist" may recruit top-down control and conflict monitoring processes during naturalistic viewing and recall."

      4) The design has many naturalistic aspects, but it is also different from real life in that the critical twist involves a ghost. Furthermore, all results are based on one movie with a specific plot twist. It is thus not clear whether similar results would be obtained with other and more naturalistic plot twists.

      We added this as a limitation of the study.

      "Our findings provide further insight into the functional role of the DMN. However, these results have been obtained using only one movie. While naturalistic paradigms better capture the complexity of real life and provide greater ecological generalizability than highly-controlled experimental stimuli and tasks (Nastase et al., 2020), they are still limited by the properties of the particular naturalistic stimulus used. For example, this movie—including the twist itself—hinges on suspension of disbelief about the existence of ghosts. Future work is needed to extend our findings about updating event memories to a broader class of naturalistic stimuli: for example, movies with different kinds of (non-supernatural) plot twists, spoken stories with twist endings, or using autobiographical real-life situations where new information (e.g. discovering a longtime friend has lied about something important) triggers re-evaluation of the past (e.g. reinterpreting their friend’s previous actions)."

      5) Only 7 scenes (out of 18) were included in the analysis. It is not clear if/how the results depend on the selection of these 7 scenes.

      Thank you for bringing this up. These scenes were pre-selected for the analyses, as they are the only scenes rated highly by our independent raters (not study participants) on “twist influence”, meaning that knowing the twist could dramatically change their interpretation. We therefore had a priori reasons to hypothesize that the effect would be strong in these scenes. To address your point, we report results including all 18 scenes in a new Results section entitled “The role of scene content” and in Figure 4A. While the effect was weaker for all scenes, it was still apparent in this conservative analysis. As expected, however, including only the 7 critical scenes produces stronger results than including all scenes or only the non-critical scenes (all minus critical scenes). Please see “The role of scene content” in Results and Figure 4 for more detailed information.

      "The role of scene content

      In the prior analyses, we focused on “critical scenes”, selected based on ratings from four raters who quantified the influence of the twist on the interpretation of each scene (see Methods). An independent post-experiment analysis of the verbal recall behavior of the fMRI participants yielded “twist scores” that were also highest for these scenes; that is, the expected and perceived effect of twist information on recall behavior were found to match. In our next analysis, we asked whether the neural event representations reflect these differences in the twist-related content of the scenes. In other words, are the “critical scenes” with highly twist-dependent interpretations truly critical for our observed effects?

      To answer this question, we re-ran our main encoding-encoding and recall-recall pISC analyses in each DMN ROI (Figures 2 and 3). We calculated interaction indices (Figure 2C) first by including all scenes, and second by including only the 11 non-critical scenes. To better compare the effect of including different subsets of scenes with our original results, Figure 4 shows the results in the 15 ROIs that exhibited meaningful effects in our main analyses (Figure 2C). Figure 4A demonstrates that “critical scenes” yielded higher interaction indices than all scenes or non-critical scenes across all ROIs. The interaction score across all DMN ROIs was significantly higher for “critical scenes” than for all scenes (t(23) = 7.19, p = 2.53 × 10⁻⁷) and for non-critical scenes (t(23) = 7.3, p = 1.95 × 10⁻⁷). These results show that critical scenes are indeed responsible for the observed pISC differences across groups."
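      As a minimal illustration of the statistic reported above, the sketch below computes a paired t statistic and its degrees of freedom from two matched lists of scores. It assumes (our assumption, not stated in this passage) that each pair is one DMN ROI's interaction score computed with critical scenes versus with all (or non-critical) scenes; 24 matched pairs would give the reported t(23). The function name `paired_t` and the example values are hypothetical and are not data from the study. The two-sided p-value would come from a t distribution with `df` degrees of freedom, for example via `scipy.stats.ttest_rel`, which returns the same t statistic.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired t-test of matched samples x and y.

    Hypothetical helper: x and y could be per-ROI interaction scores computed
    from two scene subsets (e.g. critical vs. all scenes), matched by ROI.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be matched samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    if se == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / se, len(diffs) - 1


# Illustrative numbers only (not data from the study).
critical = [0.12, 0.08, 0.15, 0.10]
all_scenes = [0.05, 0.04, 0.07, 0.06]
t, df = paired_t(critical, all_scenes)
print(f"t({df}) = {t:.2f}")
```

      Pairing by ROI removes between-ROI differences in baseline similarity, so the test asks only whether the scene subset changes the score within each ROI.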

      Reviewer #2 (Public Review):

      In this manuscript titled "Here's the twist: How the brain updates the representations of naturalistic events as our understanding of the past changes", the authors reported a study that examined how new information (manipulated as a twist at the end of a movie) changes the neural representations in the default mode network (DMN) during the recall of prior knowledge. Three groups of participants were compared - one group experienced the twist at the end, one group never experienced the twist, and one group received a spoiler at the beginning. At retrieval, participants received snippets of 18 scenes of the movie as cues and were asked to freely describe the events of each scene and to provide the most accurate interpretation of the scene, given the information they gathered throughout watching.

      All three groups were highly accurate in the recall of content. The groups that experienced the twist at the end as well as at the beginning as a spoiler showed a higher twist score (the extent to which twist information was incorporated into the recall), while seemingly also keeping the interpretation without the twist ("Doctor representation") intact. Neurally, several regions in the DMN showed significant interaction effects in their neural similarity patterns (based on intersubject pattern correlation), indicating a change in interpretation between encoding and recall in the twist group uniquely, presumably reflecting memory updating.

      Several points that I think should be addressed to strengthen the manuscript:

      1) The results from encoding-retrieval similarity analysis (particularly the one depicted in Figure 3B) don't match the results from encoding/retrieval interaction (particularly those shown in Figure 2C). While they were certainly based on different comparisons, I would think that both analyses were set up to test for memory updating. Can the authors comment on this divergence in results?

      Thank you for your comment. Of the three regions in Figure 3B, all but one also appear in the interaction analysis (Figure 2C). The frontal pole ROI may be hard to see from this viewing angle, but it in fact has a large effect size in the interaction analysis. We therefore do not see a large divergence between these two results. However, once the recall-recall results are also taken into account, we agree that the pattern across regions is not homogeneous. We discuss this further in the Discussion:

      "We performed two targeted analyses to look for evidence of memory updating across encoding and recall: the interaction analysis (Figure 2C) and the encoding-recall analysis (Figure 3). We hypothesized that a shift in direction of pISC difference would occur when neural representations during recall in the “twist” group start to reflect the Ghost interpretation. The interaction analysis probed this shift indirectly by taking into account the effects of both encoding-encoding and recall-recall analyses. Unlike the interaction analysis, in the encoding-recall analysis, we directly compared neural event representations during encoding and recall. Interestingly, all regions exhibiting an effect across the two encoding-recall analyses, excluding left anterior temporal cortex, were present in the interaction results. Among these regions, the left angular gyrus/TPJ exhibited an effect across all three analyses. As a core hub in the mentalizing network, temporo-parietal cortex has been implicated in theory of mind through perspective-taking, rationalizing the mental state of someone else, and modeling the attentional state of others (Frith and Frith 2006, Guterstam et al. 2021, Saxe and Kanwisher 2003). The motivations behind some actions of the main character in the movie heavily depend on whether the viewer perceives them as a Doctor or a Ghost, and participants may focus on this during both encoding and recall. We speculate that neural event representations in AG/TPJ in the current experiment may be related to mentalizing about the main character’s actions. Under this interpretation, the updated event representations during recall following the twist would be more closely aligned to the “spoiled” encoding representations, as a consequence of memory updating in the “twist” group.

      Our findings are consistent with the view that the DMN synthesizes incoming information with one’s prior beliefs and memories (Yeshurun et al 2021). We add to this framework by providing evidence for the involvement of DMN regions in updating prior beliefs in light of new knowledge. Across our different encoding and recall analyses, we observe memory updating effects in a varied subset of DMN regions that do not cleanly map onto a specific subsystem of the DMN (Robin and Moscovitch 2017, Ranganath and Ritchey 2012, Ritchey and Cooper 2020). Rather than being divergent, these results might reflect inherent differences between the processes of encoding and recall of naturalistic events. It has been proposed that neural representations corresponding to encoding of events are systematically transformed during recall of those events (Chen et al 2017, Favila et al 2020, Musz and Chen 2022). While we provide evidence for reinstatement of memories in the DMN, our findings also support a transformation of neural representation during recall, as encoding-recall results were weaker in some areas than recall-recall findings. This transformation could affect how different regions and sub-systems of the DMN represent memories, and suggests that the concerted activity of multiple subsystems and neural mechanisms might be at play during encoding, recall, and successful updating of naturalistic event memories."
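      To make the encoding-recall comparison more concrete, here is a minimal sketch of one common way to compute a between-group intersubject pattern correlation (pISC): each participant's voxel pattern for a scene in one group or phase is correlated with the average pattern of another group or phase for the same scene, and the correlations are averaged. This is an assumed formulation for illustration only; the manuscript's exact procedure (ROI definition, pattern extraction, any leave-one-out scheme) is described in its Methods and may differ. The function names and toy patterns are hypothetical. A contrast in the spirit of the one discussed above could be written as `between_group_pisc(recall_twist, movie_spoiled) - between_group_pisc(recall_twist, movie_twist)`, keeping in mind that when both sides contain the same participants a leave-one-out template would normally be used, which this sketch does not do.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length voxel patterns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def group_mean_pattern(patterns):
    """Voxel-wise average of a list of participant patterns."""
    return [mean(voxel_values) for voxel_values in zip(*patterns)]


def between_group_pisc(group_a, group_b):
    """Average correlation of each group-A pattern with group B's mean pattern.

    group_a, group_b: dict mapping scene name -> list of participant patterns
    (each pattern a list of voxel values). Only scenes present in group_a are
    used. No leave-one-out step is applied, so this suits groups made up of
    different participants.
    """
    scores = []
    for scene, patterns_a in group_a.items():
        template_b = group_mean_pattern(group_b[scene])
        scores.extend(pearson(p, template_b) for p in patterns_a)
    return mean(scores)


# Toy example with one scene and three "voxels" (not real data).
recall_a = {"scene1": [[1.0, 2.0, 3.0], [1.0, 2.0, 4.0]]}
encoding_b = {"scene1": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]}
print(round(between_group_pisc(recall_a, encoding_b), 3))
```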

      2) The recall task was self-paced. Can reaction time information be provided on how long participants needed to recall? Did this differ across groups? Presumably in the twist group and spoiled group participants might have needed a longer time to incorporate both the original and twist interpretation.

      This is an interesting idea. Unfortunately, we could not measure this accurately because our recall cues were snippets from the beginning of each scene, and these snippets varied in length (they were selected based on content). Moreover, updating could begin as soon as a snippet started, and we would not know exactly when. We will consider this point in future designs.

      How was the length difference across events taken into consideration in the beta estimates?

      They were used as event durations in the GLM.
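      A minimal sketch of what using event durations in a GLM typically involves: each event (here, a recalled scene) becomes a boxcar lasting its own duration, which is convolved with a hemodynamic response function and sampled at each TR to form a regressor. The double-gamma HRF below is a common SPM-style approximation chosen for illustration; the HRF, TR, and helper names (`double_gamma_hrf`, `event_regressor`) are assumptions and are not taken from the study. Longer events produce larger and more sustained regressors, which is how differences in event length are accounted for when the betas are estimated.

```python
import math


def double_gamma_hrf(t):
    """SPM-style double-gamma HRF approximation (t in seconds).

    Peak term: gamma density with shape 6; undershoot: shape 16, scaled by 1/6.
    """
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def event_regressor(onsets, durations, n_scans, tr, dt=0.1, hrf_length=32.0):
    """Build one GLM regressor in which each event lasts its own duration.

    onsets, durations: seconds. The boxcar is built on a fine time grid (dt),
    convolved with the HRF, then sampled at the start of each scan (TR).
    """
    n_fine = int(round(n_scans * tr / dt))
    boxcar = [0.0] * n_fine
    for onset, duration in zip(onsets, durations):
        start = int(round(onset / dt))
        stop = min(n_fine, int(round((onset + duration) / dt)))
        for i in range(start, stop):
            boxcar[i] = 1.0

    kernel = [double_gamma_hrf(k * dt) for k in range(int(round(hrf_length / dt)))]
    convolved = [0.0] * n_fine
    for i, value in enumerate(boxcar):
        if value == 0.0:
            continue
        for k, h in enumerate(kernel):
            if i + k >= n_fine:
                break
            convolved[i + k] += value * h * dt

    # Sample the fine-grid signal at each scan onset.
    return [convolved[min(n_fine - 1, int(round(s * tr / dt)))] for s in range(n_scans)]


# Two hypothetical scenes of different length, TR = 1.5 s.
short_scene = event_regressor([10.0], [2.0], n_scans=60, tr=1.5)
long_scene = event_regressor([10.0], [12.0], n_scans=60, tr=1.5)
print(max(short_scene) < max(long_scene))
```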

      Also, is there an order effect, such that one type of interpretation tended to be recalled first?

      This is indeed hard to measure, as you mention, because it occurs in only a subset of scenes, although we assume it also happens in other participants. We will provide the transcripts when sharing the data, and we hope this will facilitate future text-analysis work on this dataset to answer interesting questions like this one.

      3) The correlation analysis between neural pattern change and behavioral twist score is based on a small sample size and does not seem to be well suited to test the postulation of the authors, namely that some participants may hold both interpretations in their memory. Interestingly, the twist score of the spoiled group was similar to the twist group, indicating participants in this group might have held both interpretations as well. Could this observation be leveraged, for example by combining both groups (hence better powered with larger sample size), in order to relate individual differences in neural similarity patterns and behavioral tendency to hold both interpretations?

      Even though both groups showed signs of holding both interpretations in mind, the processes occurring in their brains during recall differ. In particular, we do not expect to see any updating effect in the “spoiled” group, because they knew the twist from the start. It would therefore not be appropriate to combine these groups to test the effect of incomplete updating.

      4) Several regions within the DMN were significant across the analysis steps, specifically the angular gyrus, middle temporal cortex, and medial PFC. Can the authors provide more insights on how these widely distributed regions may act together to enable memory updating? The discussion on the main findings is largely at a rather superficial level about DMN, or focuses specifically on vmPFC, but neglects the distributed regions that presumably function interactively.

      Thanks for bringing this up. We added text to the Discussion to respond to this very valid point; please see the added text in our response to your first point. We also added the following passage to the Discussion:

      "In addition to mPFC, right precuneus and parts of temporal cortex exhibited significantly higher pattern similarity in the “twist” and “spoiled” groups who recalled the movie with the same interpretation. Precuneus is a core region in the posterior medial network, which is hypothesized to be involved in constructing and applying situation models (Ranganath and Ritchey 2012). Our findings support a role for precuneus in deploying interpretation-specific situation models when retrieving event memories. In particular, we suggest that the posterior medial network may encode a shift in the situation model of the “twist” group in order to accommodate the new Ghost interpretation.

      We performed two targeted analyses to look for evidence of memory updating across encoding and recall: the interaction analysis (Figure 2C) and the encoding-recall analysis (Figure 3). We hypothesized that a shift in direction of pISC difference would occur when neural representations during recall in the “twist” group start to reflect the Ghost interpretation. The interaction analysis probed this shift indirectly by taking into account the effects of both encoding-encoding and recall-recall analyses. Unlike the interaction analysis, in the encoding-recall analysis, we directly compared neural event representations during encoding and recall. Interestingly, all regions exhibiting an effect across the two encoding-recall analyses, excluding left anterior temporal cortex, were present in the interaction results. Among these regions, the left angular gyrus/TPJ exhibited an effect across all three analyses. As a core hub in the mentalizing network, temporo-parietal cortex has been implicated in theory of mind through perspective-taking, rationalizing the mental state of someone else, and modeling the attentional state of others (Frith and Frith 2006, Guterstam et al. 2021, Saxe and Kanwisher 2003). The motivations behind some actions of the main character in the movie heavily depend on whether the viewer perceives them as a Doctor or a Ghost, and participants may focus on this during both encoding and recall. We speculate that neural event representations in AG/TPJ in the current experiment may be related to mentalizing about the main character’s actions. Under this interpretation, the updated event representations during recall following the twist would be more closely aligned to the “spoiled” encoding representations, as a consequence of memory updating in the “twist” group.

      Our findings are consistent with the view that the DMN synthesizes incoming information with one’s prior beliefs and memories (Yeshurun et al 2021). We add to this framework by providing evidence for the involvement of DMN regions in updating prior beliefs in light of new knowledge. Across our different encoding and recall analyses, we observe memory updating effects in a varied subset of DMN regions that do not cleanly map onto a specific subsystem of the DMN (Robin and Moscovitch 2017, Ranganath and Ritchey 2012, Ritchey and Cooper 2020). Rather than being divergent, these results might reflect inherent differences between the processes of encoding and recall of naturalistic events. It has been proposed that neural representations corresponding to encoding of events are systematically transformed during recall of those events (Chen et al 2017, Favila et al 2020, Musz and Chen 2022). While we provide evidence for reinstatement of memories in the DMN, our findings also support a transformation of neural representation during recall, as encoding-recall results were weaker in some areas than recall-recall findings. This transformation could affect how different regions and sub-systems of the DMN represent memories, and suggests that the concerted activity of multiple subsystems and neural mechanisms might be at play during encoding, recall, and successful updating of naturalistic event memories."

      Reviewer #3 (Public Review):

      Zadbood and colleagues investigated the way key information used to update interpretations of events alters patterns of activity in the brain. This was cleverly done by the use of "The Sixth Sense," a film featuring a famous "twist ending," which fundamentally alters the way the events in the film are understood. Participants were assigned to three groups: (1) a Spoiled group, in which the twist was revealed at the outset, (2) a Twist group, who experienced the film as normal, and (3) a No-Twist group, in which the twist was removed. Participants were scanned while watching the movie and while performing cued recall of specific scenes. Verbal recall was scored based on recall success and on evidence for descriptive bias toward two ways of understanding the events (specifically, whether a particular character was or was not a ghost). Importantly, this allowed the authors to show that the Twist group updated their interpretation. The authors focused on regions of the Default Mode Network (DMN) based on prior studies showing responsiveness to naturalistic memory paradigms in these areas and analyzed the fMRI data using intersubject pattern similarity analysis. Regions of the DMN carried patterns indicative of story interpretation. That is, encoding similarity was greater between the Twist and No-Twist groups than in the Spoiled group, and retrieval similarity was greater between the Twist and Spoiled groups than in the No-Twist group. The Spoiled group also showed greater pattern similarity with the Twist group's recall than the No-Twist group's recall. The authors also report a weaker effect of greater pattern similarity between the Spoiled group's encoding and the Twist group's recall than between the Twist group's own encoding and recall. Together, the data all converge on the point that one's interpretation of an event is an important determinant of the way it is represented in the brain.

      This is a really nice experiment, with straightforward predictions and analyses that support the claims being made. The results build directly on a prior study by this research group showing how interpretational differences in a narrative drive distinct neural representations (Yeshurun et al., 2017), but extend an understanding of how these interpretational differences might work retrospectively. I do not have any serious concerns or problems with the manuscript, the data, or the analyses. However, I have a few points to raise that, if addressed, would make for a stronger paper in my opinion.

      1) My most substantive comment is that I did not find the interpretive framework to be very clear with respect to the brain regions involved. The basic effects the authors report strongly support their claims, but the particular contributions to the field might be stronger if the interpretations could be made more strongly or more specifically. In other words: the DMN is involved in updating interpretations, but how should we now think about the role of the DMN and its constituent regions as a result of this study? There are a number of ideas briefly presented about what the DMN might be doing, but it just did not feel very coherent at times. I will break this down into a few more specific points:

      While many of us would agree that the DMN is likely to be involved in the phenomena at hand, I did not find that the paper communicated the logic for singularly focusing on this subset of regions very compellingly. The authors note a few studies whose main results are found in DMN regions, but I think that this could stand to be unpacked in a more theoretically interesting way in the Introduction.

      Relatedly, I found the summary/description of regional effects in the Discussion to be a bit unsatisfying. The various pattern similarity comparisons yielded results that were actually quite nonoverlapping among DMN regions, which was not really unpacked. To be clear, it is not a 'problem' that the regional effects varied from comparison to comparison, but I do think that a more theoretical exploration of what this could mean would strengthen the paper. To the authors' credit, they describe mPFC effects through the lens of schemas, but this stands in contrast to many other regions which do not receive much consideration.

      Finally, although there is evidence that regions of the DMN act in a coordinated way under some circumstances, there is also ample evidence for distinct regional contributions to cognitive processes, memory being just one of them (e.g., Cooper & Ritchey, 2020; Robin & Moscovitch, 2017; Ranganath & Ritchey, 2012). The authors themselves introduce the idea of temporal receptive windows in a cortical hierarchy, and while DMN regions do appear to show slower temporal drift than sensory areas, those studies show regional differences in pattern stability across time even within DMN regions. Simply put, it is worth considering whether it is ideal to treat the DMN as a singular unit.

      Thank you for your helpful comments. We added text to the Introduction and Discussion to address your point:

      "Introduction:

      The brain’s default mode network (DMN), comprising the posterior medial cortex, medial prefrontal cortex, temporoparietal junction, and parts of anterior temporal cortex, was originally described as an intrinsic or “task-negative” network, activated when participants are not engaged with external stimuli (Raichle et al. 2001, Buckner et al 2008). This observation led to a large body of work showing that the DMN is an important hub for supporting internally driven tasks such as memory retrieval, imagination, future planning, theory of mind, and creating and updating situation models (Svoboda et al. 2006; Addis et al. 2007; Hassabis and Maguire 2007, 2009; Schacter et al. 2007; Szpunar et al. 2007; Spreng et al. 2009, Koster-Hale & Saxe, 2013, Ranganath and Ritchey 2012). However, it is not fully understood how this network contributes to these varying functions, and in particular to memory processes, the focus of the present study. Activation of this network during “offline” periods has been proposed to play a role in the consolidation of memories through replay (Kaefer et al 2022). Interestingly, prior work has also shown that the DMN is reliably engaged during “online” processing (encoding) of continuous rich dynamic stimuli such as movies and audio stories (Stephens et al 2013, Hasson et al 2008). Regions in this network have been shown to have long “temporal receptive windows” (Hasson et al 2008; Lerner et al., 2011; Chang et al., 2022), meaning that they integrate and retain high-level information that accumulates over the course of extended timescales (e.g. scenes in movies, paragraphs in text) to support comprehension. This combination of processing characteristics suggests that the DMN integrates past and new knowledge, as regions in this network have access to incoming sensory input, recent active memories, and remote long-term memories or semantic knowledge (Yeshurun et al 2021, Hasson et al 2015).

      These integration processes feature in many of the “constructive” processes attributed to the DMN, such as imagination, future planning, mentalizing, and updating situation models (Schacter and Addis 2007, Ranganath and Ritchey 2012). Notably, constructive processes are highly relevant to real-world memory updating, which involves selecting and combining the relevant parts of old and new memories. Recent work has shown that neural patterns during encoding and recall of naturalistic stimuli (movies) are reliably similar across participants in this network (Chen et al. 2017; Oedekoven et al., 2017; Zadbood et al., 2017; see Bird 2020 for a review of recent naturalistic studies on memory), and the DMN displays distinct neural activity when listening to the same story with different perspectives (Yeshurun et al 2017). Building on this foundation of prior work on the DMN, we asked whether we could find neural evidence for the retroactive influence of new knowledge on past memories."

      "Discussion:

      In addition to mPFC, right precuneus and parts of temporal cortex exhibited significantly higher pattern similarity in the “twist” and “spoiled” groups who recalled the movie with the same interpretation. Precuneus is a core region in the posterior medial network, which is hypothesized to be involved in constructing and applying situation models (Ranganath and Ritchey 2012). Our findings support a role for precuneus in deploying interpretation-specific situation models when retrieving event memories. In particular, we suggest that the posterior medial network may encode a shift in the situation model of the “twist” group in order to accommodate the new Ghost interpretation.

      We performed two targeted analyses to look for evidence of memory updating across encoding and recall: the interaction analysis (Figure 2C) and the encoding-recall analysis (Figure 3). We hypothesized that a shift in direction of pISC difference would occur when neural representations during recall in the “twist” group start to reflect the Ghost interpretation. The interaction analysis probed this shift indirectly by taking into account the effects of both encoding-encoding and recall-recall analyses. Unlike the interaction analysis, in the encoding-recall analysis, we directly compared neural event representations during encoding and recall. Interestingly, all regions exhibiting an effect across the two encoding-recall analyses, excluding left anterior temporal cortex, were present in the interaction results. Among these regions, the left angular gyrus/TPJ exhibited an effect across all three analyses. As a core hub in the mentalizing network, temporo-parietal cortex has been implicated in theory of mind through perspective-taking, rationalizing the mental state of someone else, and modeling the attentional state of others (Frith and Frith 2006, Guterstam et al. 2021, Saxe and Kanwisher 2003). The motivations behind some actions of the main character in the movie heavily depend on whether the viewer perceives them as a Doctor or a Ghost, and participants may focus on this during both encoding and recall. We speculate that neural event representations in AG/TPJ in the current experiment may be related to mentalizing about the main character’s actions. Under this interpretation, the updated event representations during recall following the twist would be more closely aligned to the “spoiled” encoding representations, as a consequence of memory updating in the “twist” group.

      Our findings are consistent with the view that the DMN synthesizes incoming information with one’s prior beliefs and memories (Yeshurun et al 2021). We add to this framework by providing evidence for the involvement of DMN regions in updating prior beliefs in light of new knowledge. Across our different encoding and recall analyses, we observe memory updating effects in a varied subset of DMN regions that do not cleanly map onto a specific subsystem of the DMN (Robin and Moscovitch 2017, Ranganath and Ritchey 2012, Ritchey and Cooper 2020). Rather than being divergent, these results might reflect inherent differences between the processes of encoding and recall of naturalistic events. It has been proposed that neural representations corresponding to encoding of events are systematically transformed during recall of those events (Chen et al 2017, Favila et al 2020, Musz and Chen 2022). While we provide evidence for reinstatement of memories in the DMN, our findings also support a transformation of neural representation during recall, as encoding-recall results were weaker in some areas than recall-recall findings. This transformation could affect how different regions and sub-systems of the DMN represent memories, and suggests that the concerted activity of multiple subsystems and neural mechanisms might be at play during encoding, recall, and successful updating of naturalistic event memories."

      2) I think that some direct comparison to regions outside the DMN would speak to whether the DMN is truly unique in carrying the key representations being discussed here. I was reluctant to suggest this because I think that the authors are justified in expecting that DMN regions would show the effects in question. However, there really is no "null" comparison here wherein a set of regions not expected to show these effects (e.g., a somatosensory network, or the frontoparietal network) in fact do not show them. There are not really controls or key differences being hypothesized across different conditions or regions. Rather, we have a set of regions that may or may not show pattern similarity differences to varying degrees, which feels very exploratory. The inclusion of some principled control comparisons, etc. would bolster these findings. The authors do include a whole-brain analysis in Supplementary Figure 1, which indeed produced many DMN regions. However, notably, regions outside the DMN such as the primary visual cortex and mid-cingulate cortex appear to show significant effects (which, based on the color bar, might actually be stronger than effects seen in the DMN). Given the specificity of the language in the paper in terms of the DMN, I think that some direct regional or network-level comparison is needed.

      In the original submission, we included additional analyses for visual and somatosensory networks, which we hypothesized would serve as control networks. Following your comment, in the revision we added a separate section (included below) examining these analyses more thoroughly. We also added text to the Results and Discussion to explain our interpretation of these findings.

      "Changes in neural representations beyond DMN

      We focused our core analyses on regions of the default mode network. Prior work has shown that multimodal neural representations of naturalistic events (e.g. movie scenes) are similar across encoding (movie-watching or story-listening) and verbal recall of the same events in the DMN (Chen et al., 2017; Zadbood et al., 2017). Therefore, in the current work we hypothesized that retrospective changes in the neural representations of events as the narrative interpretation shifts would be observed in the DMN. We did not, for example, expect to observe such effects in lower-level sensory regions, where neural activity differs dramatically for movie-viewing and verbal recall. To be thorough, we ran the same set of analyses we performed in the DMN (Figures 2 and 3) in regions of the visual and somatomotor networks extracted from the same atlas parcellation (Schaefer et al., 2018). Our results revealed larger overall differences in the DMN than in visual and somatosensory networks for the key comparisons discussed previously (Figure S2). In particular, the only regions showing significant differences in pISC in recall-recall and encoding-recall comparisons (p < 0.01, uncorrected) were located in the DMN. We did not observe a notable difference between the DMN and the two other networks when comparing recall “twist” to movie “spoiled” and recall “twist” to movie “twist” (RG – MG > RG – MD), which is consistent with the weak effect in the original comparison (Figure 3B). In the encoding-encoding comparison, several ROIs from the visual and somatomotor networks showed relatively strong effects as well (see Discussion).

      In addition, we qualitatively reproduced our results by performing an ROI-based whole brain analysis (Figure S3, p < 0.01 uncorrected). This analysis confirmed the importance of DMN regions for updating neural event representations. However, strong differences in pISC in the hypothesized direction were also observed in a handful of other non-DMN regions, including ROIs partly overlapping with anterior cingulate cortex and dorsolateral prefrontal cortex (see Discussion)."

      "Discussion:

      While our main goal in this paper was to examine how neural representations of naturalistic events change in the DMN, we also examined visual and somatosensory networks. Aside from the encoding-encoding analysis, in which some visual and somatosensory regions showed stronger similarity between the two groups with the same interpretation of the movie, we did not find any regions with significant effects in these two networks in the other analyses. Unlike the recall phase, where each participant produces a unique utterance with their own choice of words and concepts to describe the movie, the encoding (movie-watching) stimulus is identical across all groups. Therefore, the effects observed in the encoding-encoding analysis in sensory regions could reflect similarity in perception of the movie guided by a similar attentional state while watching scenes with the same interpretation (e.g. similarity in gaze location, paying attention to certain dialogues, or small body movements while watching the movie with the same Doctor or Ghost interpretation). In our whole-brain analysis, these regions did not have significant interaction effects, which suggests that the effects were isolated to encoding. In the whole-brain analysis, we also observed significant encoding-encoding and interaction effects in anterior cingulate cortex, as well as recall-recall and interaction effects in dlPFC. These results suggest that both the "spoiled" manipulation and the "twist" may recruit top-down control and conflict monitoring processes during naturalistic viewing and recall."

      3) If I understand correctly, the main analyses of the fMRI data were limited to across-group comparisons of "critical scenes" that were maximally affected by the twist at the end of the movie. In other words, the analyses focused on the scenes whose interpretation hinged on the "doctor" versus "ghost" interpretation. I would be interested in seeing a comparison of "critical" scenes directly against scenes where the interpretation did not change with the twist. This "critical" versus "non-critical" contrast would be a strong confirmatory analysis that could further bolster the authors' claims, but on the other hand, it would be interesting to know whether the overall story interpretation led to any differences in neural patterns assigned to scenes that would not be expected to depend on differences in interpretation. (As a final note, such a comparison might provide additional analytical leverage for exploring the effect described in Figure 3B, which did not survive correction for multiple comparisons.)

      This is a helpful suggestion, and we’ve added an analysis addressing your comment. We found that the interaction index capturing the difference between the three groups was stronger for the critical scenes than for the non-critical scenes for almost all DMN ROIs.

      "The role of scene content: In the prior analyses, we focused on “critical scenes”, selected based on ratings from four raters who quantified the influence of the twist on the interpretation of each scene (see Methods). An independent post-experiment analysis of the verbal recall behavior of the fMRI participants yielded “twist scores” that were also highest for these scenes; that is, the expected and perceived effects of twist information on recall behavior matched. In our next analysis, we asked whether the neural event representations reflect these differences in the twist-related content of the scenes. In other words, are the “critical scenes” with highly twist-dependent interpretations truly critical for our observed effects?

      To answer this question, we re-ran our main encoding-encoding and recall-recall pISC analyses in each DMN ROI (Figures 2-3). We calculated interaction indices (Figure 2C) first by including all scenes, and second by including only the 11 non-critical scenes. To compare the effect of including different subsets of scenes with our original results, Figure 4 shows the results in the 15 ROIs that exhibited meaningful effects in our main analyses (Figure 2C). Figure 4A demonstrates that “critical scenes” yielded higher interaction indices than all scenes or non-critical scenes across all ROIs. The interaction score across all DMN ROIs was significantly higher for “critical scenes” than for all scenes (t(23) = 7.19, p = 2.53 × 10⁻⁷) and for non-critical scenes (t(23) = 7.3, p = 1.95 × 10⁻⁷). These results show that critical scenes are indeed responsible for the observed pISC differences across groups."
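      The notation t(23) reports a t statistic with 23 degrees of freedom. A minimal sketch of the corresponding computation follows, assuming (the text does not say so) that each comparison is a paired t-test over 24 matched interaction indices, so that df = n - 1 = 23. The function name paired_t and the example values are hypothetical, and the p-value, which requires the t distribution (for example, scipy.stats.ttest_rel), is left out to keep the example dependency-free.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b.

    Hypothetical helper: a[i] and b[i] are assumed to be interaction indices of
    the same unit (for example, one participant) under two scene subsets.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample SD (n - 1 denominator).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_d / se, len(diffs) - 1


# Made-up values for four units, not data from the study.
critical = [3.0, 4.0, 5.0, 6.0]
non_critical = [1.0, 2.0, 2.0, 3.0]
t_stat, df = paired_t(critical, non_critical)
print(f"t({df}) = {t_stat:.2f}")
```

      With 24 matched values instead of four, the same function would return df = 23, matching the reported notation.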

      4) I appreciate the code being made available and that the neuroimaging data will be made available soon. I would also appreciate it if the authors made the movie stimulus and behavioral data available. The movie stimulus itself is of interest because it was edited down, and it would be nice for readers to be able to see which scenes were included.

      Unfortunately, due to copyright restrictions, we cannot share the movie stimulus itself. However, we will share the timing of the cuts used, as well as the time-stamped transcripts of verbal recall.

      To sum up, I think that this is a great experiment with a lot of strengths. The design is fairly clean (especially for a movie stimulus), the analyses are well reasoned, and the data are clear. The only weaknesses I would suggest addressing are with regards to how the DMN is being described and evaluated, and the communication of how this work informs the field on a theoretical level.

    1. Author Response

      Reviewer #1 (Public Review):

      In a very interesting and technically advanced study, the authors measured the force production of curved protofilaments at depolymerizing mammalian microtubule ends using an optical trap assay that they developed previously for yeast microtubules. They found that the magnesium concentration affects this force production, which they argue based on a theoretical model is due to affecting the length of the protofilament curls, as observed previously by electron microscopy. Comparing with their previous force measurements, they conclude that mammalian microtubules produce smaller force pulses than yeast microtubules due to shorter protofilament curls. This work provides new mechanistic insight into how shrinking microtubules exert forces on cargoes such as for example kinetochores during cell division. The experiments are sophisticated and appear to be of high quality, conclusions are well supported by the data, and language is appropriate when conclusions are drawn from more indirect evidence. Given that the experimental setup differs from the previous optical trap assay (antibody plus tubulin attached to bead versus only antibody attached to bead), a control experiment could be useful with yeast microtubules using the same protocol used in the new variant of the assay, or at least a discussion regarding this issue. One open question may be whether the authors can be sure that measured forces are only due to single depolymerizing protofilaments instead of two or more protofilaments staying laterally attached for a while. How would this affect the interpretation of the data?

      This work will be of interest to cell biologists and biophysicists interested in spindle mechanics or generally in filament mechanics.

      Thank you for your careful reading of our manuscript, your kind remarks, and your favorable review.

      Reviewers #1 and #2 both mentioned a concern about potential differences between our previous setup with yeast microtubules, versus our new setup with predominantly bovine microtubules, and whether such differences might underlie the different pulse amplitudes we measured. We think this concern comes mainly from a misunderstanding of how the beads in both setups were tethered to the sides of the microtubules, and we apologize for not making this aspect clearer in our original submission.

      It is true that our new setup requires one additional step, pre-decoration of the anti-His beads with His6-tagged yeast tubulin. However, in both cases, the anti-His antibodies were kept very sparse on the beads to ensure that most beads, if they became tethered to a microtubule, were attached by a single antibody. (~30 pM beads were mixed with 30 pM of anti-His antibody, for a molar ratio of 1:1.) And even though the anti-His beads in our previous work did not undergo a separate incubation step for pre-decoration with tubulin, they undoubtedly were decorated immediately after being mixed into the microtubule growth mix, which in that case included ~1 µM of unpolymerized His6-tagged yeast tubulin dimers. Thus, the arrangement with beads tethered laterally to the sides of microtubules via single antibodies was created in both cases by essentially the same three-step process: First, beads decorated very sparsely with anti-His antibodies were bound to unpolymerized His6-tagged yeast tubulin. Second, a bead-tethered His6-tagged yeast tubulin was incorporated into the growing tip of a microtubule (which could be assembling from either yeast or bovine tubulin, depending on the experiment). Third, the tip grew past the bead to create a large extension. Because the beads in both scenarios were tethered by a single antibody to the same C-terminal tail of yeast β-tubulin, the differences in pulse amplitude cannot be explained by differences in the tethering. In our revised manuscript, we now mention explicitly in Results that the beads were tethered by single antibodies (lines 95 to 100). In Methods we significantly expanded the section about preparation of beads and how they became tethered (lines 365 to 393). [We refer here, and below, to line numbers when the document is viewed with “All Markup” shown.]

      You also raise an interesting, open question: Do protofilaments curl outward entirely independently of their lateral neighbors? Or, under some conditions, might they tend to stay laterally associated during the curling process, perhaps curling outward in pairs rather than as individual protofilaments? We cannot formally rule out the possibility that such lateral associations sometimes persist during protofilament curling. However, changes in lateral association seem unlikely to explain the magnesium- and species-dependent differences we measured in pulse amplitude, for two reasons. First, there is good evidence for lengthening of protofilament curls at disassembling tips (e.g., Mandelkow 1991, Tran & Salmon 1997), but we are not aware of convincing evidence for magnesium- or species-dependent increases in the propensity of curling protofilaments to remain laterally associated. Second, an increase in lateral association should increase the effective flexural rigidity of the curls, but under all the conditions we examined, pulse enlargement was associated with a steepening of the amplitude-vs-force relation (i.e., with softening, not stiffening). Our model indicates that this softening can be fully explained by an increase in protofilament contour length, without any change in the intrinsic flexural rigidity of the protofilament curls.

      Reviewer #2 (Public Review):

      Microtubules are regarded as dynamic tracks for kinesin and dynein motors that generate force for moving cargoes through cells, but microtubules also act as motors themselves by generating force from outward splaying protofilaments at depolymerizing ends. Force from depolymerization has been demonstrated in vitro and is thought to contribute to chromosome movement and other contexts in cells. Although this model has been in the field for many years, key questions have remained unanswered, including the mechanism of force generation, how force generated might be regulated in cells, and how this system might be tuned across cellular contexts or organisms. The barrier is that we lack an understanding of experimental conditions that can be used to control protofilament shape and energetics. This study by Murray and colleagues makes an important advance towards overcoming that barrier.

      This study builds on previous work from the authors where they developed a system to directly measure forces generated by outward curling protofilaments at depolymerizing microtubule ends. That study showed for the first time that protofilaments act like elastic springs and related the generated force to the estimated energy contained in the microtubule lattice. Furthermore, they showed that slowing polymerization rate did not diminish force generation. That study used recombinant yeast tubulin, including a 6x histidine tag on beta tubulin that created attachment points for the bead on the microtubule lattice. The current study extends that system to show that work output is related to the length of protofilament curls.

      We are grateful for your very thoughtful and thorough review, which has helped us improve our manuscript.

      Murray and colleagues show this by manipulating curls in two ways - using bovine brain tubulin instead of yeast tubulin and altering magnesium concentration. Previous EM studies indicated that protofilaments on depolymerizing bovine microtubules have similar curvature but are shorter. The authors here use a blend of bovine brain tubulin and bead-linked recombinant yeast tubulin with the 6x histidine tag in their in vitro system and find smaller deflections of the laser-trapped bead than previously observed with pure yeast tubulin. A concern with comparing this heterogeneous bovine/yeast system to the previous work with homogeneous yeast tubulin is that density of 6x histidine-tagged tubulin subunits is likely to be different between the two systems. Also, the rate of incorporation of 6x histidine yeast tubulin into bovine microtubules in the current study may be different from the rate of incorporation into yeast microtubules in the previous study. These differences could lead to changes in the strength of bead attachment to the microtubule lattice and alter the compliance of the bead to deflection by curling protofilaments. These possibilities and lattice attachment strength are not explored in this study, raising concerns about comparing the two systems.

      Reviewers #1 and #2 both mentioned a concern about potential differences between our previous setup with yeast microtubules, versus our new setup with predominantly bovine microtubules, and whether such differences might underlie the different pulse amplitudes we measured. As detailed in our response to Reviewer #1 above, we think this concern comes mainly from a misunderstanding of how the beads in both setups were tethered to the sides of the microtubules, and we apologize for not making this aspect clearer in our original submission. For both our yeast and bovine microtubule experiments, the anti-His antibodies were kept very sparse on the beads to ensure that most beads, if they became tethered to a microtubule, were attached by a single antibody. Because the beads in both scenarios were tethered by a single antibody to the same C-terminal tail of yeast β-tubulin, the differences in pulse amplitude cannot be explained by differences in the tethering. In our revised manuscript, we now mention explicitly in Results that the beads were tethered by single antibodies (lines 95 to 100). In Methods we significantly expanded the section about preparation of beads and how they became tethered (lines 365 to 393).

      The authors go on to show that magnesium increases bead deflection and work output from the system. The use of magnesium was motivated by earlier studies which showed that increasing magnesium speeds up depolymerization and increases the lengths of protofilament curls. The use of magnesium here provides the first evidence that work output can be tuned biochemically. This is an important finding. The authors then go on to show that the effect of magnesium on bead deflection can be separated from its effect on depolymerization speed. They do this by proteolytically removing the beta tubulin tail domain, which previous studies had shown to be necessary to mediate the magnesium effect on depolymerization rate. The authors arrive at a conclusion that magnesium must promote protofilament work output by increasing their lengths. How magnesium might do this remains unanswered. The mechanistic insight from the magnesium experiments ends there, but the authors discuss possible roles for magnesium in strengthening longitudinal interactions within protofilaments or perhaps complexing with the GDP nucleotide at the exchangeable site, although that seems less likely at the concentrations in these experiments.

      The major conclusion of the study is the finding that work output from curling protofilaments is a tunable system. The examples here demonstrate tuning by tubulin composition and by divalent cations. Whether these examples relate to tuning in biological systems will be an important next question and could expand our appreciation for the versatility of depolymerizing microtubules as a motor.

      We fully agree that two very important next questions are whether work output from curling protofilaments is truly harnessed in vivo, and whether protofilament properties in vivo might be actively regulated for this purpose. Based on your recommendations, and as detailed below (under Major point 2), we have expanded our discussion of these possibilities in our revised manuscript.

      Reviewer #3 (Public Review):

      The authors used a previously established optical tweezers-based assay to measure the regulation of the working stroke of curled protofilaments of bovine microtubules by magnesium. To do so, the authors improved the assay by attaching bovine microtubules to trapping beads through an incorporated tagged yeast tubulin.

      The assay is state-of-the-art and provides a direct measurement of the stroke size of protofilaments and its dependence on magnesium.

      The authors have achieved all their goals and the manuscript is well written.

      The reported findings will be of high interest for the cell biology community.

      Thank you for reading and evaluating our manuscript. We are grateful for your positive comments.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors found that the IDR in Cdc15 is phosphorylated by multiple kinases, Pom1/Shk1/Pck1/Kin1, and that phosphorylation of the IDR inhibits phase separation of the Cdc15 protein. The phosphorylation was demonstrated in cells as well as in vitro. Moreover, the phosphorylation sites were identified by mass spectrometry. The phospho-regulation of Cdc15 LLPS was demonstrated by an in vitro assay using recombinant proteins. The significance of the phosphorylation for the contractile actomyosin ring (CAR) was demonstrated using a cdc15 mutant carrying 31 Ala substitutions at the phosphorylation sites (cdc15 31A). The CAR assembled comparably to that in cdc15+ cells, but maturation and contraction of the ring were faster in the cdc15 31A mutant, suggesting that the phosphorylation contributes to delaying cytokinesis. This could be one of the mechanisms that ensure completion of chromosome segregation before cytokinesis. In this paper, the authors showed over-accumulation of the type-II myosin regulatory light chain Rlc1 on the CAR in the cdc15 31A mutant during CAR assembly and contraction. In addition, the kinases responsible for Cdc15 IDR phosphorylation are identified as polarity kinases, which restrict CAR formation to the cell middle. Indeed, inhibition of the kinases increases the frequency of septum formation at the cell tip in the mid1 knockout mutant, which lacks a major positive polarity cue during the mitotic phase. However, this phenotype cannot be attributed solely to Cdc15 phosphorylation, because the authors did not examine tip septum formation in the cdc15 31A mutant.

      Preventing Cdc15 phosphorylation does not on its own promote tip septum formation (Bhattacharjee et al., 2020). The polarity kinases have other substrates in the tip exclusion pathway that presumably also play a key role in septation. Also, cells must be in the correct part of the cell cycle to form functional CRs and septa. We describe the necessary roles of other polarity kinase substrates in our discussion.

      Overall, the data support their conclusion that Cdc15 undergoes LLPS and that the process is inhibited by phosphorylation of amino acid residues in the Cdc15 IDR by polarity kinases. It is still unclear whether LLPS is a reversible process regulated by the protein kinases. In vitro experiments showed condensate formation upon dephosphorylation of the Cdc15 IDR, but not dissolution of the condensates upon phosphorylation. I wonder whether incubating the kinases with Cdc15 IDR condensates induces their dissolution.

      This is an interesting idea but technically challenging. In vitro, we induce droplet formation by adding phosphatase, and there is no way to remove the phosphatase afterwards. Therefore, added kinase would compete with the phosphatase, and clear results are unlikely. What we do know from work in vivo is that when Cdc15 cannot be rephosphorylated (the alanine mutants), the protein remains bound to the membrane in clusters, so it seems clear that the phosphostate of Cdc15 governs this property of the protein.

      The transition of Cdc15 IDR phosphorylation and LLPS through cell cycle progression is unclear. In asynchronous cells (most of which may be in G2 phase) and in nda3 or cps1 mutants, Cdc15 was still highly phosphorylated. This indicates that Cdc15 is phosphorylated, and LLPS is inhibited, throughout the cell cycle. The transition of the phosphorylation status of individual residues could be the next challenge for this research.

      The cell cycle changes in Cdc15 phosphostatus and their correlation with localization have been well documented (e.g., Fankhauser et al., Cell, 1998; Clifford et al., JCB, 2008; Roberts-Galbraith et al., Mol. Cell, 2010). Upon bulk analysis, Cdc15 is never fully dephosphorylated during mitosis, but it is not highly phosphorylated in cells blocked in mitosis with nda3 or in cps1 cells, when some portion of it is in CRs (please see the references indicated above). As shown in the simulations, the protein need not be fully phosphorylated or dephosphorylated in order to undergo a conformational change that would allow condensate formation. A major conclusion of our work is that no particular phosphorylation site or sites is important; rather, the overall charge on the dimer is important, and some threshold of phosphorylation prevents the protein from forming clusters on the membrane. We agree with the reviewer that determining that threshold will be of interest in the future.

      In addition, currently, there is no approach to monitor the LLPS in wild-type cells. Therefore, it is still unclear if LLPS formation is the physiological mechanism regulating cell division in wild-type cells.

      We agree that we have not monitored LLPS in live cells. However, Cdc15’s condensate formation in live cells and its phosphorylation state are highly correlated, which is suggestive of LLPS in vivo.

    1. Author Response

      Reviewer #2 (Public Review):

      “To describe LLPS or to distinguish between polymer-polymer phase separation and LLPS, recent studies have used single particle tracking, a technique that allows one to follow the dynamics of individual proteins in living cells (https://doi.org/10.7554/eLife.60577; https://doi.org/10.7554/eLife.69181; https://doi.org/10.7554/eLife.47098). The authors should mention that such an approach can be a good alternative to avoid the artefact of fixation. Using techniques such as single particle tracking or FCS, it is possible to estimate the effective diffusion coefficient of proteins in living cells. When a liquid phase separation is formed, it is also possible to estimate the diffusion coefficient of the protein of interest (POI) inside versus outside of the LLPS.”

      We thank the reviewer for their insight and fully agree that live-cell techniques like SPT and FCS are valuable for investigating LLPS while avoiding fixation artifacts. We have added discussion emphasizing this point and incorporated the citations recommended by the reviewer in Paragraph 1 on Page 15: “Live imaging techniques that allow estimation of protein diffusion coefficients within specific cellular compartments, e.g., SPT (Hansen et al., 2018 and Heckert et al., 2022) and fluorescence correlation spectroscopy (Lanzanò et al., 2017), can be useful alternative approaches for diagnosing LLPS in vivo without the potential artifact of fixation, as diffusion dynamics have recently been shown to be affected by LLPS (Heltberg et al., 2021; McSwiggen et al., 2019a; Miné-Hattab et al., 2021; Chong et al., 2022; and Ladouceur et al., 2020).”
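      As a rough illustration of how SPT trajectories yield an effective diffusion coefficient, the sketch below assumes free two-dimensional Brownian motion, for which MSD(τ) = 4Dτ, and least-squares fits that line through the origin. The function names, the plain-list track format, and the choice of lags are assumptions for illustration, not the analysis pipeline of the cited studies; localization error, motion blur, and confinement inside condensates all bias this simple estimator.

```python
def msd(track, lag):
    """Mean squared displacement of a 2D track (list of (x, y) tuples) at an integer frame lag."""
    if lag < 1 or lag >= len(track):
        raise ValueError("lag must be between 1 and len(track) - 1")
    squared = [
        (track[i + lag][0] - track[i][0]) ** 2 + (track[i + lag][1] - track[i][1]) ** 2
        for i in range(len(track) - lag)
    ]
    return sum(squared) / len(squared)


def diffusion_coefficient(track, dt, max_lag=1):
    """Estimate D for free 2D Brownian motion, where MSD(tau) = 4 * D * tau.

    A line through the origin is least-squares fitted to MSD vs. tau for
    lags 1..max_lag (frame interval dt), and D is the slope divided by 4.
    """
    taus = [k * dt for k in range(1, max_lag + 1)]
    msds = [msd(track, k) for k in range(1, max_lag + 1)]
    slope = sum(m * t for m, t in zip(msds, taus)) / sum(t * t for t in taus)
    return slope / 4.0


# Hypothetical toy track (positions in micrometres, frames 0.25 s apart).
track = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
print(diffusion_coefficient(track, dt=0.25))  # D in um^2/s
```

      To compare diffusion inside versus outside condensates, the same estimator would be applied separately to track segments assigned to each compartment; how segments are assigned is a further assumption not covered here.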

      “The authors say that less dynamic interactions are better captured by PFA fixation. In the simulation part, would it be possible to predict from the diffusion coefficients of the POI inside a condensate the effect of the PFA fixation? […] In the simulation part, they could try to incorporate the diffusion coefficient of the protein of interest and see if it is possible to predict the effect of fixation as a function of the diffusion coefficient.”

      We thank the reviewer for pointing out the absence of this critical piece that connects our experimental observations to our kinetic model. Our model considers association/dissociation rates rather than diffusion coefficients to describe interaction dynamics, but the reviewer’s point is nonetheless insightful and important. As described in Response 2, we compared two proteins: Halo-TAF15(IDR), which is poorly preserved by fixation, and TAF15(IDR)-Halo-FTH1, which is well preserved by fixation. We used SPT to measure the dissociation rates of Halo-TAF15(IDR) and TAF15(IDR)-Halo-FTH1 and showed that Halo-TAF15(IDR) dissociates from its puncta much faster than TAF15(IDR)-Halo-FTH1 does, demonstrating more stable homotypic interactions for the latter. The observation that TAF15(IDR)-Halo-FTH1 has less dynamic interactions and is better preserved by fixation than Halo-TAF15(IDR) agrees with our model’s prediction that less dynamic interactions are better captured by fixation. Please see Response 2 for more details. Our new data and discussion have been added to the revised manuscript in Paragraph 3 on Page 13 and in Figure 3B, Figure 3E, Figure 6, and Video 2.
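      One common way to turn SPT residence times into a dissociation rate is sketched below; it is not necessarily the method used in the manuscript. It assumes uncensored, single-exponential dwell times, for which the maximum-likelihood estimate of the observed decay rate is the reciprocal of the mean dwell time, and it assumes that dissociation and photobleaching are competing first-order processes, so an independently measured bleaching rate can be subtracted. The function name and all numbers are hypothetical; tracks cut off by the end of a movie or by defocusing bias this estimator, which is why survival-curve fits are often used instead.

```python
def koff_from_dwell_times(dwell_times_s, k_bleach_per_s=0.0):
    """Estimate a dissociation rate (1/s) from SPT residence times.

    Assumes uncensored, single-exponential dwell times whose observed decay
    rate is k_off + k_bleach (two competing first-order processes). The
    maximum-likelihood estimate of that observed rate is 1 / mean dwell time.
    """
    if not dwell_times_s or any(t <= 0 for t in dwell_times_s):
        raise ValueError("need at least one positive dwell time")
    k_obs = len(dwell_times_s) / sum(dwell_times_s)
    return k_obs - k_bleach_per_s


# Made-up dwell times (s) for two hypothetical proteins, with an assumed bleaching rate.
fast = koff_from_dwell_times([0.5, 1.0, 1.5], k_bleach_per_s=0.1)
slow = koff_from_dwell_times([4.0, 6.0, 8.0], k_bleach_per_s=0.1)
print(f"k_off fast = {fast:.2f} /s, slow = {slow:.3f} /s")
```

      Shorter mean residence times give a larger k_off, which is the sense in which one construct's interactions are described as more dynamic than another's.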

      “Finally, the authors propose that in the future, it will be important to design novel fixatives with significantly faster cross-linking rates than biomolecular interactions to eliminate fixation artifacts in the cell. It would be even more interesting if the authors could propose some ideas of potential novel fixatives. Did they test several concentrations of PFA, for example? Did they test different times of PFA incubation? Did they test cryofixation and do they know what would be their effect on LLPS? Do they have novel fixatives in mind? […] To strengthen the manuscript, the authors should try more protocols of fixation.”

      We thank the reviewer for these good questions. As described in Response 1, we have done additional quantification of the change in LLPS appearance in cells upon treatment with 0% PFA (PBS buffer only), 1% PFA, 2% PFA, and 8% PFA, as well as 4% PFA supplemented with 0.2% GA. We saw statistically significant changes in the LLPS-describing parameters upon all the PFA and PFA/GA treatments except the 0% PFA control. To examine how fixation artifacts depend on the time of PFA incubation, we acquired a time-lapse movie of a cell overexpressing EGFP-FUS(IDR) immediately after 4% PFA treatment and quantified the number of puncta over time (Video 1). We showed that fixation is complete (the number of puncta becomes constant) by roughly 100 seconds (Figure 1 – figure supplement 2). These new data also justify our choice of a 10-minute PFA incubation time for analyzing fixation-induced changes in LLPS appearance in the rest of the paper. Please see Response 1 for more details. Our new data and discussion have been added to the revised manuscript in Paragraph 3 on Page 3 and in Figure 1 - figure supplement 2 (time dependence of fixation artifacts), Figure 1 - figure supplement 3 (fixation artifacts at various PFA concentrations), and Figure 1 - figure supplement 4 (fixation artifacts upon treatment with 4% PFA supplemented with 0.2% GA).

      We agree that testing how other fixation protocols, such as cryofixation, affect LLPS appearance would be interesting. However, given the complexity of such protocols and the highly specialized equipment and reagents they require, a broad survey of how different fixation methods change LLPS appearance would require enough work to fill a separate paper; these experiments would be more appropriate for a future study.

      Reviewer #3 (Public Review):

      “Understanding whether/how fixation methods affect the detection of biomolecular condensates is of broad interest given the importance of LLPS in regulating different aspects of cell biology. However, in this manuscript, the authors use only paraformaldehyde as a fixation method and study only fluorescently-labelled IDR proteins. The work would benefit from a comparison between living cells and cells fixed with other fixation methods.”

      We thank the reviewer for this suggestion and agree that more fixation protocols should be investigated. As described in Response 1 and Response 18, in addition to PFA fixation, we have quantified how fixation using 4% PFA supplemented with 0.2% GA changes LLPS appearance in cells. We saw statistically significant changes in all the LLPS-describing parameters upon PFA/GA treatment. Please see Response 1 and Response 18 for details. Our new data and discussion have been added to the revised manuscript in Paragraph 3 on Page 3 and in Figure 1 - figure supplement 4.

      “In addition, it would be useful to test the impact of these fixation methods on the detection of endogenous proteins or IDR proteins without fluorescent tag.”

      We thank the reviewer for this suggestion and have now investigated an endogenous IDR-containing protein in the revised manuscript. Specifically, we quantified the effect of 4% PFA fixation on endogenously expressed EWS::FLI1 in the Ewing sarcoma cell line A673. EWS::FLI1 is an oncogenic fusion transcription factor that causes Ewing sarcoma (Grünewald et al., 2018) and is known to form local, high-concentration hubs at target genes associated with GGAA microsatellites (Chong et al., 2018). We previously Halo-tagged endogenous EWS::FLI1 in A673 cells using CRISPR/Cas9-mediated genome editing (Chong et al., 2018). Here, we quantified the effect of PFA fixation on endogenous EWS::FLI1 puncta in this knock-in cell line and found no significant difference in the distribution of EWS::FLI1 upon fixation. This result suggests that PFA fixation does not alter the intracellular distribution of every protein. Our new data and discussion have been added to the revised manuscript in Paragraph 1 on Page 8 and in Figure 3C.

      Unfortunately, testing fixation artifacts of IDR-containing proteins without a fluorescent tag has been infeasible as we rely on fluorescence from a tag on the protein of interest to quantitatively compare LLPS appearance in live and fixed cells. Although we have considered using non-fluorescent methods, e.g., phase contrast microscopy, to visualize putative LLPS in cells, its lack of specificity in imaging proteins or cellular structures makes the type of quantification we do for fixation artifact characterization inaccessible.

    1. Author Response

      Reviewer #1 (Public Review):

      1 - Problems with the analysis of stimulation latency

      The data in this paper show a variable latency in signal propagation from stimulation sites to hippocampal recording electrodes. In an attempt to measure this latency, the authors examine the theta phase offset between each pair of stimulation and recording electrodes (Figure 9). They interpret their results as showing a consistent 90-degree phase offset. However, their data do not support this interpretation because in fact their measurements show a bimodal distribution of phase differences with peaks at 0 and 180 degrees. It is not valid to interpret the circular mean of a bimodal distribution because the result is not well defined. Further, individual electrodes do not show a mean difference of 90 degrees.

      Because the results do not reliably support the claim of a consistent 90-degree phase difference between the hippocampus and cortex, this is a substantial problem for the paper, given the importance of hippocampal-cortical timing in their interpretation. In particular, the authors should reconsider how they frame their results in relation to the Siegle and Wilson work and others.

      We no longer emphasize the phase difference between hippocampus and neocortex in the revised manuscript. This phase difference was computed to attempt to address the possibility that there was some latency in the propagation of stimulation effects from lateral temporal cortex to hippocampus, which would affect our interpretation of which theta phase angles evoked minimal versus maximal hippocampal response (i.e., “peak” stimulation trials may actually have involved stimulation propagating to hippocampus sometime after its peak). However, as noted above in response to Essential Revisions #1, we cannot fully rule out the possibility that volume conduction influenced our estimates of phase lag. We no longer emphasize this analysis and have moved it to the appendix (Appendix 1-Figure 4), along with a new analysis using bipolar rereferencing to address the volume conduction issue.

      The manuscript is now focused on the main finding of the experiment, of a 180-degree separation between theta phases associated with minimal versus maximal evoked responses. We analyzed this via circular-linear models of phase versus evoked amplitude, as suggested by the reviewers, rather than the phase-binning analyses emphasized in the original manuscript. Circular-linear analyses are indifferent to the specific phase values associated with minimal/maximal response. We have also expanded our Introduction with further discussion of homologies to the rodent literature, including to the Siegle and Wilson paper. Our revised Discussion section emphasizes that the central homology is that there is 180-degree separation between hippocampal theta phase angles associated with minimal versus maximal responsiveness to input, with less emphasis placed on the specific angles (i.e., peak versus trough), given difficulties in comparing specific phase angles across species and recording approaches.
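      The reviewer's point about averaging a bimodal phase distribution can be illustrated with a few made-up angles. The sketch computes the circular mean direction and the mean resultant length R; for phase offsets clustered at 0° and 180°, the cosines and sines cancel, R is close to zero, and the reported direction is essentially arbitrary. Doubling the angles (standard axial statistics) instead treats 0° and 180° as the same axis. The helper names are hypothetical and the angles are not data from the study.

```python
import math


def circular_mean(angles_deg):
    """Return (mean direction in degrees, mean resultant length R) of angles in degrees."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    # R near 0 means the directions cancel and the mean direction is not well defined.
    return math.degrees(math.atan2(s, c)) % 360.0, math.hypot(c, s)


def axial_mean(angles_deg):
    """Axial statistics: double the angles so that a and a + 180 deg count as one axis."""
    mean2, r2 = circular_mean([2.0 * a for a in angles_deg])
    return (mean2 / 2.0) % 180.0, r2


# Hypothetical phase offsets clustered at 0 and 180 degrees (not data from the study).
offsets = [0.0, 0.0, 180.0, 180.0]
print(circular_mean(offsets))  # R is effectively zero; the direction is arbitrary
print(axial_mean(offsets))     # R is close to 1; the axis lies near 0/180 degrees
```

      Because R for the raw angles is effectively zero, any mean direction reported for such data (90° included) reflects cancellation rather than a consistent offset, which is why R should be inspected before interpreting a circular mean.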

      2 - Problems with the figures

      Some figures in the paper were hard to interpret, and I felt readers would benefit from many of them being combined. The results from Figures 3 through 7 would be helpful to see side by side, as they show various investigations of the same data. In Figure 4, it would be helpful to see both plots from (a) on the same axis, as in (b). I did not find the accuracy estimation analysis in Figure 2 important to include in the main paper. It would be better suited for the supplement, in my view, unless I am missing something.

      We have substantially revised the figures for clarity. The analyses presented in original Figures 2, 6, and 9 have been moved to the appendix (as revised Appendix Figures 1, 3, and 4). Figure 3 has been combined with Figure 1 into the revised Figure 1. Figures 4 and 7 have been combined in order to show EP data from all four phase bins side by side (Figure 3). We did not combine a) and b) from the original figure onto the same axis, as we found it difficult to interpret the four overlaid traces (i.e., 2 EP traces and 2 phase-matched stimulation-free traces). However, these data are now shown side by side and on equal axes. We have updated all EP visualizations to improve readability. Figure 5 has been expanded to include component amplitude comparisons for both peak versus trough and rising versus falling phases, in keeping with the expanded Figure 3.

    1. Author Response

      Reviewer #2 (Public Review):

      This clinical trial was conducted to pursue short-course DAA therapy. For an ultra-short course to work, it has to be simple and as efficacious as established treatments, and it must require no additional workup (such as genotyping, IL28B, or HCV VL determination after initiation of therapy, as in Liu et al.). This is because our aim is to simplify therapy so that most people can be treated, especially those who are not engaged in care. This work struggles to achieve these goals, as the SVR for short-course therapy is unacceptably low. The authors' conclusion, that one can treat with a short course first and then retreat those who fail, does not appear to achieve these goals either: realistically, it is difficult to re-engage marginalized populations, which matters from an elimination perspective. The ideal is to treat them in one attempt.

      We would like to clarify that we do not propose treating with 4 weeks and then retreating, because we acknowledge an unacceptable first line cure rate with this approach. We suggest 8 weeks may achieve cure rate of greater than 90% in mild liver disease (18/18 participants with slow virological response were cured with 8 weeks SOF/DCV in this study). Since retreatment with the same drug combination is effective, there is arguably less jeopardy in a regimen with 90% cure rate than previously perceived.

      Reviewer #3 (Public Review):

      This prospective study evaluated the utility of D2 VL determination for response-guided ultra-short (4w) sofosbuvir + daclatasvir treatment of chronic HCV patients (with mild disease) with G1+6. Shortening therapy duration reduces DAA use with a cure rate of 75% overall upon first-line treatment and 100% among retreated patients. In contrast to a previous report in G1b patients that showed a 100% success rate with D2-based 3-week triple therapy, the present study fails to show a good enough yield for a 4w sofosbuvir + daclatasvir regimen among G1+6 patients. Given the small number of patients, additional studies should determine whether a different time point and/or a different viral threshold could be more appropriate indicators to allow a 4-week duration of dual therapy (without a protease inhibitor).

      Strengths:

      A) An important study that is a nice addition to previous reports evaluating the utility of response-guided therapy for shortening the duration of HCV treatment. Given the disease burden and the high costs of treatment, especially in low-income countries, this is a major goal that was also advocated by the WHO.

      B) This study investigates an ultra-short protease-inhibitor-free regimen and therefore complements a previous (positive) RGT study of a 3-week triple regimen.

      C) This study is prospective, with careful analyses of ample data, including the evaluation of RAS by gene sequencing. The follow-up was long enough and analyses of viral kinetics were performed. In addition, a detailed analysis of re-treatment outcomes and viral mutations in this population was performed.

      D) Although the main objective (shortening therapy to 4 weeks) was not adequately achieved (<90% success rate), the study's results may suggest that re-treatment in case of failure is safe and efficient, although further studies with a higher number of patients are needed for confirmation.

      Limitations:

      A) Relatively small study cohort. Overall, only 34 patients were treated with a 4-week regimen. However, given the results, this number of patients, who achieved only a 75% cure rate, seems enough to exclude the use of a D2 RGUT, at least in G1+6 patients treated with sofosbuvir + daclatasvir. On the other hand, even a 100% success rate with 8-week treatment among 17 patients is not really enough to draw firm conclusions on the adequacy of this short regimen in this group of patients. A larger number of patients could better validate this positive result.

      Addressed in the Discussion. Firstly, the study was powered to determine the overall cure rate with 4- and 8-week treatment, rather than outcomes with each duration. It is possible that we would have seen patients failing 8 weeks of therapy with a larger sample, and our cure estimates may therefore be imprecise.
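      To make this imprecision concrete, an exact (Clopper-Pearson) binomial interval can be computed for an arm in which every participant was cured. This is our own illustration, not an analysis reported in the study; when all n participants are cured, the two-sided 95% lower bound has the closed form (0.025)^(1/n). The function name is hypothetical.

```python
def clopper_pearson_lower_all_cured(n, alpha=0.05):
    """Two-sided exact lower confidence bound for a proportion when x == n.

    For x == n, the Clopper-Pearson lower bound is the alpha/2 quantile of
    Beta(n, 1), whose CDF is p**n, so the bound is (alpha/2) ** (1/n).
    The upper bound in this case is 1.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (alpha / 2) ** (1.0 / n)


# 17 (reviewer's count) and 18 (response's count) participants, all cured.
for n in (17, 18):
    print(n, round(clopper_pearson_lower_all_cured(n), 3))
```

      Under this interval, an observed 18/18 cure rate remains compatible with a true cure rate as low as about 81%.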

      B) The values chosen for the RGT are arbitrary. The relatively small number of patients did not allow a more detailed analysis of more appropriate time points and/or viral load thresholds to determine the adequacy of 4 weeks of therapy in individual patients. The D2 500 IU/mL threshold is based on a small previous phase 2 study in G1b patients treated with a triple-drug regimen, which does not necessarily apply to dual therapy (without a protease inhibitor) in patients with a different viral subtype. In this context, a control group treated with triple combination therapy (with a protease inhibitor) would be very helpful to the study.

      This was a mechanistic pilot study conducted in Vietnam, where antiviral options are limited. We therefore made a conscious decision to use licensed/available treatments (SOF/DCV) rather than the Lau combination, which is not WHO-approved.

      C) Is there a particular pattern of viral kinetics in patients cured with 4 weeks of therapy vs. failures? Fig 1 (Appendix 1) only shows mean viral load and the general kinetics for the whole population; individual plots of viral kinetics are not presented, although they could be useful. Also, according to the presented data, a day 7 VL < LLOQ may be a better indicator for shortening treatment to 4 weeks. A detailed graphical presentation of viral kinetics in these patients would be helpful.

      We have added Appendix 1-figure 2, showing HCV RNA kinetics in participants treated with 4 weeks of SOF/DCV, with cures (red lines) distinguished from treatment failures. In the Results section we note that, even though the numbers are small, this helps illustrate that early on-treatment response alone may be of limited value in determining cure with ultra-short therapy.

      D) According to Table 3, no significant differences in host or viral factors were detected between patients cured by and those failing the 4w regimen. However, the low number of patients makes these data very difficult to interpret and may miss potential differences between the two groups, again emphasizing the difficulty of drawing firm conclusions from this study. In this context, I wonder whether a regression analysis would better define the viral (subtype, RAS) or host factors implicated in the success of a 4w regimen.

      See above.

    1. Author Response

      Reviewer #1 (Public Review):

      Auwerx et al. have taken a new approach to mine large existing datasets of intermediary molecular data between GWAS and phenotype, with the aim of uncovering novel insight into the molecular mechanisms which lead a GWAS hit to have a phenotypic effect. The authors show that you can get additional insight by integrating multiple omics layers rather than analyzing only a single molecular type, including a handful of specific examples, e.g. that the effect of SNPs in ANKH on calcium is mediated by citrate. Such additional data are necessary because, as the authors point out, while we have thousands of SNPs with significant impact on phenotypes of interest, we often don't know the mechanism at all, given that the majority of significant SNPs found through GWAS are in non-coding (and often intergenic) regions.

      This paper shows how one can mine large existing datasets to better estimate the cellular mechanism of significant, causal SNPs, and the authors have proven that by providing insight into the links between a couple of genes (e.g. FADS2, TMEM258) and metabolite QTLs and consequent phenotypes. There is definitely a need and utility for this, given how few significant SNPs (and even fewer recently-discovered ones) hit parts of the DNA where the causal mechanism is immediately obvious and easily testable through traditional molecular approaches.

      I find the paper interesting and it provides useful insight into a still relatively new approach. However, I would be interested in knowing how well this approach scales to the general genetics community: would this method work with a much smaller N (e.g. n = 500)? Being able to make new insights using cohorts of nearly 10,000 patients is great, but the vast majority of molecular studies are at least an order of magnitude smaller. While sequencing and mass spectrometry are becoming exponentially cheaper, the issue of sample size is likely to remain for the foreseeable future due to the challenges and expenses of the initial sample collection.

      We thank the reviewer for his assessment and have now addressed – in the revised version of the manuscript, as well as in the below point-by-point reply – his specific comments/questions.

      Reviewer #2 (Public Review):

      Auwerx et al. present a framework for the integration of results from expression quantitative trait loci (eQTL), metabolite QTL (mQTL) and genome-wide association (GWA) studies based on the use of summary statistics and Mendelian Randomization (MR). The aim of their study is to provide the field with a method that allows for the detection of causal relationships between transcript levels and phenotypes by integrating information about the effect of transcripts on metabolites and the downstream effect of these metabolites on phenotypes reported by GWA studies. The method requires the mapping of identical SNPs in disconnected mQTL and eQTL studies, which allows MR-based inference of a causal effect from a transcript to a metabolite. The effect of both transcripts and metabolites on phenotypes is evaluated in the same MR-based manner by overlaying eQTL and mQTL SNPs with SNPs present in phenotypic GWA studies.

      The aim of the presented approach is two-fold: (1) to allow identification of additional causal relationships between transcript levels and phenotypes as compared to an approach limited to the evaluation of transcript-to-phenotype associations (transcriptome-wide MR, TWMR) and (2) to provide information about the mechanism of effects originating from causally linked transcripts via the metabolite layer to a phenotype.

      The study is presented in a very clear and concise way. In the part based on empirical study results, the approach leads to the identification of a set of potential causal triplets between transcripts, metabolites and phenotypes. Several examples of such causal links are presented, which are in agreement with literature but also contain testable hypotheses about novel functional relationships. The simulation study is well documented and addresses an important question pertaining to the approach taken: Does the integration of mQTL data at the level of a mediator allow for higher power to detect causal transcript to phenotype associations?

      We thank the reviewer for his/her assessment and have now addressed – in the revised version of the manuscript, as well as in the below point-by-point reply – his/her specific comments/questions.

      Major Concerns

      1) Our most salient concern regarding the presented approach is the presence of multiple testing problems. In the analysis of empirical datasets (p. 4), the rationale for setting FDR thresholds is not clearly stated. While this appears to be a Bonferroni-type correction (p-value threshold divided by the number of transcripts or metabolites tested), the thresholds do not reflect the actual number of tests performed (7,883 transcripts times 453 metabolites for transcript-metabolite associations; 87 metabolites or 10,435 transcripts times 28 complex phenotypes for the phenotype associations). The correct and more stringent thresholds will certainly decrease the overlap between causal relationships and thus reduce the number of identifiable causal triplets. Furthermore, we believe that multiple testing has to be considered for correct interpretation of the power analysis. The study compares the power of a TWMR-only approach to the power of mediation-based MR by comparing "power(TP)" against "power(TM) * power(MP)" (p. 12). This comparison is useful in a hypothetical situation given data on a single transcript affecting a single phenotype, with potential mediation via a single metabolite. However, in an actual empirical situation, the number of non-causal transcript-metabolite-phenotype triplets will exceed the number of non-causal transcript-phenotype associations, because every transcript-phenotype pair is multiplied by the number of metabolites that have to be evaluated. This creates a tremendous multiple testing burden, which will very likely outweigh the increase in power afforded by the mediation-based approach in the hypothetical "single transcript-metabolite-phenotype" situation described here. Thus, for exploratory detection of causal transcript-phenotype relationships, the TWMR-only method might even outperform the mediation-based method described by the authors, simply because the former requires fewer hypotheses to be tested than the latter. The presented simulation would only hold in cases where a single path of causality with a known potential mediator is to be tested.
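      The scale of the testing burden described in this comment can be checked with simple arithmetic. The sketch below uses the pair counts quoted in the comment and a plain Bonferroni correction at a family-wise level of 0.05; the variable names are ours.

```python
ALPHA = 0.05

# Numbers of tested pairs as quoted in the reviewers' comment.
n_pairs = {
    "transcript-metabolite": 7883 * 453,
    "metabolite-phenotype": 87 * 28,
    "transcript-phenotype": 10435 * 28,
}

for layer, m in n_pairs.items():
    # Bonferroni: each individual test is held to ALPHA divided by the number of tests.
    print(f"{layer}: {m:,} tests, per-test threshold {ALPHA / m:.2e}")
```

      The transcript-metabolite layer alone involves about 3.6 million tests, so a Bonferroni per-test threshold there would be roughly 1.4 × 10⁻⁸.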

      We thank the reviewer for pointing out the multiple testing issue. Based on this comment, we have revised our approach by implementing two major modifications.

      First, we reduced the number of assessed metabolites to 242 compounds for which we were able to identify a Human Metabolome Database (HMDB) identifier through manual curation. This was triggered by the suggestion of reviewer #1 to facilitate database/literature-based follow-up of our discoveries. The motivation is to test only metabolites that would yield interpretable results if found to be significantly associated, thereby reducing the number of tests to be performed. This modification is described in the revised manuscript:

      Results: “Summary statistics for cis-eQTLs stem from the eQTLGen Consortium meta-analysis of 19,942 transcripts in 31,684 individuals [3], while summary statistics for mQTLs originate from a meta-analysis of 453 metabolites in 7,824 individuals from two independent European cohorts: TwinsUK (N = 6,056) and KORA (N = 1,768) [6]. After selecting SNPs included in both the eQTL and mQTL studies, our analysis was restricted to 7,884 transcripts with ≥ 3 instrumental variables (IVs) (see Methods, Supplemental Figure 1) and 242 metabolites with an identifier in The Human Metabolome Database (HMDB) [28] (see Methods, Supplemental Table 1).”

      Methods: “mQTL data originate from Shin et al. [6], which used ultra-high performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) to measure 486 whole blood metabolites in 7,824 European individuals. Association analyses were carried out on ~2.1 million SNPs and are available for 453 metabolites at the Metabolomics GWAS Server (http://metabolomics.helmholtz-muenchen.de/gwas/). Among these metabolites, 242 were manually annotated with Human Metabolome Database (HMDB) identifiers (Supplemental Table 1) and used in this study.”

      Second, to account for all remaining tests, we now select significant causal effects based on FDR < 5% in all performed univariable MR analyses. With 5% FDR on both the transcript-to-metabolite and metabolite-to-phenotype effects, the FDR for triplets is slightly inflated to 9.75% (= 1 - 0.95²), a consideration that we now explicitly describe. Note that selecting triplets based on transcript-to-metabolite and metabolite-to-phenotype FDR < 2.5% results in an FDR < 5% (1 - 0.975² ≈ 4.94%) for the triplets. This more stringent threshold identifies 135 causal triplets, 39 of which would be missed by TWMR. Overall, the Results and Supplemental Tables have been updated and now read as follows:

      “Mapping the transcriptome onto the metabolome […] By testing each gene for association with the 242 metabolites, we detected 96 genes whose transcript levels causally impacted 75 metabolites, resulting in 133 unique transcript-metabolite associations (FDR 5% considering all 1,907,690 instrumentable gene-metabolite pairs; Supplemental Table 2) […].

      Mapping the metabolome onto complex phenotypes […] Overall, 34 metabolites were associated with at least one phenotype (FDR 5% considering all 1,344 metabolite-phenotype pairs), resulting in 132 unique metabolite-phenotype associations (Supplemental Table 4).

      Mapping the transcriptome onto complex phenotypes […] In total, 5,140 transcripts associated with at least one phenotype (FDR 5% considering all 292,170 gene-phenotype pairs) resulting in 13,141 unique transcript-phenotype associations (Supplemental Table 5).

      Mapping metabolome-mediated effects of the transcriptome onto complex phenotypes […] We combined the 133 transcript-metabolite (FDR ≤ 5%) and 132 metabolite-trait (FDR ≤ 5%) associations to pinpoint 216 transcript-metabolite-phenotype causal triplets (FDR = 1 - 0.95² = 9.75%) (Supplemental Table 6).”
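      The response does not name the FDR procedure. Assuming the standard Benjamini-Hochberg step-up procedure, selection at FDR 5% within one layer of tests can be sketched as follows (the function name and the example p-values are ours).

```python
import numpy as np


def benjamini_hochberg(pvalues, q=0.05):
    """Return a boolean mask of discoveries under the BH step-up procedure."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the i-th smallest p-value with q * i / m (i = 1..m).
    passes = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max()  # largest rank meeting its threshold
        reject[order[: k + 1]] = True    # reject it and every smaller p-value
    return reject


# Step-up behaviour: only the largest p-value meets its threshold,
# yet all three are rejected.
print(benjamini_hochberg([0.03, 0.04, 0.045]))
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))
```

      Applied separately to the transcript-metabolite and metabolite-phenotype p-values, pairs that pass in both layers and share a metabolite would form the candidate triplets.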

      In the simulations performed for the power analysis, we used a Bonferroni correction. We ran each simulation for 500 transcripts, measuring 80 metabolites at each run, and performed TWMR and MWMR. The power of TWMR was calculated by counting how many times we obtained p-values ≤ 0.05/500. The power of the mediation analysis was calculated as power_TM × power_MP, where power_TM was calculated by counting how many times we obtained p-values ≤ 0.05/(500 × 80), and power_MP was calculated by counting how many times we obtained p-values ≤ 0.05/80. In the revised manuscript, we additionally repeated each simulated scenario 10 times to increase the robustness of results. This has been clarified in both the Methods and Results sections of the revised manuscript:

      Methods: “Ranging 𝜌 and 𝜎 from -2 to 2 and from 0.1 to 10, respectively, we ran each simulation for 500 transcripts measuring 80 metabolites at each run and performed TWMR and MWMR starting from the above-described simulated effect sizes (𝛽). For each MR analysis we calculated the power to detect a significant association as well as the difference in power between TWMR and the mediation analyses (i.e., power_TP − power_TM × power_MP). Each specific scenario was repeated 10 times and the average difference in power across simulations was plotted as a heatmap.”

      Results: “To characterize the parameter regime where the power to detect indirect effects is larger than it is for total effects, we performed simulations using different settings for the mediated effect. In each scenario we evaluated 500 transcripts and 80 metabolites and varied two parameters characterizing the mediation: a. the proportion (𝜌) of direct effect (𝛼_d, i.e., the effect not mediated by the metabolite) to total effect (𝛼_TP), from -2 to 2 to cover the cases where direct and mediated effects have opposite directions (51 values); b. the ratio (𝜎) between the transcript-to-metabolite (𝛼_TM) and the metabolite-to-phenotype (𝛼_MP) effects, exploring the range from 0.1 to 10 (51 values).
      Transcripts were simulated with 6% heritability (i.e., median h² in the eQTLGen data) and a causal effect of 0.035 (i.e., ~65% power in TWMR at a significance level of 0.05) on a phenotype. Each scenario was simulated 10 times and results were averaged to assess the mean difference in power (see Methods).”
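      The power comparison described above can be sketched as follows, assuming one array of simulated p-values per test type. The function names and the toy p-values are hypothetical; the Bonferroni thresholds (0.05/500, 0.05/(500 × 80) and 0.05/80) follow the response.

```python
import numpy as np


def empirical_power(pvalues, threshold):
    """Fraction of simulated tests whose p-value reaches the threshold."""
    return float(np.mean(np.asarray(pvalues, dtype=float) <= threshold))


def power_difference(p_tp, p_tm, p_mp, n_transcripts=500, n_metabolites=80, alpha=0.05):
    """power_TP - power_TM * power_MP with Bonferroni-corrected thresholds."""
    power_tp = empirical_power(p_tp, alpha / n_transcripts)
    power_tm = empirical_power(p_tm, alpha / (n_transcripts * n_metabolites))
    power_mp = empirical_power(p_mp, alpha / n_metabolites)
    return power_tp - power_tm * power_mp


# Toy p-values for one scenario; in the simulations each scenario was repeated
# 10 times and the differences averaged (e.g., np.mean over replicates).
diff = power_difference(
    p_tp=[1e-6, 0.5],
    p_tm=[1e-8, 1e-9, 0.1, 0.2],
    p_mp=[1e-4, 0.9],
)
print(diff)
```

      A positive difference favours TWMR; a negative one favours the mediation analysis.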

      2) A second concern regards the interpretation of the results based on the empirical datasets. For the identified 206 transcript-metabolite-phenotype causal triplets, the authors show a comparison between TWMR-based total effect of transcripts on phenotypes and the calculated direct effect based on a multivariable MR (MVMR) test (Figure 2B), which corrects for the indirect effect mediated by the metabolite in the causal triplet. The comparison shows a strong correlation between direct and total effect. A thorough discussion of the potential reasons for deviation (in both negative and positive directions) from the identity line is missing.

      Deviation from the identity line, as observed in Figure 2B, indicates that while there is a strong correlation between direct and total effect, it is not perfect, and part of the total effect is due to an indirect effect mediated by metabolites. This is explained and discussed in the Results and Discussion sections:

      Results: “Regressing direct effects (𝛼_d) on total effects (𝛼_TP) (Figure 2A), we estimated that for our 216 mediated associations, 77% [95% CI: 70%-85%] of the transcript effect on the phenotype was direct and thus not mediated by the metabolites (Figure 2B).”

      Discussion: “The observation that 77% of the transcript’s effect on the phenotype is not mediated by metabolites suggests that either true direct effects are frequent or that other unassessed metabolites or molecular layers (e.g., proteins, post-translational modifications, etc.) play a crucial role in such mediation. Note that in the presence of unmeasured mediators or measured mediators without genetic instruments, our mediation estimates are lower bounds of the total existing mediation. […] Thanks to the flexibility of the proposed framework, we expect that in the future, upon availability of ever larger and more diverse datasets, our method could be applied to estimate the relative contribution of currently unassessed mediators in translating genotypic cascades.”
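      The proportion of direct effect can be estimated as the slope of direct on total effects. The sketch below assumes a least-squares fit through the origin and a percentile bootstrap over triplets for the confidence interval. The response does not specify either choice, so both are assumptions, and the toy data are hypothetical.

```python
import numpy as np


def slope_through_origin(total, direct):
    """Least-squares slope of direct ~ total with no intercept."""
    total = np.asarray(total, dtype=float)
    direct = np.asarray(direct, dtype=float)
    return float(np.dot(total, direct) / np.dot(total, total))


def bootstrap_ci(total, direct, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling triplets with replacement."""
    total = np.asarray(total, dtype=float)
    direct = np.asarray(direct, dtype=float)
    rng = np.random.default_rng(seed)
    n = total.size
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        slopes[b] = slope_through_origin(total[idx], direct[idx])
    tail = (1 - level) / 2 * 100
    lo, hi = np.percentile(slopes, [tail, 100 - tail])
    return float(lo), float(hi)


# Hypothetical effects in which 77% of each total effect is direct.
total = np.array([0.10, -0.20, 0.05, 0.30, -0.15])
direct = 0.77 * total
print(slope_through_origin(total, direct), bootstrap_ci(total, direct))
```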

      Furthermore, no test of significance for potential cases of mediation is presented. Due to the issues of multiple testing discussed above, the significance of the inferred cases of mediation is drawn into question. The examples presented for causal triplets (involving the ANKH and SLC6A12 transcripts) feature transcripts with low total effects and a small ratio between direct and total effect, in line with the power analysis. However, in these examples, the total effects are also quite low. Its significance has to be tested with an appropriate statistical test, incorporating multiple testing correction.

      Following the reviewer’s suggestion, we have modified our criteria to call significant associations to account for multiple testing (see extensive reply to major concern #1). With 5% FDR on both the transcript-to-metabolite and metabolite-to-phenotype effects, the FDR for triplets is slightly inflated to 9.75% (= 1 - 0.95²). We mention this limitation in the revised manuscript:

      “We combined the 133 transcript-metabolite (FDR ≤ 5%) and 132 metabolite-trait (FDR ≤ 5%) associations to pinpoint 216 transcript-metabolite-phenotype causal triplets (FDR = 1 - 0.95² = 9.75%) (Supplemental Table 6).”
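      The triplet-level figures follow from combining two per-layer error rates. The sketch below reproduces the 1 - (1 - q)² arithmetic used in the response, which treats the two layers as independent; the function names are ours.

```python
import math


def triplet_fdr(layer_fdr):
    """Error rate for a triplet that needs discoveries in two independent layers."""
    return 1 - (1 - layer_fdr) ** 2


def layer_fdr_for_target(target_fdr):
    """Per-layer FDR that yields a given triplet-level rate (inverse of above)."""
    return 1 - math.sqrt(1 - target_fdr)


print(triplet_fdr(0.05))            # about 0.0975
print(triplet_fdr(0.025))           # about 0.0494
print(layer_fdr_for_target(0.05))   # about 0.0253
```

      A per-layer threshold of 2.5% therefore keeps the triplet-level rate just under 5%, matching the stricter selection described in the response.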

      All examples presented in the original manuscript remained significant. The fact that the total effect in these examples is low makes them particularly interesting: it highlights how our approach can detect biologically plausible associations between a transcript and a phenotype that show only mild evidence through TWMR but are strongly supported when accounting for the metabolites that mediate the transcript-phenotype relation. These are situations in which our method can provide a true advantage over classical approaches such as TWMR. Such examples may emerge due to opposite-signed direct and indirect effects, which cancel each other out when testing total effects. What is key is that we do not claim the total and the mediated effects to be different (as we would have very limited power to do so), but simply point out that under certain settings we are better powered to detect mediated effects than total ones. In the ANKH example (more details below), the total ANKH-calcium effect is almost exactly the same as the product of the 𝛼_ANKH→citrate and 𝛼_citrate→calcium effects; the latter are simply detectable, while the total effect is not.

      In the revised manuscript the case for our selected examples is made even stronger thanks to an analysis proposed by Reviewer #1 that aimed at estimating the proportion of previously reported associations through automated literature review. For instance, while our literature review found previously reported evidence of the ANKH-calcium link and of the ANKH-citrate link, we did not identify any publication mentioning all 3 terms in combination in the abstract and/or title, illustrating how our approach can establish bridges between knowledge gaps. We revised the Results section describing the ANKH example accordingly:

      “The 126 triplets that were not identified through TWMR due to power issues represent putative new causal relations. This is well illustrated by a proof-of-concept example involving ANKH [MIM: 605145] and calcium levels, for which 48 publications were identified through automated literature review (Supplemental Table 6). While the TWMR effect of ANKH expression on calcium levels was not significant (𝛼_ANKH→calcium = −0.02; 𝑃 = 0.03), we observed that ANKH expression decreased citrate levels (𝛼_ANKH→citrate = −0.30; 𝑃 = 2.2 × 1089:), which itself increased serum calcium levels (𝛼_citrate→calcium = 0.07; 𝑃 = 6.5 × 108;9). Mutations in ANKH have been associated with several rare mineralization disorders [MIM: 123000, 118600] [32] due to the gene encoding a transmembrane protein that channels inorganic pyrophosphate to the extracellular matrix, where at low concentrations it inhibits mineralization [33]. Recently, a study proposed that ANKH instead exports ATP to the extracellular space (which is then rapidly converted to inorganic pyrophosphate), along with citrate [34]. Citrate has a high binding affinity for calcium and influences its bioavailability by complexing calcium-phosphate during extracellular matrix mineralization and releasing calcium during bone resorption [35]. Together, our data support the role of ANKH in calcium homeostasis through regulation of citrate levels, connecting previously established independent links into a causal triad.”
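      The statement that the total ANKH-calcium effect matches the product of the two mediated effects can be checked directly from the point estimates quoted above. This back-of-the-envelope comparison ignores the standard errors of the estimates.

```python
import math

alpha_ankh_citrate = -0.30       # ANKH expression -> citrate
alpha_citrate_calcium = 0.07     # citrate -> serum calcium
alpha_ankh_calcium_twmr = -0.02  # total effect estimated by TWMR

# Indirect (mediated) effect = product of the two path coefficients.
indirect = alpha_ankh_citrate * alpha_citrate_calcium
print(round(indirect, 3), alpha_ankh_calcium_twmr)
```

      The product, −0.021, is essentially the TWMR total estimate of −0.02, consistent with the ANKH-calcium effect running largely through citrate.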

      Furthermore, the analysis of the empirical data indicates that the ratio between direct and indirect effect of a transcript on a phenotype is in most cases close to identity, except for triplets with low total effects. This fact should be considered in the power analysis, which assigned the highest gain in power by the mediation analysis to cases of low direct to total effect ratio. The empirical data indicate that these cases might be rare or of minor relevance for the tested phenotypes.

      As our previous power analyses did not fully reflect scenarios observed from empirical data, we extended the range of covered 𝜌 (i.e., the ratio between direct and total effect), so that it mimics more closely the observed range of 𝜌. In the revised manuscript, 𝜌 varies from -2 to 2, so that we also consider configurations where direct and total effects have opposite direction. To provide the readers with a rough idea how frequent the different parameter combinations occur in real data, we now provide another heatmap indicating the density of detected associations in those parameter regimes as Supplemental Figure 4.

      This map can be compared with Figure 4A, which illustrates the power of TWMR vs. mediation analysis over the same range of parameter settings.

      It becomes apparent from Supplemental Figure 4 that in real data, 𝜎 is always larger than 1 and often exceeds 10. Note, however, that this heatmap must be interpreted with care, since the “detected” density will be low in regions where both methods have low power.
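      For readers who want to build parameter grids like this, the sketch below shows one way to turn a total effect, 𝜌 and 𝜎 into direct and path effects. It assumes the standard decomposition 𝛼_TP = 𝛼_d + 𝛼_TM·𝛼_MP, with 𝜌 = 𝛼_d/𝛼_TP and 𝜎 = |𝛼_TM|/|𝛼_MP|, and with the sign of the indirect effect carried by 𝛼_TM. The log spacing of the 𝜎 grid and the helper name `decompose` are also our assumptions; only the ranges, the 51 values per parameter and the 0.035 total effect come from the response.

```python
import numpy as np


def decompose(alpha_tp, rho, sigma):
    """Split a total effect into direct and path effects (hypothetical helper).

    Assumes alpha_tp = alpha_d + alpha_tm * alpha_mp, rho = alpha_d / alpha_tp
    and sigma = |alpha_tm| / |alpha_mp|; the sign of the indirect effect is
    carried by alpha_tm. When rho == 1 the indirect effect and both paths are 0.
    """
    alpha_d = rho * alpha_tp
    indirect = (1.0 - rho) * alpha_tp
    alpha_mp = np.sqrt(abs(indirect) / sigma)
    alpha_tm = np.sign(indirect) * sigma * alpha_mp
    return alpha_d, alpha_tm, alpha_mp


# Grids matching the described ranges: 51 values each.
rhos = np.linspace(-2, 2, 51)
sigmas = np.geomspace(0.1, 10, 51)
alpha_d, alpha_tm, alpha_mp = decompose(0.035, rhos[40], sigmas[25])  # rho = 1.2, sigma = 1
print(alpha_d, alpha_tm, alpha_mp)
```

      With 𝜌 > 1 the indirect effect opposes the direct one, which is the opposite-direction case covered by the extended range.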

      3) Related to the interpretation of causal links: horizontal pleiotropy needs to be considered. The authors report the identification of causal links between TMEM258, FADS1 and FADS2, arachidonic acid-derived lipids and complex phenotypes. However, they also mention the high degree of pleiotropy due to linkage disequilibrium at the underlying eQTL and mQTL region as well as the network of over 50 complex lipids known to be associated with the expression of the above transcripts. Thus, it seems possible that the levels of undetected lipid species may be more important for the phenotypic effect of variation in these transcripts and that the reported "mediators" are rather covariates. Such horizontal pleiotropy would violate a basic assumption of the MR approach. While we think that this does not invalidate the approach altogether, it does affect the interpretation of specific metabolites as mediators. This is aggravated by the fact that metabolic networks are more tightly interconnected than macromolecular interaction networks (assortative nature of metabolic networks) and that single point-measurements of metabolites may not be generally informative about the flux through a specific metabolic pathway.

      This is a valid point and we discuss this limitation in the revised Discussion:

      “Note that in the presence of unmeasured mediators or measured mediators without genetic instruments, our mediation estimates are lower bounds of the total existing mediation. In addition, unmeasured mediators sharing genetic instruments with the measured ones can alter the interpretation of results, as some of the observed mediators may simply be correlates of the true underlying mediators. While this is a limitation of all MR methods, metabolic networks may harbor a particularly large number of genetically correlated metabolite species.”

    1. Author Response

      Reviewer #2 (Public Review):

      This paper presents novel evidence for the successor representation in the hippocampus and V1 for temporally structured visual sequences. Participants learned sequences of 4 items shown in specific locations (A-B-C-D) on the screen. On a subset of trials, participants were only shown one of the four items, which enabled the authors to test whether the remaining three items were reactivated equivalently, or whether the upcoming items were activated in a temporally graded predictive fashion, consistent with the successor representation. The data suggest the latter interpretation, which was observed in both the hippocampus and V1.

      The approach is well-motivated, and the hypotheses are laid out clearly. The manuscript is very clear and streamlined. The design adopted by the authors, which allowed them to disentangle spatial vs. temporal proximity, is clever and provides an interesting approach to the SR framework. The figures are also very clear and nicely designed. I just have a few comments which I hope the authors can address.

      We thank the reviewer for the positive evaluation.

      1) My main question is related to the difference between the analytic approach to V1 vs. hippocampal representations. In Fig. 3, the authors present evidence of a compelling gradation in V1 representations. However, the corresponding hippocampal results in Fig. 5 are collapsed across all predecessor vs. successor representations.

      I initially thought that the same approach could not be taken in the hippocampus (-3/-2/-1 vs. 1/2/3) due to the coarser representation of space - is that the correct interpretation? However, on p. 9 the authors state that they successfully trained a hippocampal classifier based on spatial locations, so I was unsure why the same approach would not be possible. It would be helpful if the authors could add a sentence clearly explaining why the plots and analyses are not parallel across V1 and the hippocampus.

      We appreciate the reviewer bringing up this point. The reviewer is correct, that in principle the same approach could be applied to both V1 and hippocampus. We have now added our motivation for collapsing the data for hippocampus and also appended the non-averaged hippocampus results as a Supplementary Figure. Below we copy our response to Reviewer #1 from above, who brought up a similar point.

      Given the significant but very low classification accuracy within the localizer (accuracy = 15% ± 3.6%, mean ± s.d.; p = 0.002), we had previously decided to report only averaged location results for the hippocampus, as the non-averaged predictions would be very noisy. To put the hippocampal classification accuracy into context, cross-validated accuracy within the localizer in V1 was 92% ± 12% (mean ± s.d.).

      We now stress this difference between V1 and hippocampus decoding in the Results section and explain our reason for presenting averaged results:

      "Within-localizer decoding accuracy results confirmed that the hippocampus has a coarse representation of the eight stimulus locations (Figure 5B) within the localizer (one-sample t-test; t(34) = 3.28, p = 0.002; cross-validated accuracy = 15% ± 3.6%, mean ± s.d.; see Materials and Methods). Notably, compared to V1 (cf. Figure 2A), within-localizer accuracy was relatively low and, as a consequence, tuning curves in the hippocampus appeared less sharp (Figure 5C). In order to maximize sensitivity for the hippocampus, we averaged classification evidence across successor and predecessor locations. Non-averaged results can be found in Supplementary Figure 1A."

      Further, we followed the reviewer’s suggestion and added a new supplementary Figure including the non-averaged results for hippocampus. The new Figure also includes the model comparison the reviewers had asked for. The new Supplementary Figure 1 is included here for convenience:

      2) The analysis disentangling temporal vs. spatial proximity in the localizer data (Fig. 6) is interesting, particularly the persistent gradation in hippocampal responses vs. their absence in V1. However, could the same/similar temporal vs. spatial model not be applied in the full vs. partial sequences as well, as one of the alternative models shown in Fig. 4? The CO model in Fig. 4B assumes a flat reactivation of all other items in the sequence, but it might be that the two items closer in terms of Euclidean distance are represented differently than the far item. After reading the detailed methods, I wonder if this was not possible because the second presented item was always the furthest from the start (180 degrees), but it would be helpful if the authors could clarify this.

      The reviewer is correct that the fact that the sequence order and spatial distance were not fully decorrelated (second presentation was always farthest away from starting dot, third and fourth dot always the same distance from start) prevents us from quantifying the interaction of the SR and CO model with a spatial model during the main task.

      We added the following to the Method section to clarify this:

      "Note that because within each dot sequence, temporal order and spatial distance were not perfectly decorrelated (e.g. the second sequence dot was always farthest apart from the starting dot), it is not possible to estimate the combined influence of the SR model and the spatial coactivation model on the observed BOLD activity."

      Having said that, we believe that there is little concern that the reported reactivations of the main task are driven by the Euclidean distance in a meaningful way for two reasons:

      (1) Detailed analysis of the localizer data showed that there is no spatial spreading from one dot location to the other sequence locations (Figure 6). This is likely because the relevant dot locations were sufficiently spaced apart (at least 5.36 degrees of visual angle), whereas population receptive field sizes in V1 are well below 2 degrees (Dumoulin & Wandell, 2008). Given the lack of spreading during the localizer, where the dot was flashed for 13.5 s, spreading during the main task, where the dot was flashed for only 100 ms, is equally unlikely.

      (2) The presence of spatial spreading would actually obscure the reported SR-like pattern and could not have caused it. Specifically, because the second sequence dot was always farthest from the start, this is where one would expect the least activity spread (greatest Euclidean distance). Sequence dots three and four should be more active, given that they are both closer to the starting point in terms of Euclidean distance. Our reported results show the opposite pattern, ruling out the possibility that they were caused by spatial spreading.

      3) As the authors state on p. 12, the present study did not require any long-term prospective planning. However, the participants' task during the full sequences was closely linked to their predictions about the temporal structure of the four stimuli. It would be useful to see whether the participants who were more closely 'locked' to the sequence and accurate at this temporal detection task also showed stronger SR representations (as these rely on temporal distance).

      This would also provide a useful test of the timescale at which successor representations are behaviorally relevant. In several prior studies, the timescales were quite long, so it would be important to test how strongly SR representations at these timescales relate to behavior.

      We thank the reviewer for this suggestion. In order to relate SR representations to behavior, we first calculated individual BOLD differences for successor vs predecessor locations to estimate how strongly each participant's predictions were skewed toward future locations. One might argue that participants with stronger predictions toward future locations would perform better at the behavioral task. We then correlated these values with behavioral accuracy across subjects. No significant correlation was found (r = 0.05; p = 0.769). The lack of a significant correlation may not be surprising, given that our design is likely underpowered for such a between-subject correlation analysis. Additionally, there was no behavioral response in the prediction trials that could be directly related to participants' BOLD activity; instead, the behavioral response is taken from the full sequence trials.

      These new results were added to the results section:

      "One might argue that participants with stronger predictions toward future locations would perform better at the behavioral detection task. However, no such correlation between individual V1 BOLD activity and task accuracy was found in an across-subject correlation analysis (see Materials and Methods; Spearman r = 0.05; p = 0.769)."

      And described in the methods:

      "Correlation with behavior. In order to relate SR representations to behavior, we first calculated individual V1 BOLD differences for all successor vs all predecessor locations to estimate how strongly each participant's predictions were skewed toward future locations. We then correlated these values with behavioral accuracy across subjects using Spearman correlation."
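      The two-step procedure above (a per-subject successor-minus-predecessor BOLD difference, then a rank correlation with accuracy) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the data layout (per-subject lists of V1 BOLD values at successor and predecessor locations) and all function names are hypothetical, and Spearman's rho is computed from scratch as the Pearson correlation of average-tied ranks. In practice, `scipy.stats.spearmanr` returns both the coefficient and a p-value.

```python
from statistics import mean


def rank_with_ties(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranked data."""
    return pearson(rank_with_ties(x), rank_with_ties(y))


def successor_bias(successor_bold, predecessor_bold):
    """Hypothetical per-subject skew toward future locations:
    mean BOLD at successor locations minus mean BOLD at predecessor locations."""
    return mean(successor_bold) - mean(predecessor_bold)


if __name__ == "__main__":
    # Hypothetical toy data: (successor BOLD, predecessor BOLD, task accuracy)
    subjects = [
        ([0.40, 0.30, 0.20], [0.10, 0.10, 0.00], 0.91),
        ([0.20, 0.10, 0.10], [0.10, 0.20, 0.10], 0.85),
        ([0.30, 0.30, 0.10], [0.00, 0.10, 0.10], 0.78),
        ([0.10, 0.00, 0.10], [0.10, 0.10, 0.10], 0.88),
    ]
    bias = [successor_bias(s, p) for s, p, _ in subjects]
    accuracy = [acc for _, _, acc in subjects]
    print(f"Spearman rho = {spearman(bias, accuracy):.3f}")
```

      Ranking before correlating makes the estimate robust to outlying subjects and to non-linear but monotonic relationships, which is one reason a rank-based coefficient suits a small across-subject sample.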

    1. Author Response

      Reviewer #2 (Public Review):

      In the manuscript, Mijnheer et al. mainly used a CyTOF Helios mass cytometer and TCRβ repertoire sequencing to investigate T cell composition and distribution in peripheral blood and synovial fluid, and further explored the temporal and spatial dynamics of regulatory T cells (Tregs) and non-Tregs in the inflamed joints of Juvenile Idiopathic Arthritis (JIA) patients. Their results indicate that the activated effector T cells and hyper-expanded Treg TCRβ clones found in the inflamed joints are highly persistent in the circulation, and that the dominant Treg clones, which share a high degree of sequence similarity, could serve as novel therapeutic targets for JIA treatment. Overall, the research design is appropriate, and the methods are adequately described. However, several issues need to be addressed.

      (1) The criteria for JIA patient recruitment should be clearly presented in the Methods section. For example, what are the specific inclusion and exclusion criteria? Did the patients take medication during the study?

      A total of 9 JIA patients were included in this study. Of these, n=2 were diagnosed with extended oligo JIA, n=2 with rheumatoid factor-negative polyarticular JIA, and n=5 with oligo JIA, according to the revised criteria for JIA. The average age at the time of inclusion was 13.1 years (range 3.2–18.1 years), with a disease duration of 7.3 years (range 0.4–14.2 years). Due to limited sample availability, we did not have strict inclusion or exclusion criteria for JIA patient recruitment. For CyTOF analysis, patients were selected based on the criterion that the left and right knee joints should both be affected at the time of inclusion. For sequential TCR sequencing analysis, we included patients with a refractory disease course. At the time of first inclusion, patients were treated with non-steroidal anti-inflammatory drugs (NSAIDs) or methotrexate, but no biologicals. For the refractory time point samples, patients were treated with disease-modifying anti-rheumatic drugs (DMARDs) (leflunomide) and/or biologicals (Humira) after first sample inclusion, due to the refractory nature of their disease.

      We have now updated the methods section (lines 455-463) of the revised manuscript with more information on patient recruitment, and included information on diagnosis, sex, age, disease duration and medication for every patient in Supplementary File 1.

      (2) The correlation analysis of the entire spectrum of node frequencies was conducted on SFMCs and PBMCs isolated from only 3 patients. This sample size is too small to obtain robust results and to draw a convincing conclusion from the correlation analysis. Moreover, the study states that a total of 9 JIA patients were involved. Could the authors clarify this?

      In order to strengthen our observations, we now included single-cell transcriptomics data obtained from Zhang, et al. (https://doi.org/10.1038/s41590-019-0378-1). In this data, we identified a cluster of CD4+FOXP3+ Tregs (new Figure 2-figure supplement 2A and 3B) that showed increased frequency in RA patients (new Figure 2-figure supplement 2C), consistent with the high frequency of Tregs that we observed in our JIA SFMC samples. Additionally, the expression of markers of chronic TCR activation (PDCD1 (PD1), CTLA4 and ICOS), and cytokines (TNF, IFNG and GZMB) were significantly increased in RA compared to OA, in line with what we observed in JIA SFMC (new Figure 2-figure supplement 2D). These results demonstrate that, although the number of JIA patients included in our study is low, obtained results are robustly reproducible in an independent, comparable dataset.

      We do agree with the reviewer that the low number of patients included in our study warrants further validation. Therefore, we have now added the following line in the discussion to highlight this (lines 369-371): “Further validation of our observations in larger cohorts of JIA patients should help to substantiate these results and aid the identification of pathogenic Treg populations across patients.”.

      Regarding the number of patients included in our studies, we have now included Supplemental File 1, which clarifies which JIA patients have been used for each analysis in our study.

      (3) The results of the study indicate that the hyper-expanded T cell clones are shared between the left and right knee joints. Since JIA may affect one or more joints, did the authors check other joints, such as the hands or wrists, to see whether the same expanded T cell clones infiltrate multiple joints?

      Indeed, it would be interesting to see whether hyper-expanded clones are shared between multiple inflamed joints other than knees. However, samples from other locations are very difficult to obtain and very little synovial fluid can be extracted from joints such as hands and wrists. Therefore, the number of cells obtained from these joints would be too limited to perform mass cytometry or TCR sequencing. Thus, we chose to focus on synovial fluid from knee joints in our studies. Moreover, for oligoarticular JIA patients, only the large joints are affected (of which the knees are most typical), so for these patients it would not be possible to include other joints.

      (4) For Fig.2B, the Treg CD25+FOXP3+ population was significantly enriched in synovial fluid (SF). Is it from the left knee joints or the right knee joints?

      Figure 2B shows data from both knee joints. We have now clarified this in the figure legend by adding “For SFMCs, data from the right and left knee joints for all patients is shown” (lines 179-180).

      In addition, lines 144–148 refer to SF, whereas the axis title in Fig. 2B indicates Synovial Fluid Mononuclear Cells (SFMCs). Please keep the terminology consistent.

      We thank the reviewer for bringing this to our attention. We have critically revised the manuscript and made the use of SF versus SFMCs more consistent.

      (5) For the longitudinal sampling timelines of JIA patients shown in Supplementary Fig.3, the interval of PB and SF sample collection is not consistent. And only 1 patient completed 4 visits and the sample collection. It is hard to make any conclusion from 1 patient.

      In our study, longitudinal samples were available for 5 JIA patients, for whom we performed TCR sequencing of Tregs from SFMCs from different joints (right or left) at two or more time points. In the manuscript we mainly focused on patient 1, as the largest amount of data was available for this patient. However, for all other longitudinal patient samples included, we also show that dominant clones persist over time (Figure 4A and 5A). To further highlight that our observations are not limited to one patient but are consistent across all patients included, we now include detailed analyses for all patients in Figure 4-figure supplement 3 and Figure 5-figure supplement 1. These analyses show that the frequencies of shared TCRβs are consistent over time in all patients.

    1. Author Response

      Reviewer #1 (Public Review):

      Detecting and quantifying balancing selection is a notoriously difficult challenge. Because the distribution of times to fixation or removal of strictly neutral variants has a long tail, it can be hard to exclude the null hypothesis of neutrality when testing for balancing selection that was not established so long ago that trans-specific variants can be observed. As Aqil et al. point out, most efforts to detect balancing selection in the human genome have been focused on single nucleotide variants. The authors seek to characterize the amount of balancing selection specific for polymorphic deletions. The authors justify their focus based on the fact that deletions are more likely to have functional consequences than single nucleotide variants, making it more likely that if they have remained for many generations, this could be a signature of balancing selection. That said, multiple aspects of the analysis deserve more attention.

      I have two broad concerns about the manuscript that the authors need to address. First, the authors use neutral simulations to exclude the possibility that neutrality alone can explain the amount of allele sharing observed between African modern humans and the archaic genomes. My concern is that human demographic models, including the one from Gravel et al. (2011) used by the authors, are always simplifications of the complex demographic events that shaped human populations during evolution. In the case of the specific model used by the authors, African populations were inferred by the Gravel et al. model to have had a constant population size for the past ~150,000 years (parameters Taf and Naf in the original model). This is an unrealistic assumption. In brief, I am wondering to what extent the authors' claim that neutrality alone cannot explain patterns of allele sharing rests on a misspecified neutral demographic model. For example, the finer-scale fluctuations of effective population sizes in Africa inferred by Speidel et al. (2019, Nature; Figure 3) paint a different picture than the Gravel et al. model. The authors need to test extensively the robustness of their conclusions to changes in the neutral demographic model. What if the average ancestral population size was closer to 20,000? What if it was closer to 50,000 and frequency fluctuations every generation were smaller? Given how uncertain past population sizes really are, and the current uncertainties about demographic reconstruction, in particular relative to linked selection, the authors need to explore a range of past population sizes beyond the idiosyncrasies of a specific model.

      These are great suggestions. Based on them, we now conducted 37 additional simulations with different sets of parameters, including adding the Speidel et al. model to the mix (the new Figure 1C). As discussed above (please refer to our response to the general reviews) and in the Results section, realistic neutral scenarios cannot explain the excess allele sharing.

      My second broad concern is that it is difficult to evaluate how novel the findings really are. It is true that the authors focus on deletions, while past scans for balancing selection in the human genome focused on SNVs. But a substantial number of the deletions identified here as under balancing selection could already have been identified as such loci through linked SNVs by the scans cited by the authors. The authors need to quantify how many of their deletions are truly novel balancing selection loci, as opposed to balancing selection loci already identified through linked SNVs.

      The reviewer is right. We now compared our results with previous genome-wide studies, which have been notoriously inconsistent with each other. We found that virtually all of our candidates are novel, as described in our response to the general reviews and our Results section.

      The novelty of the balanced deletions would also be better established by a more quantitative and less anecdotal functional analysis. It is true that the deletions include immune loci, but are they statistically enriched for immune loci, as annotated for example by Gene Ontology, in a way that shows that their distribution across the genome is not random but is indeed driven by selection enriching them at loci with specific functions? In addition, do the pie charts in Figure 5E represent a statistically significant deviation from left to right?

      We appreciate the reviewers’ suggestions, which led us to conduct a series of very fruitful analyses. As discussed above, we found that ancient deletions are significantly more likely to be associated with GWAS traits and to be exonic (Figure 5B), and significantly more likely to affect immunity-, blood-, and metabolism-related traits (Figure 5C). Moreover, we found that ancient deletions are depleted in the smaller size categories but significantly enriched at the 95th size percentile and above (Figure 7A). We now discuss these findings in the Results section.

      Reviewer #2 (Public Review):

      The authors assess evidence for balancing selection by looking at old polymorphisms where the derived allele is shared by descent with archaic humans, meaning the polymorphism must predate this split. Using simulations and several features of these old polymorphisms, they evaluate whether, and which, signatures of balancing selection are enriched in these polymorphisms. This is a well-explained and thorough analysis, and a clever way to approach a difficult question, yet the analysis remains fairly descriptive and the claims that can be made are not strong. For instance, the title of the paper does not state a particular finding of balancing selection, and several claims are hedged with "may", such as "A significant portion of ancient polymorphisms may have evolved under medium-term balancing selection" and "These results suggest that at least 27% of common functional deletion polymorphisms may have been evolving under balancing selection".

      We thank the reviewer for their insights. We agree that balancing selection is difficult to elucidate definitively. However, in our revision, we have conducted several additional analyses based on the reviewers’ suggestions, as discussed under the individual comments. We believe that these analyses strengthen our claims.

      First, using simulations, they show there are more such ancient nonsynonymous and (indirectly) deletion variants than expected under a simple neutral model. The enrichment is nominal when compared only with Denisovan sharing, which could be explained due to some superarchaic ancestry in Denisovans (though not clear if that holds up quantitatively). The classification of the shared polymorphisms as recurrent, recently introgressed, or ancient shared by descent could be more carefully tested. In particular, I'm concerned about the possible inclusion of recurrent mutations among the ancient set. Although the age trend is consistent, it does not indicate how much misclassification there might still be. For example, there are "ancient" deletions that have inferred ages more recent than the human-archaic split (shown in Fig. 3).

      We agree that it is crucial to discriminate recurrent mutations from ancient ones in our analysis. We have now conducted additional analyses of allele frequency and CG content to further test for potential recurrent mutations in our datasets, as described in our response to the general reviews. We describe these in our Results section and Figure S1. In addition, we conducted even more stringent filtering requiring perfect LD and found that this increased stringency did not affect our results substantially. Thus, we believe that our pipeline identifies ancient deletions very conservatively and likely harbors a considerable number of false negatives, where ancient deletions are categorized as recurrent.

      The reviewer’s observation that some ancient deletions have recent dates is indeed interesting. The dating of individual alleles assumes neutrality and broadly depends on haplotype length and allele frequency. We believe that given the potential soft sweeps acting on these deletions, it is possible that the dates may be biased in some cases. For example, if there is a recent sweep on an ancient deletion, this may lead to longer haplotype lengths and, thus, a more recent date for these alleles. Therefore, the ancient derived alleles (those that are shared with archaic hominins) which happen to have recent allele dates may be of particular interest for future scrutiny. We now discuss this particular issue further in the Results section as follows:

      “Counterintuitively, some “ancient” deletions have very recent dates. This may be due to instances of recent soft sweeps involving some deletions leading to an increased length of the associated haplotype and an artificial decrease in age. Secondly, some ancient deletions may have low frequencies, which too creates a downward bias in age. Lastly, this may be due to rare instances of miscategorization of non-ancient deletions as ancient.”

      For the rest of the paper, the authors then focus on the deletion variants, showing that these ancient deletions show an elevated signature of balancing selection (stdbeta2) but do not show less variance in allele frequency over time as would be expected under an overdominance model. They infer the mechanism to be spatial or temporal variation in selection or negative frequency-dependent selection by process of elimination. They identify the subset of ancient deletion polymorphisms that overlap exons and are associated with phenotypes, finding a high proportion of ancient deletions that fall in both these categories. The identification of this set of potentially causal deletions that may be under balancing selection is a set that is of interest to the wider community for follow up (though several have already been the subject of study and individual publications from this lab). Overall, this is a useful combination of simulation work and assessment of an intriguing set of old deletion polymorphisms. Put together, the analysis does support evidence of balancing selection on some of them, but the extent is still not clear.

      We thank the reviewer. To further determine the extent of balancing selection acting on these ancient deletions, we conducted several enrichment analyses described above (please refer to our response to the general reviews) and in the paper. Briefly, we now added Figures 5B, 5C and 7A to describe these new analyses.

    1. Author Response

      Reviewer #1 (Public Review):

      This study demonstrates the role of the circadian clock in the spatiotemporal regulation of floral development. The authors nicely illustrate floral development patterns in domesticated sunflower. In particular, during anthesis, discrete developmental zones, namely pseudowhorls, are established, and hundreds of florets simultaneously undergo maturation in each pseudowhorl in a circadian-dependent manner. Consistently, flower development follows key features of the circadian clock, such as temperature compensation and gating of plant responses to environmental stimuli. Demonstrating the evolutionary advantages of this regulation would add further merit to this study.

      We thank the reviewer for this suggestion. We have performed new experiments (Figures 7 and 7-S1) that demonstrate that delays in anthesis relative to dawn and disruption of pseudowhorl formation both negatively impact pollinator visits to flowers. These findings suggest that circadian and light regulation of floral anthesis may have significant impacts on male reproductive fitness.

      Reviewer #2 (Public Review):

      Little is known about how the circadian clock regulates the timing of anthesis. This manuscript shows that the circadian clock regulates the diurnal rhythms in floral development of the sunflower. The authors have developed a new method to characterize the timing of floral development under normal conditions as well as constant dark and light conditions. The results from the treatment of darkness during the subjective night and day suggest that the circadian clock regulates the growth of ovary, stamen, and style differently.

      All clock papers claim that the circadian clock regulates the fitness of organisms, however, it is hard to evaluate how the circadian clock affects the fitness of organisms. The timing of pollen release and stigma maturity is directly related to plant fitness. That's why the authors suggest that the circadian clock in sunflowers increases plant fitness by regulating the timing of anthesis.

      Although the authors manipulated the light and temperature to examine the role of the circadian clock in floral development, the weakness of this manuscript is that there is no molecular evidence to show how the clock regulates floral development.

      We acknowledge that this study does not demonstrate the molecular mechanisms by which the circadian clock and environmental sensing pathways regulate floral anthesis in sunflower. However, we feel that our demonstration that the circadian clock is involved in the generation of spatial patterns of development on the sunflower inflorescence disk is in itself novel and significant.

      Reviewer #3 (Public Review):

      The flowering heads of species in the Asteraceae comprise a large number of flowers, and this phenotype is thought to contribute to their reproductive success. The Harmer lab has developed sunflower as an experimental model to investigate the contribution of circadian regulation to reproductive processes in the Asteraceae, and this paper is a new addition to this line of research.

      The novelty of the article is that it resolves unanswered questions about the processes that underlie coordinated flowering within the disc of the floral capitulum. The authors demonstrate a role for the circadian clock in the temporal structuring of this process. They identify a free-running rhythm of floral anthesis in constant darkness, and this rhythm has several key characteristics of circadian rhythms. The data also indicate that the circadian clock might gate the response of anthesis to darkness.

      I like the presentation of an external coincidence model for the interaction of light and circadian cues in the floral developmental program of the capitulum. However, I wonder whether this is the only potential explanation. The data in Fig. 4C look like classical entrainment responses. Are the authors sure that they are not just seeing an entrainment process within the capitulum, combined with a masking effect of continuous light upon the rhythmic phenotype? I encourage the authors to retain speculation about the coincidence model within the discussion- it's so important for future work- but perhaps consider alternative interpretations of the data also.

      We thank the reviewer for their positive comments and overall enthusiasm for the study. We agree that it is entirely plausible that continuous light masks circadian clock-controlled rhythms in floral organ development; in our view, this is a restatement of the external coincidence model. We argue that in developing sunflowers, a circadian clock-regulated process controls elongation of floret organs. Normal development depends upon a dark period of at least 4.5 hours occurring during the subjective night. In constant light conditions, or early in re-entrainment when the dark period occurs during the subjective day, normal development is inhibited. This model is analogous to the photoperiodic control of flowering time in short-day plants, in which light perceived during the subjective night inhibits the floral transition.

    1. Author Response

      Reviewer #1 (Public Review):

      Tafenoquine is an important 8-aminoquinoline antimalarial, mostly aimed at the management of Plasmodium vivax malaria. Through a retrospective analysis of several previously performed efficacy trials, the authors aimed to better understand the drug's mechanism of action, while exploring the possibility of improved efficacy through dose increments.

      Strengths: robust analysis approaches unlocked three main messages with the potential of improving the clinical practice:

      i. P. vivax recurrency is positively associated with tafenoquine terminal half-life and D7 methemoglobin levels.

      ii. The methemoglobin levels support the current view that tafenoquine acts through its metabolites, similar to what is believed for primaquine.

      iii. Most importantly, the therapeutic window of tafenoquine is wider than previously considered, supporting a significant increase in dose, from 300 mg to 450 mg, which would lead to significantly increased efficacy.

      Weaknesses: being a retrospective analysis, the work is limited to the available data. In particular, and as noted by the authors, no metabolite levels are reported. Additionally, some aspects in my view need more detailed analysis and discussion, in particular the apparent lack of exploration of the importance (or lack thereof) of the patients' CYP2D6 status for tafenoquine T1/2, methemoglobin levels, and overall efficacy. These mild weaknesses do not change the overall conclusions of the study.

      We thank the reviewer for their positive comments.

      The analysis estimates the parameters of the PK model from 4499 drug concentrations measured in 718 individuals between days 0 and 180. The active metabolites of tafenoquine are unknown and thus could not be quantified.

      Whilst the study is retrospective, it includes 77% (651/847) of all patients enrolled in published P. vivax treatment trials of tafenoquine.

      We address the relationship between CYP2D6 polymorphisms and the other outcomes in our response to Reviewer #1, Comment 2.

      Reviewer #3 (Public Review):

      By assembling the vast majority of global tafenoquine pharmacology data from clinical treatment studies that led to the 8-aminoquinoline's registration in 2018, the authors of this manuscript have convincingly made their argument that the currently recommended treatment dosage of 300mg (in combination with chloroquine) is too low and needs to be increased by at least 50%. Access to the multiple data sets is thorough, the modelling reasonable and the conclusion reached is sound.

      How did we get here (again), under-dosing malaria patients with a class of drugs we have been working on for a century? Speaking as someone who was associated with tafenoquine development over two decades, it seems that worry about adverse events, specifically hemolysis in G6PD-deficient persons, overcame the operational requirement to give enough drug in a single-dose regimen. However, tafenoquine is very safe in G6PD-normal persons, who by definition were the ones entered into the clinical treatment trials. Risk-benefit judgments cannot always be weighted towards "safety", especially when the real concern was that a single severe adverse event would derail the entire development program. Real-world effectiveness matters and should now result in the studies the authors state are needed to certify the higher-dose regimen.

      1) The divided nature of tafenoquine development needs to be mentioned. This manuscript discusses malaria treatment and includes nearly all the relevant data, but extensive work was also done to support the chemoprophylaxis indication, largely sponsored by the US Army. These prophylaxis efforts were often separate from the parallel efforts on the treatment indication, to the detriment of both groups, who were ostensibly working on the same drug. A 450mg dose of tafenoquine is not large; 600mg (over 3 days) is routinely given at the beginning of malaria chemoprophylaxis. Up to twice that amount was given in phase 2 studies done in Kenya in 1998, which resulted in the only described severe hemolytic reaction, when a woman heterozygous for G6PD deficiency was given 1200mg over 3 days because her G6PD status had been recorded incorrectly. It is not easy to hemolyze even G6PD-deficient erythrocytes, owing to the slow metabolism of tafenoquine. Nearly all clinical trials of both primaquine and tafenoquine have experienced similar hemolytic events when there were errors in the determination of G6PD status. This does not mean that all 8-aminoquinolines are dangerous drugs, only that a known genetic polymorphism needs to be accounted for when treating vivax malaria.

      It is notable that much larger doses of tafenoquine have been evaluated previously and these have been well tolerated in individuals with G6PD activity >30% (previous studies used semi-quantitative tests). We have added a review of all patients with P. vivax malaria who have been studied in treatment trials. A total of 847 were enrolled in all studies and our series contains individual patient data on 651 (77%) of these patients.

      We have added the following to the Discussion on lines 277-283:

      “Much larger doses have been studied in treatment and prophylaxis trials (up to 2100mg given over one week, Walsh et al., 1999, see Supplementary Appendix). The only report of a severe haemolytic reaction occurred in a female patient heterozygous for G6PD deficiency (A- variant) who received a total dose of 1200mg tafenoquine over 3 days (Shanks et al., 2001). In the same study, a homozygous female (A- variant) who was also given 1200mg tafenoquine over 3 days had an estimated 3g/dL drop in haemoglobin, but remained asymptomatic.”

      2) The authors point out the utility of day 7 methemoglobin concentrations in predicting drug success or failure in the prevention of subsequent relapses. This is important and stresses the requirement for drug metabolism to a redox-active intermediate as a common property of all 8-aminoquinolines. Tafenoquine and primaquine are similar but not identical, and the slow metabolism of tafenoquine to its redox-active intermediates explains its main advantage of being capable of supporting a single-dose cure. The main reason this was not appreciated much earlier is that we were looking in the wrong place. Metabolic end-products (5,6-orthoquinones) are present at very low concentrations in the blood after single-dose tafenoquine, but being water-soluble they are easily located in the urine. Such urine metabolites indicative of redox action are very likely to be complementary to methemoglobin measurements, which mark the redox effect on the erythrocyte. Despite earlier simplifying assumptions made during tafenoquine development (that no significant metabolites exist), metabolism to redox-active intermediates must be embraced as the explanation of drug efficacy rather than viewed only as a cause of undesirable adverse events.

      Another dark cloud over tafenoquine mentioned by the authors was the very disappointing outcome of the INSPECTOR trial in Indonesia, whose full results are yet to be published. The failure of P. vivax relapse prevention using 300mg tafenoquine with dihydroartemisinin-piperaquine in Indonesian soldiers was largely ascribed to under-dosing. Although this may have been partially true, another aspect, indicated in figure 15 of the appendix, is the nature of the partner drug. Artemisinin combinations with tafenoquine do not produce the same amount of methaemoglobin (indicative of redox metabolism) as when tafenoquine is combined with the registered partner drug chloroquine. We do not understand tafenoquine metabolism, but it is increasingly clear that the drug combined with tafenoquine makes a very substantial difference. Despite the great operational desire to use artemisinin combination therapy for all malaria treatment regimens, this may not be possible with tafenoquine. Chloroquine is likely driving tafenoquine metabolism, as chloroquine by itself has no real effect on latent hypnozoites in the liver. Increased-dose studies with tafenoquine need to be done with chloroquine, not artemisinin.

      We are aware that this is an area of intense interest and that ex vivo data suggestive of a drug-drug interaction between artemisinin and tafenoquine were presented at the recent ASTMH conference in Seattle. However, there are as yet insufficient in vivo data to conclude that tafenoquine combined with an artemisinin reduces the methaemoglobin concentration (indicative of reduced redox metabolism) compared with chloroquine plus tafenoquine. In addition, these data are as yet unpublished.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper is a continuation of other research by this group and represents another step back in time for peptide preservation in eggshells. It is exciting to see Miocene age peptides and that they overlap so completely with both extant ostrich struthiocalcin as well as the previously described Pliocene peptides. The biggest weakness is the lack of tables showing both the de novo peptides as well as those detected by database searching.

      We thank the Reviewer for their positive assessment of our work. We now provide a table with peptides identified by database searching as well as the annotated tandem mass spectra for the peptides.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Germanos et al present preclinical evidence of a dynamic interplay between tumor microenvironmental elements underlying prostate cancer initiation, progression, and emerging therapeutic resistance in the transgenic mouse model. The authors identify an intermediate luminal cell population trans-differentiating from a hypo-proliferative basal cell subset, while hyper-proliferative basal cells replenish a non-differentiating basal subpopulation. The meticulous methodologic approach identifies candidate cellular interactions among fibroblasts, MDSCs, and immune cell populations associated with PTEN loss. The generalization of these findings to human data sets is of particular interest and recommended for future studies on this topic. Mechanistic studies with multi-cellular co-culture models are needed to extend and validate the findings in this report.

      We thank the reviewer for finding our research “meticulous” in its approach. We agree that validating our findings in human contexts is a vital next step and have added new orthogonal datasets in the revised manuscript (Figure 4D-E). We also agree that complex molecular studies will be needed to fully evaluate our cell-cell interaction hypotheses. To this end, we have elaborated on appropriate follow-up studies in the discussion (Lines 625-628, 642-643, 657-659, 675-677).

      Strengths and Weaknesses:

      The study focuses on a clinically highly relevant and timely topic. The strength of this manuscript is the meticulous description of the Methods and model development and the integration of state-of-the-art orthogonal data sets. However, the small number of data points across the experiments (n = 2 or 3), together with considerable variability in the Ptenfl/fl group, limits the interpretation of findings. Additionally, further experiments are needed to validate these observations in human prostate cancer and establish the potential translational relevance of these findings.

      We are pleased that the reviewer finds our study “clinically highly relevant.” We agree that the low sample size is a potential limitation but believe that our overall results are robust and support firm conclusions for both epithelial and immune cell populations. This is in part because we validated our findings in orthogonal human datasets (Figure 4A-C, Figure 5H) in the original manuscript. However, to add rigor to our study, we have conducted new scRNAseq analysis showing that our findings correlate well with both human patient data (Figure 4D-E) and orthogonal mouse models (Figure 4F-G). Furthermore, we conducted additional scRNAseq on castrated WT murine prostate to demonstrate how castration plays an important role in translational heterogeneity in intermediate cells (Figure 4H, Figure 3 – figure supplement 1G).

      As such, the report is fairly descriptive, and expanding the discussion on the mechanistic studies needed to identify which of these interactions drives aggressive prostate cancer would improve this report.

      We agree with the reviewer that additional discussion of follow-up studies is necessary. As such, we have updated the discussion to highlight the molecular studies needed to fully characterize the cellular phenotypes described in this manuscript (Lines 625-628, 642-643, 657-659, 675-677).

      Reviewer #2 (Public Review):

      This work provides a thorough characterization of tumor cell and microenvironment dynamics in a castrate Pten-null prostate cancer model and details the strength of cellular interactions using single-cell RNA sequencing. The search for a preexisting castrate-resistant prostate progenitor has been upended in recent years with the discovery that prostate luminal cells adapt to low androgen environments by undergoing lineage plasticity rather than an expansion of proximal progenitors. This paper provides indirect evidence that basal epithelia give rise to 'intermediate' epithelia through increased translation in cells from intact and castrated Pten-null mice, which is validated in a Pten-null, 4ebp1 mutant mouse model.

      Strengths:

      The single-cell data are robust and expertly presented in the figures. The methods are largely appropriate and the delineation of experimental protocols is straightforward. The analysis is comprehensive and well described in relation to biological questions of interest to the community. The validation of the effect of translation on prostate epithelial viability in relation to initial findings advances our understanding of how cells survive in low androgen environments. The addition of a public portal for the data is highly useful.

      We thank the reviewer for evaluating our work as “robust and expertly presented,” “comprehensive,” and “highly useful.”

      In response to the reviewer’s in-depth comments, we have revised our nomenclature of WT epithelial cell subtypes to specifically distinguish between Krt4+/Tacstd2+ urethral, prostatic, and cancer-derived cells (Lines 163-185). We now find urethral and luminal progenitor groups in WT intact mice, which are distinct from “intermediate” cells arising from Pten loss (Figure 1 – figure supplement 1D-F). We have accordingly revised our interpretation of the potential origins of these intermediate cells in cancer (Lines 256-275).

      Weaknesses:

      The PB-Cre4 promoter appears to inactivate Pten promiscuously in basal, intermediate, and luminal cells, which is problematic because this confounds the ability to differentiate between cells that are undergoing lineage plasticity and expansion of a pre-existing progenitor cell type. Much recent evidence points to lineage plasticity of prostate luminal tumor cells under androgen deprivation rather than survival and expansion of a pre-existing castrate-resistant basal or intermediate cell type. Accordingly, the observation that basal epithelia may transdifferentiate to intermediate epithelia, or that a pre-existing intermediate luminal cell state is expanded under castration, may be an artifact of the model that is not reproduced in human prostate cancer. The use of trajectory analysis of single-cell data to demonstrate basal or intermediate cell lineage transdifferentiation is a weaker type of evidence than the lineage tracing of individual cell types provided by other groups, which argues against transdifferentiation and for lineage plasticity.

      This is a very thoughtful and nuanced comment. We agree that the PB-Cre4 promoter promiscuously inactivates Pten in basal cells, luminal progenitor cells, and luminal cells, which confounds the ability to distinguish between cells undergoing lineage plasticity and expansion of pre-existing progenitor cell types. As such, we have expanded our Results section to include non-basal routes to the expansion of the Pten-null intermediate cell population (Lines 261-275). Furthermore, we comprehensively discuss the limitations of our models in the Discussion section, highlighting the need to validate our findings using lineage tracing or newer techniques such as DNA Typewriter (Lines 616-628) (Choi et al., Nature 2022).

      Currently, it is not possible to conduct lineage tracing within the human prostate, making it impossible to determine whether basal epithelia transdifferentiate to intermediate epithelia or whether a pre-existing intermediate luminal cell state is expanded under castration. However, we do present new human scRNAseq data showing that the intermediate cell state, as reflected by the 5-gene castration signature, is enriched specifically in metastatic, but not localized, prostate cancer (Figure 4D-E). Furthermore, we show that this gene signature is also relevant in a completely different progression model of murine prostate cancer (Figure 4F-G). Thus, while not perfect, our model does have potential human relevance despite the limitations, which we address in the manuscript (Lines 261-275, 616-628).

    1. Author Response

      Reviewer #1 (Public Review):

      Kang et al. have performed whole exome sequencing of gall bladder carcinomas and associated metastases, including analysis of rapid autopsy specimens in selected cases. They have also attempted to delineate patterns of clonal and subclonal evolution across this cohort. In cases where BilIN was identified, the authors show that subclones within these precursor lesions can expand and diversify to populate the primary tumor and metastatic sites. They also demonstrate subclonal variation and branching evolution across metastatic sites within the same patient, with the suggestion that multiple subclonal populations may metastasize together to seed different sites. Lastly, they highlight ERBB2 amplification as a recurrent event observed in gall bladder carcinomas.

      While these data add to the literature and start to examine important questions related to clonal evolution in a relatively rare malignancy, the authors' findings are very descriptive and it is hard to draw many generalizable conclusions from their data. In addition, the presentation of their figures is somewhat confusing and difficult to interpret. For example, they do not separate their clonal analyses by disease site and by time in a readily interpretable manner, as in some instances of Figure 2 and Figure 3 the clone maps are from different sites collected at the same time point, while others show some samples at different time points. Depicting these hierarchies in a more organized and clearly understandable manner would help readers more easily interpret the authors' findings. In addition, the clinical implications of these clonal hierarchies and their heterogeneity are unclear, as the authors do not relate the observed evolution to intervening therapies and may not be powered to do so with this dataset.

      Thank you for the constructive and valuable comments about 1) figures and 2) clinical implications.

      1) We agree with your opinion that Figures 2 and 3 are confusing. Reflecting on your comment, Figures 2 and 3 have been modified. Now, the time point at which the tissue was obtained and the anatomical location of the tissue are readily visible in the redesigned figures.

      2) From a clinical point of view, we believe that our study highlights the importance of precise genomic analysis of multi-regional and longitudinal samples from individual cancer patients. In current oncology clinics, cancer panel data are used to identify druggable mutations, usually from a single tumor sample. However, we found that only some of the mutations were clonal, while a substantial proportion were subclonal, and subclonal mutations are usually not effective drug targets. For example, in patient GB-S2, had only the GB tissue been sequenced, ERBB2-targeted treatment might have been given if a suitable clinical trial were available, because ERBB2 p.V777L is pathogenic. However, our clonal evolution analysis suggests that an ERBB2-targeting strategy may not be effective against subclones lacking the ERBB2 p.V777L mutation, especially those in regional metastases. We have added a description of this point to the Discussion section (Page 13, Line 12-15).

      Additional areas that would require clarification include:

      1) There are very few details on how the authors performed their subclone analysis to identify major subclones, and what each of the clusters in Supplemental Figure 1 represents. In addition, they do not describe how they determined that the highlighted mutations in Table 2 were drivers for metastasis and subclonal expansion. Were these the only genes that exhibited increased allele frequencies in metastatic sites, or were other statistical criteria used?

      Thank you for the important comment about 1) clone analysis and 2) highlighted mutations in Table 2.

      1) Mutations were classified as clonal or subclonal by PyClone (Roth A et al., Nat Methods. 2014) clustering (Figure 1—figure supplement 1). Phylogenetic trees were constructed using the mutation clusters identified with PyClone as input to CITUP (Malikic S et al., Bioinformatics. 2015) (Figures 2 and 3). We added the sentence "See Supplementary File 1 to check the matching information for the PyClone clusters and the CITUP clones." to the supplementary figure legend.

      2) A full list of mutations constituting each CITUP clone can be found in Supplementary File 1. Among these, mutations in previously reported cancer-associated genes were selected manually and listed in Table 2. References for each gene are provided in the 'Evolutionary trajectories and expansion of subclones during regional and distant metastasis' section.

      2) The authors do not discuss the relevance of variation in mutational signatures observed with disease progression/metastasis, e.g., is there any significance that signature 22 (aristolochic acid) and signature 24 (aflatoxin) are increased in metastases? In addition, when comparing their data to previously published reports in Figure 1B and Figure 4A, it would be helpful if the authors discussed possible reasons for some of the large differences in mutational or signature frequencies across datasets. For example, do the authors think the frequency of ERBB2 alterations is so much higher in their cohort than in prior reports due to methodological/data reasons or due to differences in patient population?

      Thank you for the constructive and valuable comments about 1) mutational signatures observed with disease progression/metastasis and 2) differences in mutational or signature frequencies across datasets.

      1) During the revision process, signatures 22 and 24 highlighted in the metastasis stage were validated by two additional tools, Signal (Degasperi A et al., Nat Cancer. 2020) and MuSiCa (Diaz-Gay M et al., BMC Bioinformatics. 2018) (Figure 4—figure supplement 3). Aristolochic acid is an ingredient of some traditional herbal medicines (Debelle FD et al., Kidney Int. 2008, Hoang ML et al., Sci Transl Med. 2013). Given that all the patients in our cohort are Korean, and a recent study found that Korean cancer patients are frequently exposed to herbal medicines (Kwon JH et al., Cancer Res Treat 2019), one possible explanation is that some patients might have been exposed to herbal remedies containing aristolochic acid. On the other hand, aflatoxin has been detected in soybean paste and soy sauce, which are widely used in Korean food (Ok HE et al., J Food Prot. 2007). Considering that signatures 22 and 24 are found not in early carcinogenesis but in late carcinogenesis and metastasis (Figure 4B and Figure 4—figure supplement 3), the two carcinogens appear to have little impact on the early stage of cancer development, but their effects might become more prominent in established cancer cells. Further investigation is required because it is difficult to determine the etiology of signatures 22 and 24 with this limited patient data. We updated this part in the Discussion section (Page 13, Line 4-7).

      2) In the two previous genomics studies on GBAC, the prevalence of ERBB2 alteration was 7.9% (Narayan RR et al., Cancer. 2019) and 9.4% (Li M et al., Nat Genet. 2014), respectively. Compared with these data, our data are characterized by a relatively higher rate of ERBB2 alteration (54.5%: amplification in 27.3% and SNV in 27.3%) (Figure 1B). A higher prevalence of ERBB2 alteration was also reported in other studies on GBAC, with corresponding rates of 28.6% (amplification and overexpression, Nam AR et al., Oncotarget. 2016) and 36.4% (amplification only, Lin J et al., Nat Commun. 2021). Variations in ethnicity and culture might have contributed to the differences. This part is described in the Discussion section (Page 11, Line 19-23). In addition, the discrepancy in Figure 4A might be attributed to the difference in analyzed samples: our study included precancerous and metastatic lesions, while the other two studies uniformly analyzed primary tumors.
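      As an arithmetic check, the reported ERBB2 rates are consistent with whole-patient counts, assuming a denominator of the 11 patients described in Reviewer #2's summary and assuming no patient carries both an ERBB2 amplification and an ERBB2 SNV. The counts of 3 and 3 below are inferred from the percentages, not taken from the manuscript. Working from counts also explains why 27.3% + 27.3% = 54.6% while the combined rate is reported as 54.5%: the combined figure is rounded from 6/11, not summed from already-rounded parts.

```python
COHORT_SIZE = 11  # assumed denominator: the 11 patients described by Reviewer #2


def pct(count, total=COHORT_SIZE):
    """Percentage of patients, rounded to one decimal place as in the text."""
    return round(100 * count / total, 1)


amplification = 3  # inferred from 27.3%, not stated in the source
snv = 3            # inferred from 27.3%, not stated in the source

print(pct(amplification), pct(snv))  # each category on its own
print(pct(amplification + snv))      # combined rate, assuming no overlap
```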

      Reference for reply 1)

      • Kwon JH, Lee SC, Lee MA, Kim YJ, Kang JH, Kim JY, et al. Behaviors and Attitudes toward the Use of Complementary and Alternative Medicine among Korean Cancer Patients. Cancer Res Treat. 2019;51(3):851-60.

      3) The authors try to describe and draw conclusions about the possibility of metastasis to metastasis spread in p.6, lines 6-10 "In our study, of 7 patients with 2 or more metastatic lesions, evidence of metastasis-to-metastasis spread was found in 2 patients (28.6%). In GB-A1 (Figure 2A), it appears that CBD, omentum 1-2, mesentery, and abdominal wall 2-4 lesions may originate from abdominal wall 1 (old) rather than from primary GBAC considering clone F." The authors conclude here that the spread arose from abdominal wall 1, but this lesion is only separated from the CBD lesion by 1 month. There is no history given about whether this timing difference is significant or if it was simply due to clinically-driven differences in when each lesion was sampled. Given the proximity of the CBD lesion to the original gall bladder cancer, it seems just as likely that all of these distant lesions were seeded from the CBD lesion. If this is the case, the author's conclusion about "metastasis to metastasis" spread does not seem strongly supported. It would be helpful if the authors could clarify this point and/or provide additional data to strengthen this conclusion.

      We appreciate your valuable comment. As addressed above, the manuscript has been modified to reflect your comments.

      Reviewer #2 (Public Review):

      Minsu Kang et al. analyzed 11 patients with gallbladder adenocarcinoma using multi-point sampling. Mutational analysis revealed evolutionary patterns during progression, in which the authors found that metastasis-to-metastasis spread and the migration of clusters of tumor cells are common in gallbladder adenocarcinomas. The signature analysis detected signatures 22 (aristolochic acid) and 24 (aflatoxin) in metastatic tumors. Overall, the analyses are well performed using established algorithms. However, the manuscript is highly descriptive. Therefore, it is very difficult to understand what the novel findings are.

      Major comments

      1) The sections "Evolutionary trajectories and expansion of subclones during regional and distant metastasis", "Polyclonal metastasis and intermetastatic heterogeneity", "Mutational signatures during clonal evolution", and "Discussion" are highly descriptive which makes it difficult to understand what the novel and/or important findings are. Those sections would profit from reorganization.

      Thank you for the important comment. We have reorganized the manuscript according to your comments.

      1) In the "Evolutionary trajectories and expansion of subclones during regional and distant metastasis" section, unnecessary sentences have been removed and Figures 2 and 3 have been changed to make it simpler to understand how subclones spread during metastasis.

      2) In the "Polyclonal metastasis and intermetastatic heterogeneity" section, after receiving feedback on statements that were conflicting (Reviewer #1's comment 4), we clarified the statements and removed any other extraneous sentences. Figures 2 and 3 have been changed to make it simpler to understand polyclonal metastasis and intermetastatic heterogeneity.

      3) In the "Mutational signatures during clonal evolution" section, after receiving comments that Figures 4B and 4C were confusing (Essential Revisions #6), we moved Figure 4B to Figure 4—figure supplement 2. Unnecessary sentences have been removed. We emphasized signatures 22 and 24 highlighted during metastasis. This result was validated by using two additional tools, Signal (Degasperi A et al., Nat Cancer. 2020) and MuSiCa (Diaz-Gay M et al., BMC Bioinformatics. 2018).

      4) In the Discussion section, duplicate descriptions and unnecessary extraneous explanations have been deleted. We emphasized that whereas aflatoxin and aristolochic acid had little impact on early cancer formation, their effects could be seen more clearly in established cancer cells (Page 13 Line 2-7). In addition, we pointed out the limitations of the NGS tests currently used in clinical practice and described the clinical significance of this study (Page 13 Line 8-16).

      2) What would enhance this paper is more of a connection between the bioinformatics analysis and the biology. Although the authors analyzed multi-point sequencing data well, this paper lacks in-depth discussion. I understand that the results in the paper are "computationally" the most likely. However, the impact is lost by an incomplete connection to biology.

      As you commented, we analyzed the WES data obtained from patient samples by computational methods. In this study, we did not validate the various results using in vitro or in vivo models. However, we would like to emphasize the significance of our work, because it is the first human study of GBAC to cover the current model of carcinogenesis from precancerous lesions to metastasis. For example, polyclonal seeding has been previously confirmed in animal models (Cheung KJ et al., Science 2016). In humans, there have been reports in breast cancer (Ullah I et al., J Clin Invest. 2018) and colorectal cancer (Wei Q et al., Ann Oncol. 2017), but not yet in GBAC.

      3) In addition to the above concern, it is difficult to comprehend the cohort as the detailed information is lacking. I would suggest providing a brief table that contains the number of collected samples, frozen or FFPE, the clinical information, etc. by sample.

      Thank you for the constructive comment. Supplementary Table 1 was modified as you mentioned. It is now indicated from which organ, when, and by what method the tissue was obtained, what the tumor purity of the tissue was, and whether the tissue was fresh-frozen or FFPE. In addition, we updated the information about tissue acquisition sites in Figure 1A.

      4) The mutations with very low allele frequency (< 1%) are discussed in the manuscript. However, no validation data is provided. Please add a description of the accuracy of the mutation calling considering the following concerns.

      • FFPE samples are analyzed using the same method as frozen samples. FFPE contains much more artifacts. Is it adequate to use the same methods for both frozen and FFPE samples?

      Thank you for the valuable comment. We also considered the FFPE artifacts. However, we did not remove the possible artifacts. This part has been described above. Please see Essential Revisions #5.

      • How were those mutations with low allele frequency validated? Are those variants validated by other methods? Especially in FFPE.

      Thank you for the important comment. Firstly, we discarded any low-quality, unreliable reads and variants according to the pre-specified filtering criteria used in previous studies that employed the Genomon2 pipeline (Yokoyama A et al., Nature. 2019, Kakiuchi N et al., Nature. 2020, Ochi Y et al., Nat Commun. 2021). In the Methods section, we have added an explanation of this part (Page 16 Line 5-12).

      As you commented, validation of a low-VAF mutation is required if the mutation is sample-specific. However, in this study, whenever a mutation in Supplementary File 1 has a low VAF in one sample, at least one of the other samples has a higher VAF that passed our pre-specified filter. Therefore, validation is not required for that mutation. In addition, possible sequencing artifacts with low VAFs in FFPE tissues have been discussed above. Please see Essential Revisions #5.
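      The retention rule described above can be sketched as a cross-sample rescue: a low-VAF call is kept only when the same mutation passes the filter in another sample from the same patient, and sample-specific low-VAF calls are dropped. This is a minimal illustration, not the authors' implementation; the single VAF threshold `MIN_VAF` is a hypothetical stand-in for the Genomon2 filtering criteria, and the sample and mutation names are made up.

```python
from collections import defaultdict

MIN_VAF = 0.05  # hypothetical stand-in for the real filter, for illustration only


def retained_calls(calls):
    """Apply a cross-sample rescue rule to one patient's variant calls.

    calls: iterable of (sample, mutation_id, vaf) tuples.
    Returns the set of (sample, mutation_id) pairs that are kept.
    """
    calls = list(calls)
    # mutation_id -> samples in which that mutation passes the filter
    passing = defaultdict(set)
    for sample, mut, vaf in calls:
        if vaf >= MIN_VAF:
            passing[mut].add(sample)

    kept = set()
    for sample, mut, vaf in calls:
        # A passing call is kept. A low-VAF call never adds its own sample to
        # `passing`, so a non-empty entry means the mutation passed elsewhere.
        if vaf >= MIN_VAF or passing[mut]:
            kept.add((sample, mut))
    return kept


calls = [
    ("primary", "mutA", 0.30),  # passes the filter
    ("met1", "mutA", 0.004),    # low VAF, rescued by the primary tumour
    ("met2", "mutB", 0.003),    # low VAF and sample-specific: dropped
]
print(sorted(retained_calls(calls)))
```

      Under this rule, only the sample-specific low-VAF call would still require orthogonal validation, which matches the reasoning in the response.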

      • Is the low variant allele frequency (0.2~1%) significantly higher than the background noise level?

      Thank you for the important comment. As you expected, FFPE samples had a higher number of sample-specific mutations than fresh-frozen ones in our study. However, we did not remove these mutations in the analysis of the FFPE samples. For a more detailed description, please see Essential Revisions #5.

      5) The authors compared mutational signatures divided by stages or timings. How are the signatures calculated although each sample has a distinct number of somatic mutations? Did the authors correct the difference?

      Thank you for the helpful comment. We classified all the mutations according to the specific criteria (Page 9 Line 9-18). For example, in Figure 4B (before revision, Figure 4C), mutations were classified by the timing of development during clonal evolution. After that, we could calculate the relative contributions of mutational signatures in each group using the three tools, Mutalisk (Lee J et al., Nucleic Acids Res. 2018), Signal (Degasperi A et al., Nat Cancer. 2020) and MuSiCa (Diaz-Gay M et al., BMC Bioinformatics. 2018). Although the number of mutations is different for each group, no additional correction was required because we compared the relative contributions among the groups.
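      To illustrate why no extra correction is needed when comparing relative contributions, the short sketch below converts absolute signature exposures (mutations attributed to each signature) into within-group fractions. The signature labels and counts are hypothetical and do not come from this study; the point is only that groups of very different size become directly comparable once each is normalized by its own total, which is the logic described above. The tools named above (Mutalisk, Signal, MuSiCa) perform the actual signature fitting; this sketch covers only the final normalization step.

```python
def relative_contributions(exposures):
    """Convert absolute signature exposures (number of mutations attributed
    to each signature) into fractions that sum to 1 within one group."""
    total = sum(exposures.values())
    if total == 0:
        raise ValueError("group has no attributed mutations")
    return {sig: count / total for sig, count in exposures.items()}


# Hypothetical exposures for two groups of very different size.
early = {"Sig1": 40, "Sig22": 5, "Sig24": 5}      # 50 mutations in total
late = {"Sig1": 200, "Sig22": 150, "Sig24": 150}  # 500 mutations in total

print(relative_contributions(early))
print(relative_contributions(late))
```

      Although the late group has ten times as many mutations, its fractions can be compared directly with those of the early group.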

      6) In distant metastatic tumors, signatures 22 and 24 are increased. Those two signatures are strongly associated with specific carcinogens. Although clinical information is lacking, do the authors think that those patients were exposed to those chemicals after the diagnosis? Why do the authors think the two signatures increased in the metastatic tumors? Were those signatures validated by other methods?

      We appreciate your important and constructive comment.

      1) We think that the patients might have been exposed to aristolochic acid or aflatoxin before or after the cancer diagnosis. Aristolochic acid is an ingredient of oriental herbal medicine (Debelle FD et al., Kidney Int. 2008, Hoang ML et al., Sci Transl Med. 2013). Given that all the patients in our cohort are Korean, and a recent study found that Korean cancer patients are frequently exposed to herbal medicines (Kwon JH et al., Cancer Res Treat 2019), one possible explanation is that some patients might have been exposed to herbal remedies containing aristolochic acid. On the other hand, aflatoxin is known to be contained in soybean paste and soy sauce, which are widely used in Korean food (Ok HE et al., J Food Prot. 2007). Nevertheless, we believe that further investigation is required because it is difficult to determine the etiology of signatures 22 and 24 with this limited patient data.

      2) Across the mutational signature results from the three different tools (Figure 4B and Figure 4—figure supplement 3), signatures 22 and 24 contribute relatively little in early carcinogenesis but contribute more in late carcinogenesis and metastasis. We therefore speculate that the two carcinogens have little impact on the early stage of cancer development but may become prominent in overt cancer cells. Further studies are needed to test this hypothesis.

      3) During the revision process, signatures 22 and 24, which were highlighted in the metastasis stage, were validated by two additional tools, Signal (Degasperi A et al., Nat Cancer. 2020) and MuSiCa (Diaz-Gay M et al., BMC Bioinformatics. 2018) (Figure 4—figure supplement 3). We updated this part in the Results (Page 9 Line 18-21) and Discussion (Page 13 Line 2-7) sections.

      Reference for reply 1)

      • Kwon JH, Lee SC, Lee MA, Kim YJ, Kang JH, Kim JY, et al. Behaviors and Attitudes toward the Use of Complementary and Alternative Medicine among Korean Cancer Patients. Cancer Res Treat. 2019;51(3):851-60.

      7) Figure 2 is well described. However, it is difficult for readers to fully understand. The colors of some clones are similar. The results of multi-time-point and regional analyses in the cases with multiple sampling are not integrated. Driver mutations are described separately in the small phylogenetic trees. Evolutionary patterns (linear or branching) are not described in the figures. Addressing the above concerns would improve the manuscript.

      We appreciate your important comment.

      1) In GB-S1, clones of similar colors were modified to be different colors.

      2) Figures 2 and 3 have been modified to make them easier to understand by separating time and space more clearly.

      3) Driver mutations are now indicated in both the phylogenetic tree and TimeScape result (Figures 2 and 3).

      4) Evolutionary patterns (linear or branching) can be identified from the phylogenetic trees in Figures 2 and 3. In addition, we now describe each patient's evolutionary pattern more clearly in the manuscript.

      8) "Among 6 patients having concurrent BilIN tissues, two patients were excluded from the further analysis because of low tumor purity in one patient and different mutational profiles between BilIN and primary GBAC in the other patient, suggesting different origins of the two tumors (Figure 1-figure supplement 2)." This looks like cherry-picking. More explanation is necessary.

      • What was the tumor purity? Although the authors use a 0.2% variant allele frequency as a true mutation (for example, Table 2), is the tumor purity lower than 0.2%?

      Thank you for the important comment. The calculated tumor purity of BilIN in the GB-S8 patient was 0.03 based on the WES data. We added this value to the manuscript (Page 6 Line 9) and Supplementary Table 1. Although variants were called in this case, the tumor purity was too low to estimate the allele-specific copy number, and thus sophisticated analysis as in other patients was not possible. In addition, the value of 0.2% in Table 2 is not the VAF, but cellular prevalence calculated by PyClone and CITUP. Although the value is low in the primary tumor, it is mentioned because it is high in metastatic lesions.

      • BilIN and GBAC of GB-S7 have some shared mutations. Why do the authors conclude that BilIN and GBAC have distinct origins? Do the authors think that those shared mutations are germline mosaic mutations?

      Thank you for the important comment.

      1) We think that the BilIN and GBAC of the GB-S7 patient are tumors of different origins because BilIN and GBAC of the GB-S7 patient have different truncal mutations (Figure 1—figure supplement 2C). This is a markedly different feature compared to BilIN and GBAC samples of other patients. We have added an explanation for this part to the Results section (Page 6 Line 9-11).

      2) We do not think that mosaicism occurred at the developmental stage. Although some mutations were identified in both BilIN and GBAC, we cannot determine their importance because their VAF was very low (0.001 to 0.04) in one of the two lesions. If mosaicism had occurred only in the GB at the developmental stage, the VAFs of the shared mutations should be much higher than the current values, and the VAFs in the BilIN and GBAC lesions should be similar.

      • Was the copy number profile compared between BilIN and GBAC?

      Thank you for the constructive comment. In this study, we obtained allele-specific copy numbers using Control-FREEC version 11.5 (Boeva V et al., Bioinformatics. 2012). The copy number of the mutations in the GB-S8 patient's BilIN could not be estimated by Control-FREEC due to low tumor purity (0.03). In the case of GB-S7, BilIN and GBAC were determined to be of different tumor origins and were thus excluded from the analysis.

    1. Author Response

      Reviewer #1 (Public Review):

      It's here where my very mild (I truly liked this article - it is well done, well written, and creative) comments arise. The implications for stochastic strategies immediately emerge in the early results - bimodal strategies come about from the introduction of two variables. There is not enough credence given to the field of stochastic behavior in the introduction - the introduction focuses too much on previous models of predator-prey interaction, and in fact, Figure 1, which should set up the main arguments of the article, shows a model that is only slightly different (slight predator adjustment) that is eventually only addressed in the Appendix (see below). The question of "how and when do stochastic strategies emerge?" is a big deal. Figure 1 should set up a dichotomy: optimal strategies are available (i.e., those that minimize Tdiff), which would predict a single unimodal strategy. Many studies advocate for Bayesian optimal behavior, but multimodal strategies are the reality in this study - why? Because if you consider the finite attack distance and the inability of fish to evoke maximum-velocity escapes while turning, it actually IS optimal. That, I think, is the main point of the article and why it's a broadly important piece of work. Further framing within the field of stochastic strategies (e.g., stochastic resonance) could be done in the introduction.

      We appreciate the comment provided by the reviewer. We changed the second paragraph of the introduction to focus more on the protean tactic (stochasticity). We added a new figure (Figure 1 in the new version) to conceptually show the escape trajectories (ETs) of a pure optimal tactic, a pure protean tactic, a combination of optimal and protean tactics, and an empirically observed multimodal pattern. We explained each tactic and described how even the combination of the optimal and protean tactics cannot explain the empirically observed multiple preferred ETs.

      The revised paragraph (L49-66) is as follows: Two different escape tactics (and their combination) have been proposed to enhance the success of predator evasion [16, 17]: the optimal tactic (deterministic), which maximizes the distance between the prey and the predator (Figure 1A) [4, 14, 15, 18], and the protean tactic (stochastic), which maximizes unpredictability to prevent predators from adjusting their strike trajectories accordingly (Figure 1B) [19-22]. Previous geometric models, which formulate optimal tactics, predict a single ET that depends on the relative speeds of the predator and the prey [4, 14, 15, 18], and additionally, predator’s turning radii and sensory-motor delay in situations where the predator can adjust its strike path [23-25]. The combination of the optimal tactic (formulated by previous geometric models), which predicts a specific single ET, and the protean tactic, which predicts variability, can explain the ET variability within a limited angular sector that includes the optimal ET (Figure 1C). However, the combination of the two tactics cannot explain the complex ET distributions reported in empirical studies on various taxa of invertebrates and lower vertebrates (reviewed in [26]). Whereas some animals exhibit unimodal ET patterns that satisfy the prediction of the combined tactics or optimal tactic with behavioral imprecision (e.g., [27]), many animal species show multimodal ETs within a limited angular sector (esp., 90–180°) (Figure 1D) (e.g., [4, 5, 28]). To explore the discrepancy between the predictions of the models and empirical data, some researchers have hypothesized mechanical/sensory constraints [17, 29]; however, the reasons why certain animal species prefer specific multiple ETs remain unclear.

      All experiments are well controlled (I especially liked the control where you varied the cutoff distance given that it is so critical to the model). Some of the figures require more labeling and the main marquee Figure 1 needs an overhaul because (1) the predator adjustment model that is only addressed in the Appendix shouldn't be central to the main introductory figure - it's the equivalent of the models/situations in Figure 6, and probably shouldn't take up too much space in the introductory text either (2) the drawing containing the model variables could be more clear and illustrative.

      (1) According to this comment and comment #11 from reviewer #2, we moved the two panels in the figure (Figure 1B and D in the original version) to Appendix-figure 1, and accordingly, we changed the first paragraph of the Model section so as to clearly describe that we focus on Domenici’s model in this study (L103-108).

      As for Figure 6 (Figure 7 in the new version) and related parts, we tempered our claims to clearly describe that our model has only the potential to explain the different patterns of escape trajectories observed in previous works. We would like to keep this figure in the main text because it is fundamental to explain the potential applicability of our model to other predator-prey systems.

      (2) To reduce the burden on readers, we added the model variables to the figure and color-coded them (Figure 2B in the new version).

      Finally, I think a major question could be posed in the article's future recommendations: Is there some threshold for predator learning that the fish's specific distribution of optimal vs. suboptimal choice prevents from being reached? That is, the suboptimal choice is performed in proportion to its ability to differentiate Tdiff. This is "bimodal" in a sense, but a probabilistic description of the distribution (e.g., a Bernoulli with p proportional to beta) would be really beneficial. Because prey capture is a zero-sum game, the predator will develop new strategies that sometimes allow it to win. It would be interesting if the Bernoulli description could eventually be run via a sampler controlling a prey dummy presented to an actual predator; one could show that the predator eventually learns the pattern if the Bernoulli probability of choosing the optimal escape is set too high, and that the prey has balanced its choice of optimal vs. suboptimal escapes to circumvent predator learning.

      We thank the reviewer for this constructive comment. Actually, we are now developing a dummy prey system. We added the following sentence in the Discussion to mention future research.

      The added sentence (L496-499): Further research using a real predator and dummy prey (e.g., [48]) controlled to escape toward an optimal or suboptimal ET with specific probabilities would be beneficial to understand how the prey balances the optimal and suboptimal ETs to circumvent predator learning.
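      As a minimal sketch of the kind of Bernoulli sampler discussed above, the function below generates a reproducible trial schedule for a hypothetical dummy prey: on each trial it escapes along the optimal ET with probability p_optimal and along a suboptimal ET otherwise. The function name, the two ET values (120° and 170°), and the probability 0.7 are illustrative assumptions, not parameters from this study or from any existing dummy prey system.

```python
import random


def sample_escape_choices(p_optimal, n_trials, optimal_et, suboptimal_et, seed=0):
    """Hypothetical trial schedule for a dummy prey.

    Each trial is an independent Bernoulli draw: the optimal ET (degrees)
    is chosen with probability p_optimal, the suboptimal ET otherwise.
    """
    if not 0.0 <= p_optimal <= 1.0:
        raise ValueError("p_optimal must lie in [0, 1]")
    rng = random.Random(seed)  # seeded so a given schedule can be replayed
    return [optimal_et if rng.random() < p_optimal else suboptimal_et
            for _ in range(n_trials)]


schedule = sample_escape_choices(p_optimal=0.7, n_trials=1000,
                                 optimal_et=120.0, suboptimal_et=170.0)
frac = schedule.count(120.0) / len(schedule)
print(f"fraction of optimal escapes: {frac:.2f}")
```

      Sweeping p_optimal across experiments would be one way to look for the learning threshold the reviewer describes.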

      Reviewer #2 (Public Review):

      First, it is unclear how the dummy predator is actuated. The description in the Methods section does not clearly address how rubber bands are used for this purpose.

      To clearly mention how the dummy predator was actuated by rubber bands, we added a figure (Figure 3-figure supplement 3B) and the following sentences.

      The added sentences (L608-611): The dummy predator was held in place by a metal pipe anchored to a four-wheel dolly, which was connected to a fixed metal frame via two plastic rubber bands (Figure 3—figure supplement 3B). The wheel dolly was drawn back to provide power for the dummy predator to strike toward the prey.

      Second, the predator's speed, which previous research has identified as a critical factor during predator-prey interactions, is not measured from the motion of the dummy predator in the experiments. Instead, it is estimated using an optimization algorithm that utilizes the mathematical model and the prey-specific parameters. It is unclear why the authors chose this method over measuring velocity from their experiments. Since the prey fish are responding to a dummy predator moving toward them at a particular speed during the interaction, it is important to measure the speed of the predator or clearly explain why estimating it using an optimization procedure is more appropriate.

      We chose this method (optimizing predator speed from the prey’s viewpoint) because there was no significant effect of predator speed on the escape trajectory in our experiment (L203-208). In other words, we considered that, at least in our case, the prey did not change the escape trajectory in response to the predator speed, and thus it would be more appropriate to use a specific predator speed estimated through an optimization algorithm from the prey’s point of view. It may be appropriate to use measured predator speed in other cases where the prey adjusts the escape trajectory in response to the change in predator speed. Therefore, we conducted a further analysis using actual predator speeds (both the predator speed at the onset of escape response, and the mean speed for the predator to cover the distance between the predator and prey). The results show that the model fit became worse when using measured predator speed per trial compared to the model using the fixed predator speed estimated through the optimization procedure (Table 3—source data 1; Figure 5—figure supplement 1). We added the above explanation in L219-226.

      One of the major claims of this article is that the model can explain escape trajectories observed in other predator-prey systems (presented in Figure 6). Figure 6 panels A-C show the escape responses of different prey in response to some threatening stimuli. Further, panels D-F suggest that the empirical data can be predicted with the model. But the modeling parameters used to produce the escape trajectories in D-F are derived from the authors' experiments with fish, instead of the experiments with the species shown in panels A-C.

      We thank the reviewer for this comment. We agree that this part in the previous version was an over-interpretation. Therefore, we tempered our statements to simply suggest that our approach has the potential to explain multiple ETs observed in other taxa. The revised sentences are as follows.

      Abstract (L27-30): By changing the parameters of the same model within a realistic range, we were able to produce various patterns of ETs empirically observed in other species (e.g., insects and frogs): a single preferred ET and multiple preferred ETs at small (20–50°) and large (150–180°) angles from the predator.

      Results (L395-407): Potential application of the model to other ET patterns. ...(snip)... To investigate whether our geometric model has the potential to explain these different ET patterns, we changed the values of model parameters (e.g., Upred, Dattack) within a realistic range, and explored whether such adjustments can produce the ET patterns observed in the original work. ...(snip)... These results indicate that our model has the potential to explain various patterns of observed animal escape trajectories.

      Discussion (L538-548): We show that our model has the potential to explain other empirically observed ET patterns (Figure 7). ...(snip)... Further research measuring the escape response in various species and applying the data to our geometric model is required to verify the applicability of our geometric model to various predator-prey systems.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors use two-photon imaging to visualize various axonal organelle populations that they have virally labeled with fluorescent proteins, including DCVs and late endosomes/ lysosomes. The latter topic is a bit contentious, as the authors use two labels that tag potentially overlapping and not highly specific markers so that the nature of the tagged organelle populations remains unclear. Notably, the authors also have previously published a detailed account of how DCVs traffic in vivo, so the novelty is mostly in comparing the behavior of different organelles and the potential influence of activity.

      Overall, the reported results mostly corroborate the expectations from previous in vitro and in vivo work on these organelles and other cargoes, performed by the authors and their collaborators, as well as in many other laboratories:

      (i) Different organelles have different transport behaviors regarding speed, the ratio of anterograde to retrograde moving organelles, etc.

      (ii) Organelles move in different ways when they pass specific anatomical landmarks in the axons, such as presynaptic terminals.

      (iii) Activity of a neuron (here measured by calcium imaging) can impact the measured transport parameters, albeit in a subtle and mechanistically not well-defined manner. The chosen experimental design precludes a more detailed analysis, for example of the precise movement behavior (such as defining the exact pausing/movement behavior of organelles, which would require higher imaging speeds) or of a correlation of different organellar behavior at synaptic sites or during activity (which would require three-channel simultaneous imaging of two organelle classes plus a synaptic or activity marker).

      In summary, this publication uses sophisticated in vivo labeling and imaging methods to corroborate and complement previous observations on how different axonal organelles move, and what influences their trafficking.

      We thank the reviewer for the time dedicated to our manuscript. We are thankful for the critical and specific comments, which allowed us to further improve our manuscript. We agree that it would have been beneficial to have higher frame rates and three instead of two imaging channels. However, this would have added further technical complexity to an already complex experimental setup resolving fluorescent puncta with sizes below the resolution limit. Nevertheless, we are convinced that all our main conclusions are justified by the imaging settings used for the current data sets.

    1. Author Response:

      Reviewer #2 (Public Review):

      The study is well designed and provides exciting new insights into the plasticity of intracortical connections, (over-)compensating for the partial loss of thalamic inputs. To optically resolve the activity of single synapses in vivo during sensory stimulation is technically very challenging. It would be helpful to know whether the recordings were made in the binocular or monocular region of V1. The results argue against a generalized multiplicative upscaling of all inputs and suggest selective boosting of synapses that are part of sensory-driven subnetworks. However, it is not clear whether homeostatic plasticity occurred at the observed spines themselves or on the level of presynaptic neurons, which could then e.g. fire more bursts, leading to larger postsynaptic Ca transients. The possibility that thalamic inputs from the intact eye in layer 4 could be potentiated should be discussed. It would probably help to explain to the reader the layer-specific connectivity of V1 in the introduction, and why thalamic input synapses themselves were not optically monitored (may require adaptive optics). Technical limitations are a main reason why the conclusions are somewhat vague at this point ("... regulation of global responses"), this could be spelled out better.

      We thank the reviewer for these suggestions. We agree with the reviewer that we cannot determine (due to technical limitations) whether the changes are occurring pre- or post-synaptically or some combination (also related to the reviewer’s point 8). We have added this point to the discussion.

      "Finally, it is important to note that while we made these measurements in layer 5 pyramidal cells, the homeostatic changes mediated by TNF-α could occur outside of layer 5, including changes to upstream inputs or changes to the presynaptic responses, either through changes in presynaptic release (Vitureira et al., 2012) or through a change in activity patterns of the presynaptic cell (e.g., bursts compared to single spikes) (Linden et al., 2009)."

      One important point that was unclear in the earlier version of the manuscript is that the experiments in visual cortex were conducted in the monocular visual cortex. As explained in our response to reviewer 1, there are no visually evoked responses following enucleation in our experiments.

      Reviewer #3 (Public Review):

      Weaknesses are largely restricted to suggested changes to the writing - specifically, there are additional explanations of the data whose discussion may strengthen the long-term impact of the manuscript.

      1) Most importantly, the hypothesis at the heart of this work (subset versus global processes) is framed as orthogonal to the status quo model of homeostatic processes (global). I suspect that adherents to the global argument would quickly point out that the current work is conducted in adult animals, and the majority of the homeostatic plasticity research (which forms the basis of the global model) is conducted in juvenile animals. This is an important distinction because the visual system is enriched in plasticity mechanisms during the ocular dominance critical period. Since Hubel and Wiesel at least, there is extensive evidence to suggest that sensory systems take advantage of critical periods to set themselves up in accordance with the statistics of the world in which they are embedded. The flip side of this is that sensory systems are far less readily influenced by experience once the critical period is closed (Vital-Durand et al., 1978, LeVay et al., 1980; Daw et al., 1992, Antonini et al., 1999, Guire et al., 1999, Lehmann and Lowel, 2008). Through this lens, one might predict that a key feature of the adult cortex is that sensory spines could benefit by being selectively protected from what would otherwise be global homeostatic processes. Either way, the manuscript can be read as if it is framing a show-down between the classical model and a newer, higher-resolution model. I worry that this will be interpreted as misleading without careful presentation/contextualization of the role of development in the introduction and a thorough dissection in the discussion. Currently, the first occurrence of the word, "adult", occurs in the methods, on page 27, line 512. "Juvenile" and "critical period" are not in the manuscript. The age of the animals in this study isn't mentioned until the methods (between P88 and P148 at the time of imaging).

      2) Goel and Lee (2007) seem quite pertinent here: they show that L2/3 neurons undergo homeostatic regulation of mEPSCs in both juvenile and adult animals, but that the process is no longer multiplicative in nature once the animal is past the critical period. Multiplicative scaling has been the basis of the argument for global change since Turrigiano et al. (1998). Thus, the Goel and Lee finding seems to bolster the current findings, and may also point to a mechanistic difference between critical-period and adult homeostatic plasticity.

      We fully agree with the reviewer that our results are not in conflict with the developmental synaptic scaling literature. We have changed the text throughout the manuscript to highlight previous studies at different ages and made clear the age of the animals in this work (including in the abstract, introduction, results and discussion). We have also referenced Goel and Lee, 2007, which we agree should be included and thank the reviewer for pointing this out.

    1. Author Response

      Reviewer #1 (Public Review):

      While eDNA methods are becoming more established, there remains skepticism by many in the scientific community about the origins of the detected DNA (e.g. does it drift in from other areas or water layers?). If these concerns aren't addressed (i.e. by citing supporting literature on the fate of eDNA), the different biodiversity profiles between trenches could possibly be explained by differing oceanography. There is also some important methodological information that is missing from this manuscript. For example, sampling volumes will affect the amount of biodiversity detected, but it is not clear if sample volumes are consistent across depths and study areas. It was also not indicated whether field controls (blanks) were taken to assess the potential contamination of samples. Lastly, the literature in the eDNA field is progressing rapidly and there are some missing papers (e.g Thomsen et al. 2016, Canals et al. 2021, McClenaghan et al. 2020, Govindarajan et al. 2021, etc.) that are relevant to the technique used in this manuscript and the habitat studied.

      We are very grateful to this reviewer for providing such an in-depth review of our manuscript, which allowed us to improve it significantly. We tried to follow almost every suggestion explicitly. In particular, we appreciate the pointers to important missing literature, which we have included in this new version of our paper. The volume of seawater filtered for each sample is given in Supplementary file 1a. Field blanks were not collected per se. However, as part of the molecular protocol used (see Methodology), a "negative extraction control" was applied to check for possible contamination. We also carefully checked the results themselves for any indication of contamination that could have biased our results and conclusions.

      Reviewer #2 (Public Review):

      My primary critique is the near-absence of statistical analyses in the current version of the manuscript; such analyses are necessary to support the many descriptive observations within a more formal hypothesis-testing framework. An appropriate framework for these analyses should be developed throughout the paper, including consideration of the multiple tests that will be performed. This is important for many reasons, including providing readers with a more formal sense of the uncertainty in the conclusions, given the understandable sampling limitations. Planning and conducting these analyses will require considerable work.

      We thank the reviewer for raising this concern. We did include statistical analyses in part of our work. For example, all the phylogenetic analyses (using the IQ-tree software) implicitly include statistical analyses. The Gini index calculated in Figure 2 is also a statistical measure. However, we agree with the reviewer that some of our results lacked statistical analysis. We now report statistical significance for more statements in the text and have added panels to Figure S2—figure supplement 1 (supported by data in new tables in Supplementary file 1h and 1i) to illustrate the statistical support for some of our claims. We have also removed some unnecessary statements.
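      For readers unfamiliar with the Gini index mentioned above, the sketch below implements one common formulation applied to non-negative per-taxon abundances (for example, read counts per taxon in a sample). It is an illustrative assumption about the input and the formula, not necessarily the exact computation behind Figure 2. A value of 0 means reads are spread perfectly evenly across taxa; the maximum, (n - 1)/n, means all reads belong to a single taxon.

```python
def gini(abundances):
    """Gini coefficient of non-negative abundances.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted in ascending order and ranks i starting at 1.
    """
    values = sorted(abundances)
    n = len(values)
    if any(v < 0 for v in values):
        raise ValueError("abundances must be non-negative")
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive abundance")
    # Rank-weighted sum over ascending values (1-based ranks).
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


print(gini([25, 25, 25, 25]))  # perfectly even -> 0.0
print(gini([0, 0, 0, 100]))    # one dominant taxon -> 0.75
```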

    1. Author Response

      Reviewer #2 (Public Review):

      This is a single-cell RNA-seq analysis of traumatic brain injury (TBI) in mice that looks at recovery from milder TBI. It addresses an important question of why older individuals may have poor recovery. The investigators undertake unbiased analysis in both young and old mice and identify a number of macrophage, fibroblast, lymphocyte, and more specifically B cell inflammatory programs that are activated, some of which do not recover well in older mice. Taken together, these findings identify unique pathways that could be further investigated in functional studies to examine which immunologic mechanisms in the meninges may drive long-term problems from TBI. The models and analysis are well performed and compelling. This paper can serve as a resource for those who study brain immunology. Open questions include the following: 1) What exactly predisposes the aged meninges to such pro-inflammatory programs? Epigenetic alterations? 2) What are the effector mechanisms that negatively impact brain function? 3) Can bioinformatic approaches reveal putative intercellular communication networks that would lend insight into the spatiotemporal sequence of events and ligand-receptor interactions?

      We are glad to hear that the Reviewer finds our work compelling, well performed and that it will be a good resource for those who wish to study brain immunology. The open questions that the Reviewer brings forth are very compelling areas of future investigation that we believe will help to shape and advance this field in the coming years.

    1. Reviewer #1 (Public Review):

      In this manuscript by Feng et al., the authors investigate the mechanism regulating the development of the levator veli palatini (LVP) in the posterior palate/pharyngeal region. While the system is set up as a model to understand how myogenic progenitors migrate to discrete sites to form individual muscles, it is not clear how applicable the findings are to other subpopulations, though this is not a weakness. The mechanisms driving LVP development are of great interest to a broad group of developmental biologists, as LVP malformation is a common problem even in mild cases of cleft palate. The authors hypothesized that the perimysial population within palatal mesenchyme cells is a niche required for pharyngeal muscle development. Using exquisite analysis of scRNA-seq data from E13.5-E15.5 palatal cells, the authors show that TGFb signaling is likely involved in perimysial cell development, and gene expression analysis in wild-type palatal sections shows that TGFb signaling precedes the arrival of myogenic cells. Inactivating Alk5 in palatal mesenchyme cells results in failure of LVP formation. The authors continue by identifying a number of factors that presumably function downstream of TGFb signaling to drive LVP development. Among these is Fgf18, whose upstream region contains SMAD sites that were validated to bind Smad2/3. The authors also identify Creb5 as a potential regulator of Fgf18. Overall, this is a remarkable use of scRNA-seq data, in which findings are supported by subsequent in vivo analysis of gene function using knockout mouse models. These findings will drive further analysis of LVP development and may shed light on the myogenesis of pharyngeal muscle in general.

      Strengths

      1) The treatment of scRNA-seq data using a variety of bioinformatic programs illustrates the utility of this type of data when it is analyzed with appropriate software. The description of the approach is very clear and concise, and the controls appear excellent. The use of multiple time points further improves the analysis.

      2) The focus on perimysial cell expression patterns supports the hypothesis of the authors, though, as with this type of data, one could probably build a story around several pathways. The use of RNAscope to carefully examine where TGFb signaling in the posterior pharynx occurs between E12.5 and E16.5 is critical to the setup of this manuscript and is well done. Further aiding the interpretation of these results are cartoons associated with the staining, which illustrate where the staining occurs without over-stating the observed patterns.

      3) Careful histological analysis illustrates the poor myogenic differentiation in the LVP of Osr2-Cre;Alk5fl/fl embryos.

      4) The finding that TGFb signaling is more important for regulating late perimysial cell development is important in identifying the targets of TGFb signaling.

      5) The use of CellChat to identify sending and receiving cells is well done and further supports the late function of TGFb signaling, in this context working through Fgf18 and Lama4.

      6) The attempt to build a signaling network again using CellChat (Figure 6) is admirable, though there are a few caveats to that approach (see below).

      7) While bead implant studies have been used for 40 years, the approach of culturing a piece of the pharynx and then performing a bead implant to prove that Fgf18 can positively influence myogenic development is admirable.

      Weaknesses

      1) In general, the authors are careful to not suggest that staining is significant unless showing quantification, though, at several points, this is not true.

      2) The authors identify five putative Smad2 sites upstream of Fgf18, using one of them in a CUT&RUN assay whose results suggest enhanced Smad2/3 binding. The problem is that this likely would have worked with the other Smad sites and probably would have worked for any other putative site that one might pick. Proving that a putative site can be bound by its cognate transcription factor is not the same as proving that this occurs in vivo and is sufficient to control the process of LVP development. One would need reporter assays using that TF binding site to better support the points being made by the authors.

      3) In a similar manner, the authors try to define which factors might function with TGFb signaling to regulate myogenic development. Using SCENIC, the authors found a number of genes that might be involved in perimysial fibroblast development. Of these, they illustrate that Creb5 siRNA knockdown decreases Fgf18 expression in cultured palates. The focus on Creb5 was based on it showing, "the most specific expression patterning the late perimysial cells (Figure 6H)....". In fact, Creb5 expression appears the broadest, extending across the entire LVP rather than being restricted to the area where myogenic precursors are found. Thus, any statement or discussion about Creb5 being a direct regulator of Fgf18 should be removed or reworded. The second problem is that reduced Fgf18 expression after Creb5 knockdown does not prove direct regulation. Both of these are rather circuitous arguments.

      4) While the disorganization of myogenic fibers in the posterior LVP is somewhat obvious, it is not as clear as the authors suggest. This change (which I believe) needs to be better quantified (length, width, area, etc.).

      We thank the reviewer for these “Public Review” comments. For point 1, we have added more quantification for clarification and rephrased the wording where quantification was not performed. For point 4, we added measurements to quantify the changes in volume and cross-sectional area of the LVP in Osr2Cre;Fgf18fl/fl mice (Figure 7M-V).

      Reviewer #2 (Public Review):

      In this study, the authors take advantage of unbiased scRNA-seq datasets of the developing mouse soft palate that they previously reported and performed a new bioinformatic analysis to identify differential signaling pathway activities in the heterogeneous palatal mesenchyme. They found a strong association of TGF-beta signaling pathway activity with the perimysial cells and validated it through immunofluorescent detection of pSmad2, which led to their hypothesis that TGF-beta signaling in the perimysial cells might regulate palatal muscle formation. They generated and analyzed Osr2-Cre;Alk5fl/fl mice and showed that those mice have a cleft soft palate and disruption of the levator veli palatini (LVP) muscle. They then performed a comparative scRNA-seq analysis of the soft palate tissues from E14.5 Osr2-Cre;Alk5fl/fl and control embryos and showed that the Osr2-Cre;Alk5fl/fl embryos exhibited defects in the perimysial cells, in particular a reduction in Tbx15+ perimysial fibroblasts that directly associate with the LVP muscle progenitors. They identified Fgf18 as one of the most highly enriched signaling molecules in the perimysial cells and showed that the Osr2-Cre;Alk5fl/fl embryos exhibited reduced Fgf18 expression together with loss of MyoD+ myoblasts in the prospective LVP region. Further data showed that pSmad2 bound to the Fgf18 promoter region in the developing soft palate tissues. In addition, bioinformatic gene regulatory network analysis of the scRNA-seq data identified Creb5 as a potential tissue-specific transcription factor in the perimysial cells, and RNAi knockdown assays in palatal mesenchyme culture suggested that Creb5 is required for Fgf18 expression. Further studies identified a subtle deficiency in the LVP of Osr2-Cre;Fgf18fl/fl mice and showed that exogenous Fgf18 bead implantation in explants of E14 Osr2-Cre;Alk5fl/fl embryonic heads increased the MyoD+ myoblast population in the prospective LVP region. The authors concluded that TGF-beta signaling and Creb5 cooperatively regulate Fgf18 to control pharyngeal muscle development. While the study used multiple complementary approaches and the data presented are solid, important questions need to be addressed to resolve reasonable alternative explanations of the data to the authors' main conclusion.

      We thank the reviewer for the evaluation and suggestions. Responses to each of the suggested revisions are detailed below.

      Major points:

      1) TGF-beta signaling is known to be crucial for neural crest-derived palatal mesenchyme cell proliferation from E13.5 to E14.5. The Osr2-Cre;Alk5fl/fl mutant embryos exhibited obvious disruption of LVP myogenesis and reduced soft palatal shelf size at E14.5 (Fig3-Sup2A-D and Fig 4H-K). The cellular and molecular defects likely started prior to E14.5. Thus, it is important to examine at earlier stages (E13.5/E14.0) whether the palatal mesenchyme was already defective in cell proliferation/survival and/or perimysial cell marker expression, including Creb5 and Tbx15, to resolve whether the primary defect in the Osr2-Cre;Alk5fl/fl palatal mesenchyme could be a reduction in perimysial progenitor cell proliferation and/or differentiation of the myoblast-associated subset, for which Tbx15 and Fgf18hi act as marker genes rather than direct molecular targets. Furthermore, the apparent loss of Tbx15+ cells coincided with a specific reduction of Fgf18 expression in the myoblast-associated perimysial cells (Fig 4J/K versus Fig 5H-K), which raises the possibility that TGF-beta signaling regulates the differentiation of the Tbx15+ population from the mesenchymal progenitors, while the reduction in Fgf18 expression might be a secondary consequence of the cellular defect. The data in Fig 6O showing a lack of significant induction of Fgf18 expression in the palatal mesenchyme culture in both control and Creb5-RNAi cells are also consistent with this alternative explanation.

      We thank the reviewer for the valuable suggestion to identify the primary defects of the perimysial cells. We compared the expression of Creb5, Tbx15 and Fgf18 as well as Smoc2 in E13.5-E14.5 palatal mesenchyme from control and Osr2-Cre;Alk5fl/fl mice (Osr2Cre;Tgfbr1fl/fl mice). We found that expression of Creb5 is prominent from E13.5 to E14.5 and is not affected in Osr2Cre;Tgfbr1fl/fl mice, suggesting that Creb5 may not be a downstream factor but rather a “partner” of TGF-β signaling. At E13.5, Tbx15 is not expressed, while Smoc2 is expressed extensively in the palatal mesenchyme but is not affected in the Osr2Cre;Tgfbr1fl/fl mice. In contrast, Fgf18 is expressed as early as E13.5, and this expression was already reduced in the palate of Osr2Cre;Tgfbr1fl/fl mice relative to controls at this stage, suggesting that the changes in Fgf18 expression are primary and precede changes in the perimysial populations. While proliferation and apoptosis at E13.5 remain unchanged in Osr2Cre;Tgfbr1fl/fl mice, Smoc2 expression in the palate starts to be reduced at E14.0 in Osr2Cre;Tgfbr1fl/fl mice. This suggests that TGF-β signaling is required for the activation of Smoc2 during E13.5-E14.0. In parallel, Tbx15 expression is just starting to be activated in a few cells at E14.0, and this expression increased between E14.0 and E14.5 in the control but failed to increase in Osr2Cre;Tgfbr1fl/fl mice. This suggests that TGF-β signaling is also required for the activation of Tbx15 during E14.0-E14.5. Thus, loss of TGF-β signaling leads to differentiation defects of both Smoc2+ and Tbx15+ perimysial cells. For Figure 6O, we performed a time-course experiment of TGF-β induction and found a significant increase of Fgf18 expression after 4 to 18 hours of treatment (instead of the 24 hours used in previous experiments), with more obvious changes at 4 hours, confirming the early response of Fgf18 expression to TGF-β induction. These results have been added to Figure 4-figure supplement 2, Figure 5I-L, 5U, Figure 6-figure supplement 2, and Figure 6C.

      2) Since the Osr2-Cre;Fgf18fl/fl mice exhibited much subtler palatal and LVP defects than the Osr2-Cre;Alk5fl/fl mice even though the latter still had a lot of Fgf18-expressing perimysial cells at E14.5, Fgf18 is likely a minor player in the TGF-beta mediated gene regulatory network regulating LVP formation. The major players acting downstream of TGF-beta signaling in the palatal mesenchyme, that control initial LVP progenitor migration to and/or proliferation in the soft palate region, remain to be identified and functionally validated. Whether and how Fgf18 directly regulates the perimysial-myoblast interaction is also not known.

      We agree with the reviewer that the phenotype of Osr2-Cre;Fgf18fl/fl mice is much milder than that of Osr2-Cre;Alk5fl/fl mice, as we postulate that Fgf18 is just one of several perimysial-derived signals that may be affected. It will be of great interest to explore the function of other players in future studies. However, we are more inclined toward the possibility that there may be no single “major” player but rather a combination of many signals associated with different aspects of the muscle development. For example, loss of Fgf18 seems to mainly affect the Myf5+ cell proliferation in Osr2-Cre;Fgf18fl/fl mice (Osr2Cre;Fgf18fl/fl mice), as we do not observe any differentiation defect except the reduced muscle size. It is likely that other factors may also play specific functions in specific subpopulations as well. To clarify whether Fgf18 can directly affect the myogenic cell fate, we treated C2C12 mouse myogenic cells with exogenous FGF18 and found that this treatment could indeed significantly increase the proliferation of these cells. We have added these results to Figure 7—figure supplement 2.

      3) While the title and the main conclusion of this manuscript imply a crucial role of Creb5 in the regulation of pharyngeal muscle development, there is no data supporting such a crucial role. Do Creb5-/- mice have specific defects in pharyngeal muscle development?

      We thank the reviewer for this insight. We agree that it is very likely that Creb5 itself may have many roles in the regulation of palatal development or pharyngeal muscle development, given the prominent expression of Creb5 throughout soft palate development and in other myogenic sites of the pharyngeal muscles. Creb5-/- mice (reported as Cre-bpa-/- mice) die immediately after birth; however, the detailed phenotype of these mice was merely described as “data not shown” in a previous publication, and defects of craniofacial development in these mice remain unclear (Maekawa et al., 2010). In this study, we focused on the role of Creb5 as a partner of TGF-β signaling, but we plan to generate a Creb5fl/fl mouse model to thoroughly evaluate Creb5’s functions in craniofacial development as an independent study following this work.

      4) Data in Fig 6 are not sufficient to conclude that TGF-beta signaling and Creb5 cooperatively regulate Fgf18. The TGFb1 treatment did not significantly induce Fgf18 expression in either the control or Creb5-RNAi palate mesenchyme cells (Fig 6O). No data regarding how they act cooperatively to regulate Fgf18 expression.

      We thank the reviewer for carefully reviewing our data. We re-evaluated the temporal response of Fgf18 expression following TGF-β induction and found a significant increase of Fgf18 expression 4 hours post-treatment (instead of 24 hours post-treatment as used in previous experiments). We repeated the Creb5-siRNA treatment experiment using the new experimental condition and replaced the previous Figure 6O with new results showing a significant increase of Fgf18 after TGF-β induction, which was attenuated by Creb5-RNAi treatment, suggesting a requirement of Creb5 for TGF-β-mediated Fgf18 expression. The new result is now included in Figure 6Q.

      Reviewer #3 (Public Review):

      In this study, the authors investigated cell-cell communication between perimysial cells and skeletal muscle progenitors during soft palate development in the mouse. The authors have previously reported on the development of this structure, and here they propose that TGF-β signaling and Creb5 act to regulate Fgf18, and that this pathway regulates pharyngeal muscle development through the indicated cell populations. The study is of high quality, very nicely illustrated, and uses multiple approaches including inferences from single cell transcriptomics, validations on sections, and lineage-specific gene activations. In addition, the authors successfully optimized an organ culture system from thick sections to test locally the role of FGF signaling (bead implantation). The results largely support the conclusions and provide a valuable example of how subjacent cell populations cooperate to establish an embryonic structure.

      We thank the reviewer for the evaluation and suggestions.

    1. Author Response

      Reviewer #3 (Public Review):

      The PCNT gene is found on human chromosome 21, and the same group previously showed that its increased expression is associated with reduced trafficking to the centrosome and reduced cilia frequency, which suggests a possible connection between cilia and ciliary trafficking, SHH signaling, and Down syndrome phenotypes. Jewett et al build upon this prior work by closely examining the trafficking phenotypes in cellular models with different HSA21 ploidy, or its mouse equivalent, thereby increasing the copy number of PCNT (3 or 4 copies of HSA21). They show that most of the trafficking defects can be reversed through the knockdown of PCNT in the context of HSA21 polyploidy. They also begin to examine the in vivo consequences of these trafficking disruptions, using a mouse model (Dp10) that partially recapitulates trisomy 21, including an increased copy number of PCNT. While I think this work advances our understanding of the trafficking defects caused by increased PCNT and has significant implications for our understanding of the cellular basis of a major hereditary human disorder, some improvements can be made to strengthen the conclusions and improve readability.

      Major points:

      I'm a little confused by the authors' conclusion that the increased PCNT levels in T21 and Q21 result in delayed but not attenuated ciliogenesis. The data show lower percentages of ciliated cells at all time points analyzed (Fig 1E) by quite a large margin in both T21 and Q21. Do the frequencies of cilia in the T21 or Q21 cells ever reach the same level as D21, say after 48-72 hours? If not it seems like not simply a delay. A bit more clarity about this point is needed.

      We have now performed a ciliation time course in RPE1 D21, T21, and Q21 cells over 7 days. Our new data confirm that increasing HSA21 dosage delays but does not abolish ciliogenesis (Fig S1H). By day 3 of serum depletion, D21 and T21 cells reach similar ciliation frequencies, and after 4 days all three cell lines reach similar ciliation frequencies.

      The in vivo analysis of the cerebellum was interesting and important but it felt a bit incomplete given that it was a tie between the cell biology and a specific DS- associated phenotype. For example, it is interesting that the EGL of the P4 Dp10 pups is thinner. Does this translate into noticeable defects in cerebellar morphology later? Is there a reduction in proliferation that follows the reduced cilia frequency? I think it would be possible to look at the proliferation and cerebellar morphology at some additional stages without becoming an overly burdensome set of experiments. At a minimum, are there defects in cerebellar morphology at P21 or in the adult mice? The authors allude to developmental delays in these animals - maybe that complicates the analysis? But additional exploration and/or discussion on this point would help the paper.

      We have now analyzed P21 animals and found no significant differences in ciliation frequency or gross cerebellar morphology at this age. This is consistent with our new tissue culture data demonstrating that HSA21 ploidy delays but does not abolish ciliogenesis. We cannot rule out long term changes in neuronal processes or glial cells, but we believe this analysis is outside the scope of this paper.

      It was a bit unclear to me why specific cell lines were used to model trisomy 21 and why this changed part way through the paper. I understand the justification for making the Dp10 mice- to enable the in vivo analysis of the cerebellum, but some additional rationale for why the RPE cell line is initially used and then the switch back to mouse cells would improve readability.

      The rationale for switching to MEFs was twofold. First, Shh ciliary signaling cannot be easily studied in RPE1 cells. Therefore, assays of ciliary function (Smoothened localization or GLI1 transcription) needed to be performed in a different cell line, and the most commonly used line is MEFs. Second, the Dp mice allowed us to tease apart contributions to cilia defects from separate regions of HSA21. We have worked to clarify this point in the text.

    1. Author Response

      Reviewer #2 (Public Review):

      Grasses develop morphologically unique stomata for efficient gas exchange. A key feature of stomata is the subsidiary cell (SC), which laterally flanks the guard cell (GC). Although it has been shown that the lateral SC contributes to rapid stomatal opening and closing, little is known about how the SC is generated from the subsidiary mother cell (SMC) and how the SMC acquires its intracellular polarity. The authors identified BdPOLAR as a polarity factor that forms a polarity domain in the SMC in a BdPAN1-dependent manner. They concluded that BdPAN1 and BdPOLAR exhibit mutually exclusive localization patterns within SMCs and that formative SC division requires both. Further mutant analysis showed that BdPAN1 and BdPOLAR act in SMC nuclear migration and the proper placement of the cortical division site marker BdTANGLED1, respectively. This study reveals a unique developmental process of grass stomata, where two opposing polarity factors form domains in the SMC and ensure asymmetric cell division and SC generation.

      The findings of this study, if further validated, are novel and interesting. However, I feel that the data presented in the current manuscript do not fully support some crucial conclusions. The lack of dual-color images is the weakest point of this study. If it is technically impossible to add them, alternative analyses are needed to validate the main conclusions.

      1) Is BdPOLAR-mVenus functional? Although the authors interpret that weak BdPOLAR-mVenus expression partially rescued the bdpolar mutant phenotype in Fig. S4D, the localization pattern visualized by BdPOLAR-mVenus may not be completely reliable with this partial rescue activity.

      This is indeed a valid point. The partial complementation of weakly expressing translational reporters (Figure 3–figure supplement 1D) and the weak effect of BdPOLAR-mVenus overexpression lines (Figure 3–figure supplement 1J) at least suggest partial functionality that is strongly dependent on dosage. Yet the localization pattern and the temporal dynamics might indeed not fully reflect the spatiotemporal dynamics of the endogenous BdPOLAR. This criticism is, however, true for any transgenic reporter line, even a fully complementing one, as the requirement for dosage, stability, and turnover likely varies strongly between different protein classes and functions.

      Nonetheless, we have added a sentence on p. 7, which mentions this potential caveat.

      2) Regardless of the functionality of the tagged protein, the authors need to provide more information on their localization. For example, is there a difference in polarity pattern depending on expression level? Does overexpressed BdPOLAR-mVenus invade the BdPAN1 zone? In such cases, might the loss of BdPOLAR polarity in the bdpan1 mutant be a side effect of overexpression, not PAN1 exclusion? Does BdPOLAR expression (no tag) show a dose-dependent effect, similar to the mVenus-tagged protein?

      The difference in polarity patterns in bdpan1 mutants and wild-type does not depend on expression level. BdPOLAR-mVenus was crossed into bdpan1 and mutant and wild-type siblings in the F2 generation were analyzed. This means that the data presented in Fig. 3E and F show exactly the same transgene insertion line in wt and bdpan1 and were imaged with the same setting for comparability. Therefore, the difference in localization is not due to different expression levels but indeed reflects a PAN1-dependent effect.

      To address if BdPOLAR without a tag is also sensitive to dosage, we have generated an untagged complementation line that includes the untagged, genomic locus of BdPOLAR including promoter (-3.1kb) and terminator (+1.1kb). Yet, even though this construct is much better at rescuing the mutant, we still see remaining defects in T0 lines (Figure 3–figure supplement 1K) suggesting that even without a tag we cannot fully recapitulate wild-type functionality. Yet, to actually measure protein levels of untagged BdPOLAR, we would need to raise an antibody against BdPOLAR, which we think is clearly out of the scope of this study.

      3) A major conclusion of this study was that the polarity domains of BdPOLAR and BdPAN1 are mutually exclusive. However, not all the cells in the figures were consistent with this statement. For example, the BdPOLAR signals at the GMC/SMC interface appear to match BdPAN1 localization (compare 0:03 s in Video 1 and 0:20 s in Video 2 [top cell]). The 3D rendered image in Fig. 2F shows that BdPOLAR is excluded near the GMC on the front side of the SMC, where BdPAN1 is not localized. Some cells did not exhibit polarization (Fig. 3A, bottom left; Fig. 3E, bottom left). The most convincing data would be dual-color images of these two proteins. Otherwise, a sophisticated image analysis is required to support this conclusion.

      We agree that dual-color image analysis would have provided the most convincing data. As mentioned in our answers to the reviewing editor and reviewer 1, we have generated a dual marker line (BdPAN1p:BdPAN1-CFP; BdPOLARp:BdPOLAR-mCitrine), yet the BdPAN1-CFP signal (compared to mCitrine signal) was too weak to visualize the proximal BdPAN1 domain.

      This issue was also raised by reviewer 1 and deemed an essential revision. To determine how BdPOLAR and BdPAN1 relate spatially to each other, we have added data in Figure 2E where we manually traced mature SMC outlines to determine BdPOLAR-mVenus and BdPAN1-mCitrine occupancy along the SMC’s circumference. This confirmed that the polarization is indeed opposite yet not perfectly reciprocal (see details above, Essential Revisions #1).

      Finally, we realized that the 3D image renderings were more confusing than helpful and we removed them from the revised version.

      4) Another central conclusion was that BdPOLAR was excluded at the future SC division site, marked with BdTANGLED1. However, these data are also not very convincing, as such specific exclusion cannot be seen in some figure panels (e.g., Fig. 3A, bottom left; Fig. 3E, all three cells on the left). If dual-color imaging is not feasible, a quantitative image analysis is needed to support this conclusion.

      As for point 3, this was also criticized by reviewer 1 and deemed an essential revision by the reviewing editor.

      To determine whether the absence of BdPOLAR signal and the presence of BdTAN1 signal colocalize, we again manually traced mature SMC outlines to determine BdPOLAR-mVenus and BdTAN1-mCitrine occupancy along the SMC’s circumference. We plotted the relative average fluorescence intensity in Figure 4G-I nicely showing that BdTAN1 indeed resides in the BdPOLAR gaps above and below the GMC (again, details above, Essential Revisions #2).

      5) I could not find detailed imaging conditions and data processing methods. Are Figs. 2B and 2E max-projection or single-plane images? If they are single-plane images, which planes of the SMC are observed? In addition, how were Figs. 2C and 2F rendered? (e.g., number of images, distance intervals, processing procedures). This information is important for data interpretations.

      We agree that we might not have provided sufficient imaging condition details and have added more details regarding image acquisition in the method part (p. 20). We always use a consistent depth and show the midplane of SMCs. As mentioned above, we removed Figs. 2C and 2F and the supplemental movies as these data did not seem to be helpful.

      6) [Minor point] The authors should clearly describe where BdPAN1 is expressed and localized. Is it expressed in the GMC and localized at the GMC/SMC interface? Alternatively, is it expressed and localized in the SMC?

      BdPAN1 is expressed throughout the epidermis but starts to strongly accumulate at the GMC/SMC interface. According to the literature (Cartwright et al 2009 with immunostainings against ZmPAN1 and Sutimantanapi et al. 2014 with PAN1 and PAN2 reporter) and our own observations (Fig. S3), this accumulation occurs in the SMC rather than in the GMC. In Fig. S3A, third panel, second GMC from the top, for example, one can see that the early PAN1 polarity domain expands beyond the GMC/SMC interface suggesting that it is indeed forming in SMCs rather than in GMCs. We have specified this in the text more clearly now (p. 5).

    1. Author Response

      Reviewer #1 (Public Review):

      The research investigates the genetic basis for resistance to high CO2 levels in the human pathogenic fungus Cryptococcus neoformans. Screening collections of over 5,000 gene deletion strains revealed 96 with impaired growth, including a set of genes all related to the same RAM signaling pathway. Further genetic dissection compellingly placed this pathway relative to upstream inputs and, through the isolation of suppressor mutants, identified potential downstream targets of the pathway. Given the high levels of CO2 encountered by fungi in the human host, this work may provide new directions for the control of disseminated fungal disease.

      The research presents both strengths and weaknesses.

      Strengths include:

      (1) One of the largest scale analyses of genes involved in growth under high CO2 concentrations in a fungus, revealing a set of just under 100 mutants with impaired growth.

      (2) Elegant genetic epistasis analysis to show where different components fit within a pathway of transmission of CO2 exposure. For example, over expression of one of the kinases, Cbk1, can overcome the CO2-sensitivity of mutations in the CDC24 or CNA1 genes (but not in the reciprocal overexpression direction).

      (3) Isolation of suppressor mutations in the cbk1 background that restored growth at high CO2 levels led to the identification of two genes. Follow-up characterization, which included examining in vitro phenotypes, gene expression analysis, and impact during mouse infection, revealed that the two suppressors restore a subset of the phenotypes impacted by mutation of CBK1. Indeed, one conclusion from this careful work is that the reduced virulence of the cbk1 mutant is not due to its sensitivity to high levels of CO2, perhaps an unexpected finding given the original goals of the study towards linking CO2 sensitivity with decreased virulence.

      Weaknesses include:

      (1) What is the rationale for examining gene expression using the NanoString technology of 118 genes rather than a more genome-wide approach such as RNA-sequencing?

      (2) Without additional species examined, some of the conclusions about differences in impact between ascomycetes and basidiomycetes might instead reflect differences between species. For example, RAM mutants in other strains of C. neoformans do not exhibit so strong a temperature sensitive phenotype. Or to extend the comparison further, one might assume given the use of CO2 for Drosophila manipulations that the RAM pathway components in an insect would not be required for surviving high CO2.

      (3) Given the relative ease of generating progeny of this species, it would have been informative to explore if the suppressors of cbk1 also suppressed the loss of genes like CDC24, CNA1, etc., equivalent to the experiment performed with overexpression of CBK1 in those backgrounds.

      We thank the reviewer for the kind summary of our work and the highlights of the major findings. We chose NanoString because we had already generated a probe set of 118 genes that are differentially expressed in response to CO2, based on RNA-seq profiles of multiple natural cryptococcal isolates in a separate study. NanoString allowed us to focus on CO2-relevant transcripts and to run multiple replicates and conditions in a way that is not practical using RNA-seq.

      Although the RAM pathway has not been extensively characterized in different species of Cryptococcus, we do know that RAM pathway mutants lead to pseudohyphal growth in multiple strain backgrounds including two different species of Cryptococcus (Magditch, Liu, Xue, & Idnurm, 2012; Walton, Heitman, & Idnurm, 2006). We have added corresponding references and discussed this point on lines 167-169.

      We agree with the reviewer that it would be interesting to test the effects of the cbk1Δ suppressor mutations in the backgrounds of other CO2-sensitive gene knockout strains. This is part of our plan for future investigation in characterizing the signaling pathways involved in CO2 tolerance.

      Reviewer #2 (Public Review):

      In the paper by Chadwick et al., the authors identify the molecular determinants of CO2 tolerance in the human fungal pathogen Cryptococcus neoformans. The authors screened a collection of deletion mutants to identify those that are sensitive to 37°C (host temperature) and elevated CO2 levels. The authors found that the genes responsible for CO2 sensitivity are involved in pathways responsible for thermotolerance, such as the Calcineurin, Ras1-Cdc24, cell wall integrity, and Regulator of Ace2 and Morphogenesis (RAM) pathways. Moreover, they found that mutants of the RAM pathway effector kinase Cbk1 were the most sensitive to elevated temperature and CO2 levels. This study uncovers the previously unknown role of the RAM pathway in CO2 tolerance. Transcriptome data indicate that the deletion of CBK1 results in an alteration in the expression of CO2-related genes. To identify the potential downstream targets of Cbk1, the authors performed a suppressor screen and obtained spontaneous suppressor mutants that rescued the sensitivity of cbk1 mutants to elevated temperature and CO2. Through this screen, the authors identified two suppressor groups that showed a modest improvement in growth at 37˚C and in the presence of CO2.

      Interestingly, from the suppressor screen, the authors identified SSD1, a previously known interactor of Cbk1, and an uncharacterized gene containing a putative Poly(A)-specific ribonuclease (PARN) domain, named PSC1 (Partial Suppressor of cbk1Δ), which acts downstream of Cbk1. Deletion of each of these two genes in cbk1 null mutants rescued the sensitivity to elevated CO2 levels and temperature but did not fully rescue the ability to cause disease in mice.

      This study highlights the underappreciated role of tolerance to host CO2 levels and its importance for the ability of a fungal pathogen to survive and cause disease under host conditions. The authors claim to gain insight into the genetic components associated with carbon dioxide tolerance. The experimental results, including the data presented and the conclusions drawn, support this claim. Overall, it is a well-written manuscript. However, some sections need improvement in terms of clarity and experimental design.

      • One major drawback of the study is the virulence assay used to test the ability of cbk1 mutants to cause disease in the mouse model. The cbk1 null mutants are thermosensitive. Testing their virulence in mice would therefore underestimate the mutants' ability to infect, as they may not survive at host body temperature.

      • The rationale for choosing the genes to test further is not clear in two instances in the study. a) From a list of 96 genes, how do the authors infer the pathways involved? Was any pathway analysis performed that helped them shortlist the pathways they subsequently tested? A GO term analysis of the genes identified through the genetic screen would give a more helpful overview of the pathways involved in CO2 tolerance. b) The authors do not clearly explain why they chose only four of the 16 downregulated genes identified from the NanoString analysis to test for CO2 sensitivity.

      • It would be more useful to the readers if the authors could also include a thorough analysis of the presence of the putative PARN domain-containing protein across various fungal species rather than mentioning that it is only observed in C. neoformans and S. pombe. Also, the authors may want to discuss the known role(s) of SSD1, if any, in pathogenic ascomycetous yeasts so that the proposed functional divergence is supported further.

      We are glad that the reviewer appreciated the approach, the findings, and the significance of this research, and we are grateful for the helpful suggestions to improve the manuscript.

      To remove temperature sensitivity as a variable when testing virulence, we have added a new infection model in the revised manuscript to test the cbk1Δ mutant and its suppressors. This infection model uses Galleria mellonella larvae as a host. G. mellonella larvae are commonly used to test the virulence of temperature-sensitive strains because the body temperature of the larvae is similar to that of the environment. We performed cryptococcal infection in this model and kept the larvae at 30°C rather than at 37°C. The results of these experiments are now described in results section 5 and shown in Figure 6 of the manuscript. The data from the larval infection model support our original conclusions about the virulence of these strains observed in the mouse models.

      We performed a GO term analysis of the hits from our screen but did not find any significantly enriched pathways. From our list of 96 genes, we chose to focus on the RAM pathway because its mutants were among the most sensitive to CO2. We have added an explanation of why we chose the genes tested for sensitivity to host CO2 levels from among the 16 downregulated genes (lines 139-141).

      Through Blast searches, we found that the PARN domain-containing protein has homologs in other basidiomycetes. There may be homologs in a few zygomycetes and ascomycetes, but the confidence scores were so low that we deemed them unlikely to be true homologs. We now report this in the manuscript on lines 210-213: “This domain was previously reported to be found in S. pombe (Marasovic, Zocco, & Halic, 2013). Interestingly, through a Blast search of the PARN domain, we did not identify this domain in the genomes of S. cerevisiae, C. albicans or other ascomycetes, but found it in Basidiomycetes and higher eukaryotes”.

      Ssd1 has been studied in the pathogenic yeast Candida albicans and is also regulated by Cbk1 in this organism. We have added a discussion about possible functions of Ssd1 in C. neoformans based on references to studies in C. albicans in the discussion section on lines 401-408. “In C. albicans, Ssd1 plays an important role in polarized growth and hyphal initiation by negatively regulating the transcription factor Nrg1 (H. J. Lee, Kim, Kang, Yang, & Kim, 2015). The observation that cbk1Δpsc1Δ and cbk1Δssd1Δ suppressor mutants partially rescue cell separation defects or depolarized growth suggests that C. neoformans may primarily utilize Ssd1/Psc1 rather than a potential Ace2 homolog to regulate cell separation or polarization. Differential regulation of target mRNA transcripts by Ssd1 and Psc1 may explain the functional divergence of the RAM pathway we observed here between basidiomycete Cryptococcus and the ascomycete yeasts.”

      Reviewer #3 (Public Review):

      In this work the authors identify genes and pathways important for CO2 tolerance and thermotolerance in Cryptococcus neoformans. They additionally rule out a contribution of bicarbonate- or cAMP-dependent activation of adenylyl cyclase to this pathway, which is important for CO2 sensing in other fungi, further solidifying the need to characterize CO2 sensing in basidiomycetes. The authors establish the importance of focusing on CO2 tolerance by testing the impact of CO2 on fluconazole susceptibility at varied pH, suggesting that CO2 can sensitize cryptococcal cells to fluconazole. Furthermore, the authors compared the CO2 tolerance of clinical reference strains to that of environmental isolates. The characterization of the RAM pathway Cbk1 kinase illustrated the integration of multiple stress signaling pathways. By using a series of CBK1OE insertions in strains with deletions in other pathways, the ability of Cbk1 overexpression to rescue several strains from CO2 sensitivity was apparent. Additionally, NanoString expression analysis comparing cbk1∆ to H99 validated the authors' screen of CO2-sensitive mutants, as 16 of the 57 downregulated genes were also found in their screen, further confirming the interconnected nature of these pathways. The importance of the RAM pathway in maintaining CO2 tolerance and thermotolerance was also very clear.

      Perhaps most interestingly, the authors identify suppressor colonies with distinctive phenotypes that allowed for the characterization of downstream effectors of the RAM pathway. These suppressor colonies were found to have mutations in SSD1 and PSC1, which partially restore growth at 37°C with CO2 exposure. Further confirming the importance of the RAM pathway, the cbk1∆ strain had markedly attenuated virulence during infection. Interestingly, the suppressor strains had varying effects on fungal infection in vivo. While the sup1 suppressor was completely cleared from the lungs during both intranasal and IV infection, the sup2 strain, containing mutations in SSD1, maintained a high fungal load in the lungs and was able to disseminate into host tissues during IV infection but not intranasal infection.

      The authors make a strong case for exploring thermotolerance and CO2 tolerance as contributors to virulence. Through screening and characterization of the ability of the RAM pathway kinase Cbk1 to rescue other mutants from CO2 sensitivity, the overlapping contributions of several signaling pathways and the importance of this kinase were revealed. This work is important and will be valuable to the field. However, the cbk1∆ strain does show reduced melanization, reduced urease secretion, and higher sensitivity to the cell wall stressor Congo Red in SI Appendix, Figure S4. While the authors make a strong argument that these well-established virulence factors are not perfect predictors of virulence in vivo, the cbk1∆ strain is not an example of such a case, as it has defects in these important factors in addition to thermotolerance and CO2 tolerance. Not acknowledging the changes in these virulence factors in the cbk1∆ strain and their potential contribution to the observed phenotypes is a weakness of the manuscript. Interestingly, the sup1 and sup2 strains also restore these virulence factors relative to cbk1∆. Additionally, the assertion that "the observation that only sup2 can survive, amplify, and persist in animals stresses the importance of CO2 tolerance in cryptococcal pathogens", based on sup2's slightly higher CO2 tolerance compared to sup1, could be better supported by the data. These suppressors did not restore transcript abundances of the differentially expressed genes to WT levels, suggesting post-transcriptional regulation. However, sup2 may resist stress better than sup1, especially given the known repression of transcript translation by Ssd1 in S. cerevisiae. Also, pH appears to affect the sensitivity of the sup1 and sup2 strains to CO2 in SI Appendix, Figure S4; this could be better explained and interrogated in the manuscript. Finally, this work includes a variety of genes in several signaling pathways. The paper would be greatly clarified by a graphical abstract indicating how CBK1 may integrate these pathways, or by indicating which genes belong to which pathways in the Figure 1 legend to make this figure easier to follow.

      We thank the reviewer for the thorough summary of the study. We appreciate the reviewer’s enthusiasm about this study as well as the constructive critiques of the manuscript. Indeed, the suppressor mutations in the cbk1Δ mutant rescue more phenotypes of cbk1Δ in vitro than just thermotolerance and CO2 tolerance (Supplemental Figure 5), which could benefit the survival of these suppressor strains in vivo compared to the original cbk1Δ mutant. However, between the sup1 and sup2 mutants, the only clear difference in growth we observed was at host levels of CO2 and temperature. There was no obvious difference in their resistance to Congo red (cell wall stress), melanization, susceptibility to FK506 (calcineurin pathway inhibitor), sensitivity to H2O2 (ROS), or urease (Supplemental Figure 5). Nonetheless, we agree with the reviewer that other factors may influence the outcome in vivo, given that the host environment is more complex than we know. We have changed our wording in the manuscript to make it clear that the contribution of better CO2 tolerance to the better survival of the sup2 mutant is only our hypothesis and that there could be other unrecognized contributing factors. “The only in vitro difference observed between sup1 and sup2 was better growth of sup2 at host CO2 levels which may explain the difference in their ability to propagate and persist in the mouse lungs. However, due to the complexity of the host environment, there could be other unrecognized factors contributing to their growth difference in vivo.” (Lines 276-279).

      Regarding growth at different pH levels, C. neoformans tends to grow better at lower pH, closer to pH 5. This fungus can grow at pH 3, the lowest pH our lab has tested (based on others’ conference presentations, it may be able to sustain viability even at pH 2). The combination of high temperature and CO2 with neutral or high pH likely impairs the growth of both H99 and the mutants tested.

      We tried to build a model integrating all the pathways and factors identified in this work, as the reviewer suggested. However, we found it difficult to propose one. Although the current findings clearly demonstrate the importance of Cbk1 in thermotolerance and CO2 tolerance (overexpression of CBK1 can partially restore thermotolerance and/or CO2 tolerance in mutants defective in the cell wall integrity pathway, the calcineurin pathway, or the Cdc24-Ras1 pathway, whereas the reciprocal overexpression of these genes in the cbk1∆ mutant does not rescue any of its defects), we do not know the exact mechanisms underlying this phenomenon. Do these pathways directly interact with Cbk1, affect its phosphorylation status, or alter its subcellular localization? Or do they act through other messengers to indirectly activate Cbk1 or its downstream targets? These questions warrant further investigation. To be prudent, we think it is better not to propose a model at this point given the uncertainty about the mechanism. The mutants belonging to each pathway are now clearly specified in the text of the revised manuscript to help orient readers. For example, “As the RAM pathway effector kinase mutant cbk1Δ showed the most severe defect in thermotolerance and CO2 tolerance compared to the mutants of the other pathways, we first overexpressed the gene CBK1 in the following mutants, cdc24∆ (Ras1-Cdc24), mpk1∆ (CWI), cna1∆ (Calcineurin), and the cbk1Δ mutant itself, and observed their growth at host temperature and host CO2 (Figure 2B)...”

    1. Author Response

      Public Evaluation Summary:

      The authors re-analyzed a previously published dataset and identified patterns suggesting that increased bacterial biodiversity in the gut may create new niches that lead to gene loss in a focal species and promote the generation of more diversity. Two limitations are (i) that sequencing depth may not be sufficient to analyze strain-level diversity and (ii) that the evidence is based exclusively on correlations, and the observed patterns could also be explained by other eco-evolutionary processes. The claims should be supported by a more detailed analysis, and alternative hypotheses that the results do not fully exclude should be discussed. Understanding drivers of diversity in natural microbial communities is an important question that is of central interest to biomedically oriented microbiome scientists, microbial ecologists and evolutionary biologists.

      We agree that understanding the drivers of diversity in natural communities is an important and challenging question to address. We believe that our analysis of metagenomes from gut microbiomes is complementary to controlled laboratory experiments and modeling studies. While these other studies are better able to establish causal relationships, we rely on correlations, a caveat that we make clear, and we offer different mechanistic explanations for the patterns we observe.

      We also mention the caveat that we can only measure sub-species genetic diversity in relatively abundant species with high sequencing depth in metagenomes. These relatively abundant species include dozens of species across two metagenomic datasets, and we see no reason why our results would not generalize to other members of the microbiome. Nonetheless, further work will be required to extend our results to rarer species.

      Our revised manuscript includes two major new analyses. First, we extend the analysis of within-species nucleotide diversity to non-synonymous sites, with generally similar results. This suggests that evolutionarily older, less selectively constrained synonymous mutations and more recent non-synonymous mutations that affect protein structure both track similarly with measures of community diversity – with some subtle differences described in the manuscript.

      Second, we extend our analysis of dense time series data from one individual stool donor and one deeply covered species (B. vulgatus) to four donors and 15 species. This allowed us to reinforce the pattern of gene loss in more diverse communities with greater statistical support. Our correlational results are broadly consistent with the predictions of DBD from modeling and experimental studies, and they open up new lines of inquiry for microbiome scientists, ecologists, and evolutionary biologists.

      Reviewer #1 (Public Review):

      This paper makes an important contribution to the current debate on whether the diversity of a microbial community has a positive or negative effect on its own diversity at a later time point. In my view, the main contribution is linking the diversity-begets-diversity patterns, already observed by the same authors and others, to genomic signatures of gene loss that would be expected from the Black Queen Hypothesis, establishing an eco-evolutionary link. In addition, they test this hypothesis at a more fine-grained scale (strain-level variation and SNP) and do so in human microbiome data, which adds relevance from the biomedical standpoint. The paper is a well-written and rigorous analysis using state-of-the-art methods, and the results suggest multiple new experiments and testable hypotheses (see below), which is a very valuable contribution.

      We thank the reviewer for their generous comments.

      That being said, I do have some concerns that I believe should be addressed. First of all, I am wondering whether gene loss could also occur because of environmental selection that is independent of other organisms or the diversity of the community. An alternative hypothesis to the Black Queen is that there might have been a migration of new species from outside and then loss of genes could have occurred because of the nature of the abiotic environment in the new host, without relationship to the community diversity. Telling the difference between these two hypotheses is hard and would require extensive additional experiments, which I don't think is necessary. But I do think the authors should acknowledge and discuss this alternative possibility and adjust the wording of their claims accordingly.

      We concur with the reviewer that the drivers of the correlation between community diversity and gene loss are unclear. Therefore, we have now added the following text to the Discussion:

      “Here we report that genome reduction in the gut is higher in more diverse gut communities. This could be due to de novo gene loss, preferential establishment of migrant strains encoding fewer genes, or a combination of the two. The mechanisms underlying this correlation remain unclear and could be due to biotic interactions – including metabolic cross-feeding as posited by some models (Estrela et al., 2022; San Roman and Wagner, 2021, 2018) but not others (Good and Rosenfeld, 2022) – or due to unknown abiotic drivers of both community diversity and gene loss.”

      Additionally, we have revised Figure 1 to show that strain invasions/replacements, in addition to evolutionary change, could be an important driver of changes in intra-species diversity in the microbiome.

      Another issue is that gene loss is happening in some of the most abundant species in the gut. Under Black Queen though, we would expect these species to be most likely "donors" in cross-feeding interactions. Authors should also discuss the implications, limitations, and possible alternative hypotheses of this result, which I think also stimulates future work and experiments.

      We thank the reviewer for raising this point. It is unclear to us whether the more abundant species would be donors in cross-feeding interactions. If we understand correctly, the reviewer is suggesting that more abundant donors will contribute more total biomass of shared metabolites to the community. This idea makes sense under the assumption that the abundant species are involved in cross-feeding interactions in the first place, which may or may not be the case. As our work heavily relies on a dataset that we previously analyzed (HMP), we wish to cite Figure S20 in Garud, Good et al. 2019 PLoS Biology in which we found there are comparable rates of gene changes across the ~30 most abundant species analyzed in the HMP. This suggests that among the most abundant species analyzed, there is no relationship between their abundance and gene change rate.

      That being said, we acknowledge that our study is limited to the relatively abundant focal species and state now in the Discussion: “Deeper or more targeted sequencing may permit us to determine whether the same patterns hold for rarer members of the microbiome.”

      Regarding Figure 5B, there are a couple of questions I believe the authors should clarify. First, how is it possible that many species have close to 0 pathways? Second, besides the overall negative correlation, the data show some very conspicuous regularities, e.g. many different "lines" of points with identical negative slope but different intercepts. My guess is that this is due to some constraints in the pathway detection methods, but I struggle to understand it. I think the authors should discuss these patterns in more detail.

      We sincerely thank the reviewer for raising this issue, as it prompted us to investigate the patterns observed at the pathway level more deeply. In short, we decided to remove this analysis from the paper because we realized that a number of bioinformatic issues were contributing to the signal. However, in support of BQH-like mechanisms at play, we do find evidence for gene loss in more diverse communities across multiple species in both the HMP and Poyet datasets. Below we detail our investigation into Figure 5B and how we arrived at the conclusion that it should be removed:

      (1) Regarding data points in Figure 5B where many focal species have “zero pathways”, we first clarify how we computed pathway presence and richness. Pathway abundance data per species were downloaded from the HMP1-2 database; these pathway abundances were computed using HUMAnN (HMP Unified Metabolic Analysis Network). According to the HUMAnN documentation, pathway abundance is proportional to the number of complete copies of the pathway in the community. This means that if at least one component reaction in a pathway lacks coverage (for a sample-species pair), the pathway abundance may be zero (note that HUMAnN also employs “gap filling”, which allows no more than one required reaction to have zero abundance). As such, insufficient coverage, especially for low-abundance species, likely causes many pathways to report zero abundance in many species in many samples. Indeed, 556 of the 649 species considered had zero “present” pathways (i.e. pathways with nonzero abundance) in at least 400 of the 469 samples (see figure below).

      (2) We thank the reviewer for pointing out the “conspicuous regularities” in Figure 5B, particularly the “parallel lines” of data points, which we discovered are an artifact of the flawed way in which we computed “community pathway richness [excluding the focal species].” Each diagonal line of points corresponds to different species in the same sample. Because community pathway richness was computed as the total number of pathways [across all species in the sample] minus the number of pathways in the focal species, the current Figure 5B is really plotting y against X-y for each sample (where X is a sample’s total community pathway richness, and y is the pathway richness of an individual species in that sample), so all species from one sample fall on a line of slope -1 with intercept X. This computation fails to account for the possibility that a pathway in an excluded focal species will still be present in the community due to redundancy, and whether such redundancy is kept low in diverse communities (through mechanisms such as gene loss) is precisely what the BQH addresses.

      We instead attempted to plot community pathway richness defined as the number of unique pathways covered by all species other than the focal species. This is equivalent to [number of unique pathways across all species in a sample] minus [number of pathways that are ONLY present in the focal species and not in any other species in the sample]. However, when we recomputed community pathway richness this way, we found that a pathway is rarely present in only one species in a sample. Moreover, with the exception of E. coli, focal species pathway richness tended to be very similar across the 469 samples, often reaching an apparent upper limit of the observed focal species pathway richness. (It is unclear to what extent lower pathway richness is due to low species abundance/low sample coverage versus gene loss.) This new plot reveals even more regularities and is difficult to interpret with respect to the BQH. (Note that points are colored by species; the cluster of black dots with outlying high focal pathway richness corresponds to the “unclassified” stratum, which can be considered a group of many different species.)
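      To make the difference between the two definitions concrete, the following sketch compares them on a toy sample. It is an illustration under stated assumptions, not the analysis pipeline: the species and pathway names and the function names are hypothetical, and a pathway is assumed to be "present" in a species when its HUMAnN abundance is nonzero. Under the original definition, every species in a sample satisfies y + (X - y) = X, which is why points from one sample line up with slope -1; the corrected definition subtracts only the pathways unique to the focal species.

```python
"""Toy comparison of two definitions of community pathway richness
excluding a focal species. All species and pathway names are hypothetical."""
from typing import Dict, Set

# For each species in ONE sample: the set of pathways with nonzero abundance.
Sample = Dict[str, Set[str]]


def total_richness(sample: Sample) -> int:
    """X: number of unique pathways across all species in the sample."""
    return len(set().union(*sample.values()))


def other_species_pathways(sample: Sample, focal: str) -> Set[str]:
    """Union of pathways carried by every species except the focal one."""
    return set().union(*(p for sp, p in sample.items() if sp != focal))


def flawed_community_richness(sample: Sample, focal: str) -> int:
    """Original definition, X - y: subtracts every focal pathway, even those
    that other species in the sample still carry (redundant pathways)."""
    return total_richness(sample) - len(sample[focal])


def corrected_community_richness(sample: Sample, focal: str) -> int:
    """Unique pathways covered by all species other than the focal species."""
    return len(other_species_pathways(sample, focal))


def focal_only_count(sample: Sample, focal: str) -> int:
    """Pathways present ONLY in the focal species within this sample."""
    return len(sample[focal] - other_species_pathways(sample, focal))


TOY_SAMPLE: Sample = {
    "species_A": {"p1", "p2", "p3"},
    "species_B": {"p2", "p3", "p4"},
    "species_C": {"p4"},
}

if __name__ == "__main__":
    X = total_richness(TOY_SAMPLE)
    for sp, pathways in TOY_SAMPLE.items():
        y = len(pathways)
        flawed = flawed_community_richness(TOY_SAMPLE, sp)
        corrected = corrected_community_richness(TOY_SAMPLE, sp)
        # Equivalent form: X minus the pathways unique to the focal species.
        assert corrected == X - focal_only_count(TOY_SAMPLE, sp)
        # y + flawed == X for every species: the slope -1 "parallel line".
        print(f"{sp}: y={y}, flawed={flawed}, corrected={corrected}")
```

      In this toy sample, species_A carries three pathways; the original definition returns 1 because p2 and p3 are subtracted even though species_B still carries them, whereas the corrected definition returns 3.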

      Overall, because community pathway richness (excluding a focal species) seems to vary primarily with sample rather than with focal species in this dataset when using the simplest/strictest definition described above, it is difficult to probe the Black Queen Hypothesis using a plot like Figure 5B. As pointed out by the reviewers, insufficient sequencing depth to analyze strain-level diversity and to accurately quantify pathway abundance, irrespective of species abundance, seems to be a major barrier to this analysis. We have therefore decided to remove Figure 5B from the paper and revise some of our conclusions accordingly.

      Finally, I also have some conceptual concerns regarding the genomic analysis. Namely, genes can be used for the biosynthesis of, e.g., building blocks, but also for the consumption of nutrients. Under the Black Queen Hypothesis, we would expect adaptive loss of biosynthetic genes, as those nutrients become provided by the community. However, for catabolic genes or pathways, I would expect the opposite pattern, i.e. gain of catabolic genes that would allow taking advantage of a richer environment resulting from a more diverse community (or at least the absence of pathway loss). These two opposing forces for catabolic and biosynthetic genes/pathways might obscure the trends if all genes are pooled together for the analysis. I believe this can be easily checked with the data the authors already have, and it could allow the authors to discuss the functional implications of the trends they see in more detail and possibly even make a stronger case for their claims.

      We thank the reviewer for their suggestion. As explained above, we have removed the pathway analysis from the paper due to technical reasons. However, we did investigate catabolic and biosynthetic pathways separately as suggested by the reviewer as we describe below:

      We obtained subsets of biosynthetic pathways and catabolic pathways by searching for keywords (such as “degradation” for catabolic) in the MetaCyc pathway database. After excluding the “unclassified” species stratum, we observe a total of 279 biosynthetic and 167 catabolic pathways present in the HMP1-2 pathway abundance dataset. Using the corrected definition of community pathway richness excluding a focal species, for each pathway type—either biosynthetic or catabolic—we plotted focal species pathway richness against community pathway richness including all pathways regardless of type:

      We observe the same problem where, within a sample, community pathway richness excluding the focal species hardly varies no matter which focal species it is, due to nearly all of its detected pathways being present in at least one other species; this makes the plots difficult to interpret.

      Reviewer #2 (Public Review):

      The authors re-analysed two previously published metagenomic datasets to test how diversity at the community level is associated with diversity at the strain level in the human gut microbiota. The overall idea was to test if the observed patterns would be in agreement with the "diversity begets diversity" (DBD) model, which states that more diversity creates more niches and thereby promotes further increase of diversity (here measured at the strain-level). The authors have previously shown evidence for DBD in microbiomes using a similar approach but focusing on 16S rRNA level diversity (which does not provide strain-level insights) and on microbiomes from diverse environments.

      One of the datasets analysed here is a subset of a cross-sectional cohort from the Human Microbiome Project. The other dataset comes from a single individual sampled longitudinally over 18 months. This second dataset allowed the authors to not only assess the links between different levels of diversity at single timepoints, but test if high diversity at a given timepoint is associated with increased strain-level diversity at future timepoints.

      Understanding eco-evolutionary dynamics of diversity in natural microbial communities is an important question that remains challenging to address. The paper is well-written and the detailed description of the methodological approaches and statistical analyses is exemplary. Most of the analyses carried out in this study seem to be technically sound.

      We thank the reviewer for their kind words, comments, and suggestions.

      The major limitation of this study is that only correlations are presented, some of which are rather weak, contradict each other, or are based on a small number of data points. In addition, finding that diversity at a given taxonomic rank is associated with diversity within a given taxon is a pattern that can be explained by many different underlying processes, e.g. species-area relationships, nutrient (diet) diversity, stressor diversity, immigration rate, and niche creation by other microbes (i.e. DBD). Without experiments, it remains unclear whether DBD is the process acting in these communities based on the observed patterns.

      We thank the reviewer for their comments. First, regarding the issue of this being a correlative study, we now more clearly acknowledge that mechanistic studies (perhaps in experimental settings) are required to fully elucidate DBD and BQH dynamics. However, we note that our correlational study from natural communities is complementary to experimental and modeling studies, to test the extent to which their predictions hold in more complex, realistic settings. This is now mentioned throughout the manuscript, most explicitly at the end of the Introduction:

      “Although such analyses of natural diversity cannot fully control for unmeasured confounding environmental factors, they are an important complement to controlled experimental and theoretical studies which lack real-world complexity.”

      Second, to increase the number of data points analyzed in the Poyet study, we now include 15 species and four different hosts (new Figure 5). The association between community diversity and gene loss is now much more statistically robust, and consistent across the Poyet and HMP time series.

      Third, we acknowledge more clearly in the Discussion that other processes, including diet and other environmental factors can generate the DBD pattern. We also now stress more prominently the possibility that strain migration across hosts may be responsible for the patterns observed. For example, in Figure 1, we illustrate the possibility of strain migration generating the patterns we observe.

      Below we quote a paragraph that we have now added in the Discussion:

      "Second, we cannot establish causal relationships without controlled experiments. We are therefore careful to conclude that positive diversity slopes are consistent with the predictions of DBD, and negative slopes with EC, but unmeasured environmental drivers could be at play. For example, increased dietary diversity could simultaneously select for higher community diversity and also higher intra-species diversity. In our previous study, we found that positive diversity slopes persisted even after controlling for potential abiotic drivers such as pH and temperature (Madi et al., 2020), but a similar analysis was not possible here due to a lack of metadata. Neutral processes can account for several ecological patterns such as species-area relationships (Hubbell, 2001), and must be rejected in favor of niche-centric models like DBD or EC. Using neutral models without DBD or EC, we found generally flat or negative diversity slopes due to sampling processes alone and that positive slopes were hard to explain with a neutral model (Madi et al., 2020). These models were intended mainly for 16S rRNA gene sequence data, but we expect the general conclusions to extend to metagenomic data. Nevertheless, further modeling and experimental work will be required to fully exclude a neutral explanation for the diversity slopes we report in the human gut microbiome.”

      Finally, we now put more emphasis on the importance of migration (strain invasion) as a non-exclusive alternative to de novo mutation and gene gain/loss. This is mentioned in the Abstract and is also illustrated in the revised Figure 1.

      Another limitation is that the total number of reads (5 million for the longitudinal dataset and 20 million for the cross-sectional dataset) is low for assessing strain-level diversity in complex communities such as the human gut microbiota. This is probably why the authors only looked at one species with sufficient coverage in the longitudinal dataset.

      Indeed, this is a caveat, which means we can only consider sub-species diversity in relatively abundant species. Nevertheless, this allows us to study dozens of species in the HMP and 15 in the more frequently sampled Poyet time series. As more deeply sequenced metagenomes become available, future studies will be able to access rarer species to test whether the same patterns hold. This is now mentioned prominently as a caveat of our study in the second Discussion paragraph:

      “First, using metagenomic data from human microbiomes allowed us to study genetic diversity, but limited us to considering only relatively abundant species with genomes that were well-covered by short sequence reads. Deeper or more targeted sequencing may permit us to determine whether the same patterns hold for rarer members of the microbiome. However, it is notable that the majority of the dozens of species across the two datasets analyzed support DBD, suggesting that the phenomenon may generalize.”

      We also note that rarefaction was only applied to calculate community richness, not to estimate sub-species diversity. We apologize for this confusion, which is now clarified in the Methods as follows:

      “SNV and gene content variation within a focal species were ascertained only from the full dataset and not the rarefied dataset.”

      Analyzing the effect of diversity at a given timepoint on strain-level diversity at a later timepoint adds an important new dimension to this study which was not assessed in the previous study about the DBD in microbiomes by some of the authors. However, only a single species was analysed in the longitudinal dataset and comparisons of diversity were only done between two consecutive timepoints. This dataset could be further exploited to provide more insights into the prevailing patterns of diversity.

      We thank the reviewer for raising this point. We have now considered all 15 species with sufficient coverage in the Poyet dataset, which included four different stool donors. Additionally, in the HMP dataset, we analyze 54 species across 154 hosts, and both datasets show the same correlation between community diversity and gene loss.

      Additionally, we followed the reviewer's suggestion to examine additional time lags, and in Figure 5 we do observe a dependency on time. This is now described in the Results as follows:

      “Using the Poyet dataset, we asked whether community diversity in the gut microbiome at one time point could predict polymorphism change at a future time point by fitting GAMs with the change in polymorphism rate as a function of the interaction between community diversity at the first time point and the number of days between the two time points. Shannon diversity at the earlier time point was correlated with increases in polymorphism (consistent with DBD) up to ~150 days (~4.5 months) into the future (Figure S4), but this relationship became weaker and then inverted (consistent with EC) at longer time lags (Fig 5A, Table S8, GAM, P=0.023, Chi-square test). The diversity slope is approximately flat for time lags between four and six months, which could explain why no significant relationship was found in HMP, where samples were collected every ~6 months. No relationship was observed between community richness and changes in polymorphism (Table S8, GAM, P>0.05).”

      Finally, the evidence that gene loss follows an increase in diversity is weak, as very few genes were found to be lost between two consecutive timepoints, and the analysis is based on only a single species. Moreover, while positive correlations were found between overall community diversity and gene family diversity in single species, the opposite trend was observed when focusing on pathway diversity. A more detailed analysis (e.g., of the functions of the genes and pathways lost or gained) to explain these seemingly contrasting results, and a more critical discussion of the limitations of this study, would be desirable.

      We agree that our previous analysis of one species in one host provided weak support for gene loss following increases in diversity. As described in the response above, we have now expanded this analysis to 15 focal species and 4 independent hosts with extensive time series. We now analyze this larger dataset and report the more statistically robust results as follows:

      “We found that community Shannon diversity predicted future gene loss in a focal species, and this effect became stronger with longer time lags (Fig 5B, Table S9, GLMM, P=0.006, LRT for the effect of the interaction between the initial Shannon diversity and time lag on the number of genes lost). The model predicts that increasing Shannon diversity from its minimum to its maximum would result in the loss of 0.075 genes from a focal species after 250 days. In other words, about one of the 15 focal species considered would be expected to lose a gene in this time frame.

      Higher Shannon diversity was also associated with fewer gene gains, and this relationship also became stronger over time (Fig 5C, Table S9, GLMM, P=1.11e-09, LRT). We found a similar relationship between community species richness and gene gains, although the relationship was slightly positive at shorter time lags (Fig 5D, Table S9, GLMM, P=3.41e-04, LRT). No significant relationship was observed between richness and gene loss (Table S9, GLMM, P>0.05). Taken together with the HMP results (Fig 4), these longer time series reveal how the sign of the diversity slope can vary over time and how community diversity is generally predictive of reduced focal species gene content.”
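      The step from the per-species prediction to "about one of the 15 focal species" is a simple scaling, shown here under the illustrative assumption that every focal species follows the same model-predicted loss after 250 days:

```latex
\underbrace{0.075\ \frac{\text{genes}}{\text{species}}}_{\text{predicted loss after 250 days}}
\times 15\ \text{species} = 1.125\ \text{genes} \approx 1\ \text{gene}
```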

      As described in detail in the response to Reviewer 1 above, we found that the HUMAnN2 pathway analyses previously described suffered from technical challenges and we deemed them inconclusive. We have therefore removed the pathway results from the manuscript.

      Reviewer #3 (Public Review):

      This work provides a series of tests of hypotheses, which are not mutually exclusive, on how genomic diversity is structured within human microbiomes and how community diversity may influence the evolution of a focal species.

      Strengths:

      The paper leverages existing metagenomic data to look at many focal species at the same time to test the importance of broad eco-evolutionary hypotheses, which is a novelty in the field.

      Thank you for the succinct summary and recognition of the strengths of our work.

      Weaknesses:

      It is not very clear whether the existing metagenomic data have sufficient power to test these models.

      Neither the introduction nor the analysis makes clear what precise mechanisms are expected to lead to DBD.

      The conclusion that the data support DBD appears to depend on which statistic of community diversity is used. Also, a test to reject a neutral null model would have been welcome, either in the results or in the discussion.

      In our revised manuscript, we emphasize several caveats – including that we only have power to test these hypotheses in focal species with sufficient metagenomic coverage to measure sub-species diversity. We also describe in more detail in the Introduction how the processes of competition and niche construction can lead to DBD. We also acknowledge that unmeasured abiotic drivers of both community diversity and sub-species diversity could also lead to the observed patterns. Throughout the manuscript, we attempt to describe the results and acknowledge multiple possible interpretations, including DBD and EC acting with different strengths on different species and time scales. Our previous manuscript assessing the evidence for DBD using 16S rRNA gene amplicon data from the Earth Microbiome Project (Madi et al., eLife 2020) assessed null models based on neutral ecological theory, and found it difficult to explain the observation of generally positive diversity slopes without invoking a non-neutral mechanism like DBD. While a new null model tailored to metagenomic data might provide additional nuance, we think developing one is beyond the scope of the manuscript – which is in the format of a short ‘Research Advance’ to expand on our previous eLife paper – and we expect that the general results of our previously reported null model provide a reasonable intuition for our new metagenomic analysis. This is now mentioned in the Discussion as follows:

      “In our previous study, we found that positive diversity slopes persisted even after controlling for potential abiotic drivers such as pH and temperature (Madi et al., 2020), but a similar analysis was not possible here due to a lack of metadata. Neutral processes can account for several ecological patterns such as species-area relationships (Hubbell, 2001), and must be rejected in favor of niche-centric models like DBD or EC. Using neutral models without DBD or EC, we found generally flat or negative diversity slopes due to sampling processes alone and that positive slopes were hard to explain with a neutral model (Madi et al., 2020). These models were intended mainly for 16S rRNA gene sequence data, but we expect the general conclusions to extend to metagenomic data. Nevertheless, further modeling and experimental work will be required to fully exclude a neutral explanation for the diversity slopes we report in the human gut microbiome.”

    1. Author Response

      Reviewer #2 (Public Review):

      1&2) Throughout the paper, the authors use a BiFC assay to monitor direct interactions between GDOWN1 and other transcription factors in the cell. While this assay works well for their experiments, we are unsure why GDOWN1 appears to interact with every protein found in the cytoplasm. This is particularly concerning when we look at GDOWN1 interacting with itself (Figure 1D), as GDOWN1 is not known to self-oligomerize. The authors should provide a negative control that GDOWN1 does not non-specifically interact with any cytoplasm-localized protein. Additionally, every GDOWN1 truncation tested was able to interact with NELF-E. We are unsure why each truncation tested (given that they tested multiple non-overlapping GDOWN1 regions) can interact with NELF-E. Do the authors believe that NELF-E directly interacts with every tested GDOWN1 construct? We believe that demonstration of BiFC specificity is critical for the conclusions drawn in the manuscript.

      Thank you for your comments and valuable suggestions! We added more negative BiFC controls in the revised manuscript to demonstrate the specificity of the BiFC assays (Figure 1–figure supplement 1D). Since both reviewers brought up this question, we answered it above in the “Common concerns by the Reviewers” section (Q#1).

      3) The authors note that the NES1 site is not as strong as the NES2 site at regulating exportin 1-dependent nuclear export. However, they suggest this is because mutating the NES2 site is more likely to disrupt the CAS site nearby. We ask the authors to expand on this concept. Do they have direct evidence that NES2 disrupts CAS activity (such as regulating its association with the nuclear pore complex)?

      From Figure 4A, we can see that both NES1 (4A-b) and NES2 (4A-d) work as functional nuclear export signals. When NES1 was mutated (4A-c), NES2 and CAS both remained functional in blocking GDOWN1’s nuclear shuttling upon LMB addition. However, when NES2 was mutated (4A-e), comparing the localization changes before and after LMB addition, we concluded that NES1 remained functional, while the cytoplasmic retention activity of CAS was partially lost. From the quantification of the images, NES1 appears to have stronger activity than NES2 in terms of LMB responsiveness (CRM1-dependent nuclear export activity), while NES2 appears to contribute an additional layer of regulation of CAS activity.

      To further confirm this observation, we generated a HeLa cell line stably expressing GDOWN1(NES2 mutant)-Venus and tested the subcellular localization of this mutant. As shown in Figure 4C of the revised manuscript, compared with wild-type GDOWN1, loss of NES2 activity directly caused the loss of perinuclear staining, which was consistent with the defect of the CAS mutant. These results further support that mutagenesis of NES2 disrupts the CAS-mediated association with the nuclear pore complex.

      4) The authors show the critical role of the NES1, NES2, and CAS sites for the localization and function of GDOWN1. Have the authors checked post-translational modification databases to determine whether any of the identified sites could be post-translationally modified and thereby regulated? Elucidation of the mechanism by which GDOWN1 localization is regulated is of broad interest to the transcription community.

      Good suggestion! It is worth checking and testing potential modifications of the key arginines identified in CAS (R352, R354, and R357). We checked the web tools for arginine methylation site prediction (http://msp.biocuckoo.org/online.php), but none of the known motifs matched the CAS sequence of GDOWN1. In addition, in our pilot studies, treatment with inhibitors of arginine methyltransferases (with or without LMB) did not result in any nuclear accumulation of GDOWN1 (data not shown). So far, we do not have strong evidence that these arginines are directly modified in our assays, and we cannot exclude the possibility that other nearby amino acids also play key roles in CAS function. Thus, further research is needed to uncover the regulatory mechanism of CAS.

    1. Author Response

      Reviewer #1 (Public Review):

      This study aimed to test the hypothesis that resident immune cells are strategically positioned along the epididymal duct to provide different immunological environments to prevent pathogens from ascending the urogenital tract. By using an epididymitis mouse model, the differential responses at different segments along the epididymis were examined at both histological and gene expression levels, and the data appeared to support their hypothesis. Furthermore, single-cell RNA-seq analyses identified the composition of resident immune cell types along the epididymal duct, and the parabiosis model further corroborated the major findings. Overall, the study was well conducted and the major conclusion seems well supported. The only caveat is the lack of elucidation of the direct or indirect impact of the resident immune cells on sperm maturation.

      We thank the reviewer for his/her feedback and the valuable comments.

      We are aware that the current manuscript lacks experimental evidence on the effects of immune cells on organ function, especially sperm maturation, and agree that this would be a relevant subject of study. However, assessing the direct or indirect impact of particular immune cells on sperm maturation would require further intensive research, encompassing e.g. the consequences of targeted cell depletions (using several transgenic mouse models) with comprehensive follow-up analysis (e.g. detecting anti-sperm antibodies, assessing the potential appearance of sperm-induced autoimmune reactions in vivo and conducting in vitro co-culture assays, besides sperm functional tests to evaluate capacitation and fertilization competence). A study of this magnitude is outside the scope of the present manuscript and would form a separate investigation that alone would take more than a year to perform. Therefore, our intention was to submit this article as a ‘Tools and Resources’ article, as it provides a detailed overview of all immune cell types shaping the regional immunological landscape, based on crucial information about their transcriptional profiles at single-cell resolution. In our view, the data provided close a gap in the current state of knowledge (particularly regarding the transcriptional identity and distribution of the described immune cell populations) and will serve as a relevant common platform for current and future approaches.

      Reviewer #2 (Public Review):

      Pleuger et al. investigated the heterogeneity of resident immune cells in the murine epididymis. The response of immune cells in the different epididymal segments was characterized following acute bacterial infection by flow cytometry, and immunofluorescence microscopy. Single-cell RNA sequencing analysis and parabiosis experiments were performed to provide an atlas of resident immune cells and their etiology in the epididymis under steady-state conditions. The authors conclude that distinct immune cell phenotypes govern specific responses of the different epididymal segments during acute bacterial infection. Overall, the conclusions of this study are well supported by the data, but some specific aspects related to the region-specific phenotypes of resident immune cells need to be revisited.

      1) In order to conclude that there was an infiltration of neutrophils and monocytes following bacterial injection, the authors should provide flow cytometry quantification of the percentages of immune cell subsets relative to live cells, rather than relative to the CD45+ population.

      Following the reviewer’s request, we have replaced the data previously shown in figure 2 with a completely new high-dimensional flow cytometry analysis, including FIt-SNE visualization of CD45+ cell populations in different epididymal regions (IS, Caput, Corpus, Cauda) under different conditions (naive, sham, UPEC 10 days post infection). In addition, we have included bar diagrams displaying the percentage of all investigated immune cell subsets relative to single live cells. The results displayed in the new figure are similar to the previously shown data, but the overall figure layout and visualization method are clearer and more comprehensible. We thank the reviewer for the helpful comment.
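      To illustrate why the reference gate matters for the reviewer's point, here is a minimal Python sketch using hypothetical event counts (invented for illustration, not data from this study; the helper name is also illustrative). If every CD45+ subset expands proportionally during an influx, percentages relative to CD45+ cells stay constant, whereas percentages relative to live cells reveal the increase.

```python
def percent_of_parent(count: int, parent_count: int) -> float:
    """Return a gated population as a percentage of its parent gate."""
    if parent_count <= 0:
        raise ValueError("parent gate must contain at least one event")
    return 100.0 * count / parent_count


# Hypothetical event counts, chosen only to illustrate the arithmetic:
# the infected sample has twice as many CD45+ cells and twice as many neutrophils.
naive = {"live": 10_000, "cd45": 1_000, "neutrophils": 50}
infected = {"live": 10_000, "cd45": 2_000, "neutrophils": 100}

for label, sample in (("naive", naive), ("infected", infected)):
    of_cd45 = percent_of_parent(sample["neutrophils"], sample["cd45"])
    of_live = percent_of_parent(sample["neutrophils"], sample["live"])
    print(f"{label}: {of_cd45:.1f}% of CD45+, {of_live:.1f}% of live cells")
```

With these numbers, neutrophils make up 5.0% of CD45+ cells in both samples, but rise from 0.5% to 1.0% of live cells, so only the live-cell reference exposes the infiltration.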

      2) In general, all flow cytometry and immunofluorescence data should be presented and discussed with respect to previously published studies.

      This is reflected in the discussion (lines 564-575) and, in addition, in our responses to similar points raised by the reviewers.

      3) A surprisingly low number of CX3CR1-EGFP cells was detected by immunofluorescence in the cauda. This is not in agreement with previous studies showing a similar % of CX3CR1-EGFP cells in the IS and cauda regions by immunofluorescence and flow cytometry. The authors need to discuss this discrepancy. Perhaps the different fixation procedures used in the current study compared to those used in previous studies could account for the loss of EGFP in epididymis cryo-sections. As such, cells that appear to be F4/80 positive but negative for EGFP by immunofluorescence might simply be due to the loss of cytoplasmic EGFP, while F4/80 immunogenicity remained intact.

      Within our study, we have shown by combining scRNASeq, flow cytometry and immunostaining that distinct macrophage subgroups co-exist within the epididymis and that their diversity increases towards the cauda. Based on these data, we can assume that cells that appear to be F4/80 positive but negative for CX3CR1 (e.g. clusters 6-9 of the macrophage subclustering show very low or even absent Cx3cr1 expression) are distinct from CX3CR1+F4/80+ cells (e.g. clusters 1 and 2 of the macrophage subclustering, both showing high Cx3cr1 expression). Therefore, our immunostaining (on Cx3cr1GFP/Ccr2RFP reporter mice) and flow cytometry data (on wild type C57BL/6J mice) in Figure 6 are in line with our transcriptomic data and strongly support the co-existence of both populations. We have seen the described gradient of macrophage numbers (decreasing from IS towards cauda) in all independently performed experiments (naive control group in infection experiments, steady-state characterization in wild type and transgenic mice). A previous study, however, demonstrated a constant CX3CR1+ cell ‘number’ throughout all epididymal regions (~5-6% of live cells; Battistone et al., 2020). Here, we indeed note a discrepancy with our results, which show a relatively high ‘number’ of CX3CR1+ cells in the initial segment of naive mice (20% of single live cells, new Figure 2G) that decreases towards the cauda (~5% of single live cells, data shown in the new Figure 2 of naive mice). [It should be mentioned that these numbers differ slightly from the percentage of CD45+ cells in single live cells shown in Figure 4 due to different flow cytometry settings (thresholding to exclude spermatozoa and debris).] However, another study (Voisin et al., 2018) showed a comparable ratio of total macrophages within caput and cauda, with a similar gradient throughout the epididymal regions (significantly lower ratio within the cauda compared to the caput). Although this study discriminated only between caput and cauda, these data are in line with our results.

      Nevertheless, it should be noted that the percentage of a population in single live cells is not an unbiased quantification, as this calculation is highly dependent on the preceding gating (thresholding, aimed events, single cells as well as live cells; the latter is, in turn, dependent on experimental procedures that may affect cell viability and antigen recognizability, see below). Rather, it provides important information about the distribution of populations among regions or conditions. For this reason, a comparison among studies as requested above is not expedient from our point of view. This and other studies are limited in that they lack an absolute quantification of immune cell populations, as that would require e.g. prior cell counting or relating absolute cell numbers to mg of tissue, as conducted in the parabiosis experiment shown in Figure 7 (which in turn is limited for the epididymal regions by the necessity of pooling tissue from several mice to obtain a sufficient cell number, thus masking individual differences). Another alternative would be quantitative morphometric analysis of stained sections, which has not been performed in the present study.
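      The dependence on upstream gates can be made concrete with a minimal Python sketch (hypothetical counts, not study data; the helper name is illustrative). It assumes that a harsher isolation protocol lowers viability without affecting the target population itself, so the same number of target events is divided by a smaller single live gate.

```python
def percent_of_single_live(target_events: int, single_live_events: int) -> float:
    """Express a target population as a percentage of the single live gate."""
    if single_live_events <= 0:
        raise ValueError("the single live gate must contain at least one event")
    return 100.0 * target_events / single_live_events


# Hypothetical scenario: identical target counts, different viability after digestion.
target = 500
for protocol, single_live in (("higher viability", 10_000), ("lower viability", 8_000)):
    pct = percent_of_single_live(target, single_live)
    print(f"{protocol}: {pct:.2f}% of single live cells")
```

The same 500 target events amount to 5% of single live cells in one case and 6.25% in the other, which is why percentages obtained with different isolation protocols are difficult to compare directly.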

      Comparing the protocols for cell isolation and preparation of the single-cell suspension between our study and previous reports (Battistone et al., 2020) shows that different protocols were applied, which indeed could have a major impact. The study by Battistone et al. (2020) used a mixture of collagenase type I (0.5 mg/ml) and collagenase type II (0.5 mg/ml) and incubated tissue fragments for a short period (30 minutes) at 37°C. In contrast, in this study we minced the tissue with scissors until no fragments were visible anymore, followed by enzymatic digestion (shaking at 37°C for 45 minutes with 1.5 mg/ml collagenase type IV and 60 U/ml DNase). Afterwards, we aspirated the digest 5-6 times through a 30G needle (to release pre-digested sticky cells from each other by shear forces) before passing it through a 70 µm cell strainer. In our experience, using collagenase type IV for a longer time at the optimal concentration of 1.5 mg/ml significantly increases the number of viable cells (a similar concentration and incubation duration with collagenase I resulted in a higher proportion of dead cells in the analysis). A longer incubation time increases the cell yield, especially from the IS, where the epithelial cells are densely connected to each other. In general, collagenase type IV has a lower tryptic activity than other collagenases, and therefore its use limits damage to membrane proteins and receptors (an overview of the different collagenase types with respective references can be found at: https://www.worthington-biochem.com/products/collagenase/manual).

      In summary, we agree with the reviewer that methodological differences very likely account for the discrepancy between our data and those of Battistone et al. (2020), and we have raised this point in the revised discussion (see lines 559-564).

      The statement "Intriguingly, our data revealed that distinct immunological landscapes exist within proximal (IS, caput) and distal regions (corpus, cauda), that are tailored to the respective needs of the microenvironments" implies that this is the first study that describes immune cell heterogeneity in the epididymis. Please rephrase this statement as previous studies have already shown the segment-specific heterogeneity of resident immune cells in this organ.

      To address the reviewer's comment, we have rephrased the statement to “our data unraveled the transcriptional identity and tissue location of extravascular immune cells and further support the existence of distinct immunological environments along the epididymal duct that are tailored to the respective needs of the microenvironment” within the discussion section (lines 555-558). Moreover, the previous investigations on epididymal immune cells are acknowledged and cited within the introduction (lines 107-124) as well as in the discussion (lines 549-554, 564-575, and 580-584). We hope that this satisfactorily addresses the reviewer’s critique.

      The conclusion that macrophages constitute the major immune cell population of the murine epididymis is not supported by the data provided here. In fact, the authors found that macrophages account for only approximately 20% of CD45+ immune cells in the cauda. The authors should, therefore, modify their conclusion to state that macrophages constitute the major immune cell population in the IS. In fact, this conclusion would be more in line with previously published studies.

      The reviewer is correct, and we have changed the conclusion accordingly to “macrophages constitute the major immune cell population, especially within the IS” (see lines 559-560).

      The authors conclude that fewer intraepithelial CX3CR1-EGFP+ cells are present in the cauda, but they do not explain how they actually quantified these intraepithelial cells. A description of how these results were obtained is missing.

      We agree with the reviewer that we did not quantify cells based on our immunostaining. All quantifications were obtained by flow cytometry on wild type mice with the respective surface staining (according to the previous selection of markers derived from scRNASeq, see Figure 6) and show only ratios, not absolute numbers. Additional counting of cells in the immunostained sections would be required to determine conclusively whether these cells are quantitatively different in the cauda compared to the IS. The respective sentence, however, does not intend to compare the abundance of these cells among epididymal regions; rather, it states that ‘the distal regions are populated by a more heterogeneous macrophage pool consisting of less intraepithelial CX3CR1+ macrophages, but higher abundance of interstitial pro-inflammatory monocyte-derived CCR2+MHC-II+, vasculature-associated TLF+ macrophages as well as CX3CR1-TLF-CCR2- macrophages’. This statement points to the increasing macrophage heterogeneity towards the distal parts; it is based on the clustering of the scRNASeq data and flow cytometry analysis, and is supported by the immunostaining that localized these populations in the epididymal compartments. For this reason, flow cytometry and immunostaining are combined in Figure 6 to display the ratio of the identified macrophage subgroups to each other (Fig. 6B, bar diagram showing % of distinct subpopulations in total F4/80+ cells), with supporting immunostaining using the same markers for localization.

    1. Author Response

      Reviewer #3 (Public Review):

      Weaknesses

      The spontaneous activity of the network is extremely low, with [0.02 0.09] spks/s considered as a high activity range. Granted, this is based on ex vivo measurements. However, if this phenomenon is to be considered computationally relevant, as the authors claim, the paper should have examined the reliability of propagation and routing with in vivo activity levels.

      The above weakness is a special case of the issue that the limits of applicability/robustness of results to model assumptions have not been well established. In particular, it is not clear how strong the strongest weights must be whilst still enabling long sequences, and what is the dependence of the results on the parameters of the distance-dependent connectivity.

      Regarding the two first weaknesses listed in Reviewer #3 Public Review, we wish to note that:

      ● The statement that our estimate of spontaneous activity “is based on ex vivo measurements” is incorrect. Our single-cell and connectivity parameters are certainly based on ex vivo measurements, but the range of spontaneous activity that the Reviewer cites ([0.02 0.09] spks/s) is an estimate from in vivo recordings. Furthermore, in our model, we explored mean firing rates higher than this in vivo range and still observed sequences.

      ● While the Reviewer states that “it is not clear how strong the strongest weights must be”, we do provide a lower-bound estimate. We explored simulations where we truncated sections of the distribution of synaptic strengths and observed that networks that included the bottom 90% of connections did not produce sequences.
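      As an illustration of the truncation described in the last point, here is a minimal Python sketch. The function name and the convention of zeroing removed synapses are assumptions made for illustration, not the simulation code used in the study: the strongest 10% of weights are removed, while the remaining 90% keep their positions so the pruned list could replace the original connectivity.

```python
def remove_strongest(weights: list[float], keep_fraction: float = 0.9) -> list[float]:
    """Zero out the strongest synapses, keeping the weakest `keep_fraction` of them.

    List positions (connection identities) are preserved; removed
    connections are set to 0.0 rather than deleted.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = round(len(weights) * keep_fraction)
    # Indices sorted from the weakest to the strongest synapse.
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    removed = set(order[n_keep:])
    return [0.0 if i in removed else w for i, w in enumerate(weights)]


# Hypothetical weights (arbitrary units); the single strongest one (3.0) is removed.
weights = [0.5, 3.0, 0.1, 2.0, 1.0, 0.2, 0.3, 0.4, 0.6, 0.7]
pruned = remove_strongest(weights)
print(pruned)
```

Keeping positions rather than returning a shortened, sorted list means the same network wiring can be simulated with and without its strongest connections.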

    1. Author Response

      Reviewer #1 (Public Review):

      This study sets out to decipher whether the eDNA that promotes biofilm dispersal in Caulobacter crescentus biofilms is released when a random portion of cells lyse within biofilms, or whether eDNA release is a regulated process. They start by investigating whether any of the C. crescentus TA systems contribute to biofilm-associated cell death, and find that one of the systems, ParDE4 is responsible for cell death and eDNA release. They go on to show that this system is O2-regulated and thus contributes to cell death in particular in the oxygen limited interior regions of biofilms. These findings contribute significantly to our understanding of the biological functions of toxin-antitoxin systems, mechanisms of bacterial programmed cell death, and biofilm growth. The notion that TA systems function in cell death in particular has been controversial, and often based on overexpression of the toxin component, therefore the fact that this study uses a TA system in its native genomic context is notable. The authors also show clearly the somewhat counterintuitive result that the cell death (and presumably, toxin activity) is negatively correlated with transcription of the TA system. This is consistent with what is known about TA biology (but not with many past TA papers, which often correlated TA transcription with toxin activation). The study also provides a logical rationale for how ParDE4 mediated cell death ultimately contributes to bacterial fitness. The paper is well written and figures are clear and easy to follow.

      There are two relatively minor shortcomings of the paper, both acknowledged as caveats by the authors in their discussion. First, while the authors do include one experiment that addresses whether the toxin is responsible for the cell death (Fig 3), they do not show direct evidence of the activity of the toxin other than cell death/eDNA release. Second, the authors do not address whether the reduced TA transcription they observe is what causes the release of the toxin and thus the cell death phenotype. This seems likely to be the case based on previous studies of other TA systems (e.g. TA systems involved in plasmid segregation, most clearly shown for CcdAB, or more recently the ToxIN system during phage infection). Connecting this directly would be a very valuable addition to this study.

      We thank the reviewer for those positive comments. We agree that the TA system we describe in this study needs to be characterized in more detail. Understanding how the expression levels of this TA system are linked to cell death is our next goal and will be the subject of a future publication.

      We now discuss the important missing point of how TA expression might be linked to cell death, and refer to CcdAB, ToxIN and other relevant systems as well-characterized examples of such mechanisms. In the introduction, we now present the role of TAS in plasmid addiction and phage defense. We also provide more information about these systems in the discussion and speculate on their similarities with the TAS described here (see our reply to essential revisions above).

      Reviewer #2 (Public Review):

      In this work, the authors present compelling evidence that a toxin-antitoxin system contributes to biofilm dispersal under oxygen-limited conditions. This work makes important contributions to two areas of microbial physiology: the functional understanding of toxin-antitoxin systems, which has remained largely elusive, and the mechanistic regulation of biofilm dispersal, a critical but less understood aspect of biofilm physiology.

      A major goal of the work described in this manuscript was to better understand the regulation of biofilm dispersal. These authors provide compelling evidence that the parDE4 toxin-antitoxin (TA) system in Caulobacter crescentus mediates enhanced cell death under conditions of oxygen limitation. This group previously reported that extracellular DNA (eDNA) inhibits attachment of new-born swarmer cells. Here they build on that observation by identifying a genetic module that contributes to cell death and DNA release under oxygen limitation, a sub-optimal condition present in a dense biofilm community, and demonstrate that parDE4 affects biofilm development. Together, this work makes important contributions toward understanding functional roles for toxin-antitoxin systems and regulation of mature stages of biofilm development. In addition, although eDNA is often depicted as having a structural role in strengthening and maintaining biofilms in some species, this work further establishes that eDNA can have multiple roles in biofilms including contributing to dispersal in Caulobacter.

      Strengths of this work include 1) comprehensive evaluation of multiple paralogous TAS and specific identification of the contribution of parDE4 to cell death, eDNA release and biofilm restriction, 2) genetic dissection of the TA pair to establish that the ParD4-antitoxin prevents eDNA release and promotes biofilm formation in a ParE4-toxin dependent manner, 3) provision of evidence that the parDE system affects cell death / eDNA release, but not responsiveness to eDNA, 4) demonstration of an anti-correlation between expression of parDE and ccoN, a hypoxic responsive gene, at both the population level under different growth conditions and at the single cell level within different growth conditions.

      We thank the reviewer for these positive comments.

      One weakness of this work is that the authors do not directly measure O2 concentrations in their growth conditions. However, they do monitor activity of an established hypoxic responsive promoter, which provides strong evidence that the various conditions tested do indeed affect oxygen concentrations in the culture medium. Nevertheless, it is difficult to assess oxygen availability in the flow cell experiments, which will be dependent on both dissolved oxygen in the media pumped through the flow cell and cell density within the flow cells. In the competition experiments, the ∆parDE4 mutant has an advantage before there seems to be an appreciable cell density, perhaps reflecting low oxygen in the growth medium or a monolayer of cells that is not obvious in the images as presented. It would be interesting to evaluate expression of ccoN in biofilms grown under these flow conditions.

      We agree with the reviewer that one limitation of our study is that we could not directly measure the O2 concentration in our different growth conditions. Unfortunately, we were unable to find a way to reliably and reproducibly assay the dissolved O2 concentration in our experimental set-ups (both static biofilms and flow-cells). We think that regulation of parDE4 expression is linked to the composition of the local environment surrounding each cell, and offering a proxy via ccoN expression is the best method we could provide to assess this. Results provided in Figures 7 and S3 (now S5) clearly show that cells that respond to limiting O2 levels (by activating ccoN expression) have low parDE4 expression. We also show in this set of experiments that, at the population level, there are cells highly expressing ccoN or parDE4 regardless of the culture conditions and the overall O2 levels.

      We now provide ccoN expression in different areas of biofilms, in addition to the already presented parDE4 expression, in Fig. 8A. We quantified ccoN transcription levels using the PccoN-lacZ construct (already used to generate data in Figure 5) and the fluorogenic β-galactosidase substrate that we used to quantify parDE4 expression in biofilms in the first version of this manuscript (Figure 8A). These new results show that in biofilm areas where parDE4 expression is high, ccoN expression is low and vice versa, confirming other observations made throughout this work.

      The discussion regarding the observation that parDE expression drops under activating (oxygen-limiting) conditions is contradictory to what I would expect based on the early findings about TA systems as genetic stabilization systems. The authors seem to expect that conditions that activate the toxin should correspond to increased expression of the TA operon. However, TA systems have frequently been characterized as DNA stabilization systems for plasmids or other mobile elements because the toxin proteins are more stable than the antitoxin proteins. In these cases, if the gene pair is lost (or, in this case, if its expression is decreased), the toxin protein persists longer than the antitoxin protein, effectively activating the toxin to arrest or kill cells that have lost (or, in this case, turned off) the gene pair. Thus I disagree with the statement that this is a "novel regulatory mechanism of PCD that remains to be understood" (lines 436-437).

      The sentence preceding this one was "We are unaware of cases where reduced TAS expression is correlated with the condition that activates the PCD in biofilm regulation.", and we suggested a "novel regulatory mechanism of PCD" in the context of biofilm formation. However, we now realize that our statements could be misleading, and we have entirely rewritten this section (lines 510-519: "It is interesting to note that the "neutralized" steady state of the ParDE4 TAS, when the toxin is inactivated, seems to be when O2 is abundant, i.e., when parDE4 transcription is at its highest. In most studied TAS, stresses have been shown to induce transcription of TAS (LeRoux et al., 2020, Jurėnas et al., 2022), but here, the stress inflicted on the cells by O2 limitation is accompanied by a lower expression of parDE4. We are unaware of cases where reduced TAS expression is correlated with the condition that activates the PCD in biofilm regulation. This suggests a novel regulatory mechanism of PCD, in the context of biofilms, that remains to be understood.").

      Differential stability of toxin and antitoxin proteins provides a reasonable regulatory mechanism to explain the programmed cell death observed. Testing of this, or other, mechanistic model(s) will be important in future studies of this system.

      We agree with the reviewer, and testing protein stability is definitely on our list of experiments for dissecting this TA killing mechanism in the near future. As mentioned above, we have so far been unable to obtain antibodies to these proteins, which has delayed these types of experiments.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper proposes a 2D U-Net with attention and adaptive batchnorm modules to perform brain extraction that generalises across species. Generalisation is supported by a semi-supervised learning strategy that leverages test-time Monte Carlo uncertainty to integrate the best-predicted labels into training. Monte Carlo dropout maps also tend to align with inter-rater disagreement in manual segmentations, meaning that they can realistically be used for fast QC. The networks (trained on a range of source domains) have been made publicly available, meaning that it should be relatively simple for users to apply them to their own cohorts, with retraining on a very small number of labelled datasets. Overall the paper is exceptionally well written and validated, and the tool has broad application.
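      To make the uncertainty-based pseudo-labelling idea concrete, here is a minimal pure-Python sketch of the general technique: aggregate several stochastic (dropout-enabled) forward passes, then keep the least uncertain scans as pseudo-labels. It is not BEN's implementation; the function names, the 0.5 threshold, and the use of mean per-voxel variance as the selection score are assumptions for illustration, and the forward passes are represented as precomputed probability lists.

```python
from statistics import mean, pvariance


def mc_dropout_summary(samples):
    """Summarize T stochastic (dropout-enabled) forward passes for one scan.

    samples: list of T passes; each pass is a list of per-voxel foreground
    probabilities. Returns (mean probability, variance) per voxel.
    """
    n_voxels = len(samples[0])
    mean_prob = [mean(s[v] for s in samples) for v in range(n_voxels)]
    var_prob = [pvariance([s[v] for s in samples]) for v in range(n_voxels)]
    return mean_prob, var_prob


def select_pseudo_labels(scans, k, threshold=0.5):
    """Pick the k scans whose predictions are least uncertain (hypothetical helper).

    scans: dict mapping scan name -> list of T passes.
    Returns [(name, binary mask)] for the k scans with the lowest mean
    per-voxel variance; each mask thresholds the mean probability.
    """
    scored = []
    for name, samples in scans.items():
        mean_prob, var_prob = mc_dropout_summary(samples)
        mask = [1 if p >= threshold else 0 for p in mean_prob]
        scored.append((mean(var_prob), name, mask))
    scored.sort(key=lambda item: (item[0], item[1]))  # lowest uncertainty first
    return [(name, mask) for _, name, mask in scored[:k]]
```

      A scan whose passes agree closely is selected before one whose passes disagree; in practice the passes would come from running the network with dropout left active at test time.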

      We thank this reviewer very much for these encouraging and valuable comments.

      Reviewer #2 (Public Review):

      In this manuscript, the authors are proposing a generalizable solution to masking brains from medical images from multiple species. This is done via a deep learning architecture, where the key innovation is to incorporate domain transfer techniques that should allow the trained networks to work out of the box on new data or, more likely, need only a limited training set of a few segmented brains in order to become successful.

      The authors show applications of their algorithm to mice, rats, marmosets, and humans. In all cases, they were able to obtain high Dice scores (>0.95) with only a very small number of labelled datasets. Moreover, because the segmentation is deep-learning-based, it is very fast once a network has been trained.
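      For readers less familiar with the metric, the Dice score quoted above compares a predicted mask A with a reference mask B as 2|A ∩ B| / (|A| + |B|). A minimal sketch on flattened binary masks follows; the convention of returning 1.0 when both masks are empty is an assumption, not something stated in the paper.

```python
def dice(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2 * intersection / total


score = dice([1, 1, 1, 0], [1, 1, 0, 0])  # 2 * 2 / (3 + 2) = 0.8
```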

      The promise of this work is twofold: to allow for the easy creation of brain masking pipelines in species or modalities where no such algorithms exist, and secondly to provide higher accuracy or robustness of brain masking compared to existing methods.

      I believe that the authors somewhat overstate the importance of generalizability, as masking brains is something that we can by and large do well across multiple species. For human brains this often relies on specialized tools that, as the authors acknowledge, work well, and for the usually simpler non-human (i.e. lissencephalic rodent) brains, image registration or multi-atlas segmentation techniques also work well. So generalizability adds definite convenience but is not a game-changer.

      The key to the proposed algorithm is thus that it works better than, or at least as well as, existing tools. The authors show multiple convincing examples that this is the case even after retraining with only a few samples. Yet in those examples, the authors proposed retraining the network on even subtle acquisition changes, such as moving in field strength from 7 to 9.4T. I tried it on some T2 weighted ex-vivo and T1 weighted manganese enhanced in-vivo mouse data and found that the trained brain extraction net does not generalize well. None of the pre-trained networks provided by the authors produced reasonable masks on my data. Using their domain adaptation retraining algorithm on ~20 brains each resulted in, as promised, excellent brain segmentations. Yet even subtle changes to out-of-sample inputs degraded performance significantly. For example, one set of data with a slight intensity drop-off due to a misplaced sat band created masks that incorrectly excluded those lower intensity voxels. Similarly, training on normal brains and applying the trained algorithm to brains with stroke-induced lesions caused the lesions to be incorrectly masked. BEN thus seems to be in need of regular retraining to very precisely matched inputs. In both those examples, the usual image registration/multi-atlas segmentation approach we use for brain masking worked without needing any adaptation.

      Overall, this paper is filled with excellent ideas for a generalized brain extraction deep learning algorithm that features domain adaptation to allow easy retraining to meet different inputs, be they species or sequence types. The authors are to be highly commended for their work. Yet at the moment it appears to produce overtrained networks that are challenged by even subtle shifts in inputs, something I believe needs to be addressed for BEN to truly meet its promised potential.

      We sincerely thank the reviewer for these constructive comments. We appreciate that the article is considered to be a valuable contribution to the field of neuroimaging by providing BEN as an efficient and generalisable deep learning based tool for brain extraction. The major concern of this Reviewer is that a pretrained BEN leads to unsatisfactory performance on some external data (e.g. the reviewer’s own data), although the domain adaptation retraining algorithm on ~20 brains did lead to, as promised, excellent segmentation results. Here, we would like to emphasize that the initial version of BEN on Github was designed to reproduce the results we presented in the manuscript, not an optimized version for processing external datasets. To address this issue, we have optimized the BEN pipeline in the revised version, which is summarized as follows:

      1) Orientation detection. In the original version of BEN, all rodent training images were axial views, so BEN performed best on axial test images. Therefore, if rodent MR images are loaded in other views (such as sagittal or coronal), the performance of BEN degrades. To solve this issue, we have added an orientation detection function to the BEN pipeline that automatically aligns other orientations to the axial view, thus optimizing BEN’s performance.

      2) Performance optimization using plug-and-play functions. We have added post-processing steps to improve performance and running logs for quick inspection.

      3) Validation and tutorials. To further validate BEN’s generalization, we have evaluated BEN on two new external public ex-vivo MRI datasets (rTg4510 mouse: 25 ex-vivo scans; C57BL/6 mouse: 15 ex-vivo scans). With only one label used for BEN adaptation/retraining, BEN achieves impressive performance on both datasets, even though it was originally designed for in-vivo MRI data. To make the implementation transparent and give users detailed guidance, we have prepared video tutorials in our GitHub documentation (https://github.com/yu02019/BEN#video-tutorials). Note that BEN’s performance may degrade on MR images of low quality. BEN is an extensible open-source tool, and our team will continue to maintain and update it.

      Nevertheless, there could be a couple of reasons that cause suboptimal performance when using a pretrained BEN. We discuss them below and have revised the manuscript accordingly (last paragraph in Discussion).

      On the one hand, as pointed out by the reviewer, domain generalization is a challenging task for deep learning. Although BEN can adapt to new out-of-domain images without labels (zero-shot learning) when the domain shift is relatively small (e.g. successful transfer between modalities and scanners with different field strengths), the domain gap between the ex-vivo MRI data used by the reviewer and the in-vivo images in our training set could be so large that it compromises performance. In this case, additional labeled data and retraining are indeed necessary for BEN to perform few-shot learning, as we have emphasized and demonstrated in our manuscript and as the reviewer confirmed (although, in our opinion, fewer than 5 additional brains rather than 20 may suffice).

      On the other hand, for a deep learning tool it is difficult or nearly impossible to guarantee optimal performance on all unseen data. This is also a motivation for us to design BEN as an extensible tool. As stated in the manuscript, the source domain for BEN is flexible and is not tied to the Mouse-T2-11.7T data used in our manuscript. Instead, users can provide their own data and pretrained network as a new source domain, thereby facilitating domain generalization by reducing the domain gap between the new source and target domains.

    1. eLife assessment

      This paper will be of interest to those studying DNA replication in the context of chromatin and development. This important study uncovers a new interaction partner for the chromatin protein SuUR and tries to understand how this complex (SUMM4) functions to control under-replication in polytene chromosomes. While the experiments are of high quality and carefully controlled, the data currently do not fully support all the conclusions, particularly as they relate to conclusions about DNA replication timing.

      We appreciate the positive evaluation of our work. We agree that the relevance of the under-replication phenomenon to the establishment of late replication in dividing cells has only been established based on circumstantial evidence. In the revised manuscript, we expand the explanation of this relationship and discuss the limitations of the endoreplication model as applied to understanding late DNA replication in the cell cycle of diploid cells. We also edited the abstract to soften our conclusions. We believe that the improvements made in the revised manuscript produce a closer alignment between our data and our conclusions.

      Reviewer #1 (Public Review):

      Andreyeva et al. developed a novel purification/mass spec approach to identify SuUR-associated proteins. From this biochemical tour de force, they identify a complex consisting of the insulator-associated protein Mod(Mdg4) and SuUR that they term SUMM4. They show that this complex (at least SuUR) has ATPase activity, which is an exciting result, as no biochemical activity was previously known for SuUR. Given SuUR's function in under-replication in Drosophila salivary glands, the authors show that SuUR and Mod(Mdg4) at least partially localize on polytene chromosomes and that SuUR depends at least partially on Mod(Mdg4) for localization to IH, but not PH, regions. Finally, using two independent genetic reporters, they show that SuUR itself has an insulator function, which is a new function for SuUR and exciting as it is likely a diploid cell-specific function. The authors then attempt to show that Mod(Mdg4) functions in under-replication. Unfortunately, under-replication is minimally, if at all, changed in the Mod(Mdg4) mutant. While the authors bring up several possible scenarios to explain this, it is still uncertain whether Mod(Mdg4) has a direct effect on under-replication.

      Strengths: The authors developed a very useful strategy to identify protein interactions through multiple purification steps using mass spectrometry. This approach can be applied to different systems and will be generally useful to the community. Through this approach, they provide very compelling data that SuUR and Mod(Mdg4) form a complex. Furthermore, the experiments all have been rigorously performed and the data is of high quality.

      Weaknesses: The way the paper is written, its main focus is on under-replication. What the authors were not able to conclusively demonstrate is whether Mod(Mdg4) functions in under-replication.

      We thank the Reviewer for a positive evaluation of our work, specifically the biochemical and cytological results. Unfortunately, this Reviewer was less convinced by our conclusions about the role of Mod(Mdg4) in regulation of under-replication. However, we believe that our data strongly implicate Mod(Mdg4) in under-replication:

      1) Although SuUR is considered a bona fide suppressor of under-replication, its mutation does not fully restore DNA copy numbers in under-replicated regions of polytene chromosomes but rather restores them by ~78% on average (Table 1). Although the mod(mdg4) mutation produces a weaker recovery (~26% on average, Table 1), it is still robust and statistically significant. Presently, only one other mutant (Rif1) is known to restore DNA copy numbers at most under-replicated regions in salivary gland polytene chromosomes.

      2) DNA copy numbers in SuUR and Rif1 mutants, which are homozygous viable and fertile, are measured in L3 larvae produced from crosses of homozygous parents, i.e. in the absence of maternally contributed gene products. In contrast, mod(mdg4) is essential for viability, and the DNA copy numbers have to be measured in homozygotes that have Mod(Mdg4) protein and RNA loaded by heterozygous mothers. Since endoreplication initiates before the maternal product is exhausted, it limits the observed suppression. However, when we directly compare zygotic functions of SuUR and mod(mdg4) by analyzing the progeny of heterozygous mod(mdg4)/+ and SuUR/+ parents, they appear indistinguishable.

      3) Finally, we demonstrate that Mod(Mdg4) is essential for the proper loading of SUUR in polytene chromosomes, thus implicating it as a direct, SUUR-dependent effector of late DNA replication.

      In the revised manuscript, we provide a clearer explanation of our results. We hope that our arguments and modifications of the manuscript will alleviate the Reviewer’s concerns.

      Reviewer #2 (Public Review):

      This paper from the Fyodorov lab reports the isolation of a native protein complex of SUUR, a Drosophila SNF2-related factor, with Mdg4, an established chromatin boundary protein. The discovery of this native complex, called SUMM4, was enabled by the development of a mass spec-linked proteomic analysis of fractions from an unbiased, conventional multi-step chromatographic purification of low-abundance protein complexes. The authors validate the native interactions by co-immunoprecipitation and further show with recombinant proteins that SUUR displays ATPase activity, a property not previously shown, which is stimulated by Mdg4. From a functional perspective, the authors demonstrate that both SUUR and Mdg4 mediate activities of the Drosophila gypsy insulator, which blocks enhancer-promoter interactions and acts as a heterochromatin-euchromatin barrier, and, moreover, that the complex has a role in the under-replication of intercalary heterochromatin.

      Overall, this work is a substantial contribution to the field in two respects. First, it provides a new approach to the identification of novel native complexes that are of low abundance and difficult to isolate and identify by conventional biochemistry and mass spectrometry. Second, the interaction between Mdg4 and SUUR is novel and offers an ATP-driven pathway to be further investigated for understanding the mechanism of insulator (gypsy) function. Together, these advances are supported by the compelling quality and quantity of data. However, the paper does not read smoothly and can benefit from rewriting for readers who are not familiar with mass-spec proteomics or Drosophila biology.

      We thank the Reviewer for a positive evaluation of our work. To improve clarity, we made several modifications of our manuscript as requested by the Reviewer.

    1. Author Response

      Reviewer #1 (Public Review):

      In "The layered costs and benefits of translational redundancy", Raval et al. aim to investigate the impact of gene copy number redundancy on E. coli fitness, using growth rate in different media as the primary fitness readout. Genes for most tRNAs and the three ribosomal RNAs are present in multiple copies on the E. coli chromosome. The authors ask how alterations in gene copy number affect the growth rate of E. coli in growth media that support different rates of growth for the wild type.

      While it was shown before that mutants with reduced numbers of ribosomal RNA operons grow at reduced rates in rich medium (LB), this study extends these findings and reaches some important conclusions:

      1) In a poor medium (supporting slow growth rates), the mutants with fewer rRNA operons actually grow faster than the wild type, showing that redundancy comes at a cost.

      2) The same is true for mutants with reduced gene copy number of certain tRNAs and correlates with slower rates of protein synthesis in these mutants.

      3) rRNA operon gene copy number is more decisive for growth rate than any tRNA gene copy number (>1).

      In addition, measurements of strains with deletions of genes encoding tRNA-modification enzymes that affect tRNA specificity are included. While interesting, no unifying conclusion could be reached on the impact of these mutations on growth rate.

      Thank you for this clear summary of our work.

      The well-known "growth law" relationships between growth rate and macromolecular composition (RNA/protein ratio, for example) specifically concern steady-state growth rates. It is concerning that all growth rates in this work were measured on cultures that were only back-diluted 1:100 from overnight LB precultures. That only allows 6-7 doubling times before the preculture OD is reached again. The exponential part of growth would end before that, allowing perhaps only 3-4 generations of growth in the new medium before the growth rate was measured. Thus, the cultures were not in balanced growth ("steady state") when the measurements were made, rather they were presumably in various states of adapting to altered nutrient availability.
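      The reviewer's estimate of 6-7 doublings follows from simple arithmetic, assuming the back-diluted culture must grow 100-fold to return to the preculture OD:

```latex
2^{n} = 100 \quad\Longrightarrow\quad n = \log_{2} 100 = \frac{\ln 100}{\ln 2} \approx 6.64
```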

      A detailed connection with exact growth rate laws indeed requires growth rate measurement in steady-state. Hence, we refrained from making such a connection in this manuscript, though it would be an interesting future avenue to explore. Our main goal here was to ask how E. coli growth rate is affected by external nutrient availability and internal translation components. For this, the key comparisons involve the WT vs. gene deletion mutations, and rich vs. poor growth media. For any given comparison, strains were tested under identical conditions and experimental protocols, and hence we can address our main questions without the need to obtain steady-state growth. As an aside, we note that the nutrient fluctuations inherent in such experiments may also be more relevant than steady-state growth for natural bacterial populations.

      As noted by the reviewer, we measured fitness only in a relatively narrow growth regime of several doublings; but we do capture exponential growth by focusing on the early data points (representing the exponential phase) for our growth rate calculations. We have now explicitly mentioned this in the methods section “Measuring growth parameters”.
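      One common way to estimate a growth rate from only the early, exponential-phase points is a least-squares fit of ln(OD) against time: the slope is the specific growth rate r, and the doubling time is ln 2 / r. This is a generic sketch, not the authors' documented procedure; the number of early points used (`max_points`) and the example readings are hypothetical.

```python
import math


def exponential_growth_rate(times_h, od, max_points=None):
    """Specific growth rate (per hour) from a least-squares fit of ln(OD) vs time.

    Only the first `max_points` readings are used, on the assumption that they
    fall in the exponential phase; pass None to use every reading.
    """
    if len(times_h) != len(od):
        raise ValueError("times and OD readings must have the same length")
    if max_points is not None:
        times_h, od = times_h[:max_points], od[:max_points]
    if len(times_h) < 2:
        raise ValueError("need at least two readings to fit a slope")
    log_od = [math.log(v) for v in od]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(log_od) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, log_od))
    return sxy / sxx


# Hypothetical readings: OD doubles every hour, then growth slows after t = 3 h.
times = [0, 1, 2, 3, 4, 5]
od = [0.01, 0.02, 0.04, 0.08, 0.15, 0.22]
r = exponential_growth_rate(times, od, max_points=4)
doubling_time_h = math.log(2) / r  # about 1 h for these numbers
```

      Fitting ln(OD) rather than OD makes exponential growth linear in time, so the slope is r directly; relative growth rates such as GRmut/GRwt can then be formed from these fitted values.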

      A second concern is the use of the term "tRNA expression levels" in the text and in Figure 4. I believe the YAMAT-seq method reports on the fractional contribution of a given tRNA to the total tRNA pool. Thus, since the total tRNA pool is larger in fast-growing cells than in slow-growing cells, a given tRNA may be present at a higher absolute concentration in the fast- than in the slow-growing cells but will be reported as "higher in poor" in Figure 4 if it constitutes a smaller fraction of the total tRNA pool in rich than in poor medium. For this reason, the conclusions regarding the effect of growth medium quality on tRNA levels are not justified.
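      The reviewer's point can be shown with a toy calculation. The numbers below are invented purely for illustration (arbitrary units, a hypothetical `tRNA_X`), not measurements from the study, and the sketch assumes, as the reviewer describes, that YAMAT-seq reads report each tRNA's share of the total pool.

```python
def pool_fractions(pool):
    """Convert absolute tRNA amounts into fractions of the total tRNA pool."""
    total = sum(pool.values())
    return {name: amount / total for name, amount in pool.items()}


# Hypothetical absolute amounts (arbitrary units), chosen only to illustrate the effect.
rich = {"tRNA_X": 20, "other_tRNAs": 180}  # larger total pool in fast growth
poor = {"tRNA_X": 12, "other_tRNAs": 68}   # smaller total pool in slow growth

frac_rich = pool_fractions(rich)["tRNA_X"]  # 20 / 200 = 0.10
frac_poor = pool_fractions(poor)["tRNA_X"]  # 12 / 80 = 0.15
```

      Although the hypothetical tRNA_X is more abundant in rich medium (20 vs 12 units), its share of the pool is higher in poor medium (0.15 vs 0.10), so a fraction-based readout would label it "higher in poor".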

      Thank you for this important point. We agree that our phrasing was incorrect, and we have modified the relevant text and figures accordingly. The fractional contribution of a given tRNA isotype to the total tRNA pool is still useful to compare, and is justified as now rephrased.

      Reviewer #2 (Public Review):

      By creating a series of deletion mutants of tRNAs, rRNAs, and tRNA-modifying enzymes, Raval et al. have shown the importance of gene copy number redundancy in rich media. Moreover, they successfully showed that having too many tRNAs in poor media can be harmful (for a subset of the examined tRNAs). Below, please find my comments regarding some of the methodologies, conclusions, and controls needed to substantiate this manuscript's findings.

      Figure 2 presents Rrel as a relative measurement (GRmut/GRwt). Therefore, I'm confused as to how Rrel can be negative, as shown in supplemental file 3 (statistics).

      We apologize for the confusion. Supplemental file 3 shows details of the statistical analysis (not raw data), and we included the effect size here (mean difference between the WT and the mutant relative growth rate) along with statistical significance. Thus, if the Rrel of a given mutant is 1.1, the mean difference would be (1–1.1) = –0.1, meaning that the mutant is performing 10% better than the WT.
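      The sign convention described here can be written out explicitly. In this minimal sketch, Rrel = GRmut/GRwt and the effect size is the mean of (1 - Rrel), i.e. the WT relative growth rate (1 by definition) minus the mutant's; pairing mutant and WT replicates by index is an assumption, and the function names are hypothetical.

```python
from statistics import mean


def relative_growth_rates(mut_rates, wt_rates):
    """Rrel = GR_mut / GR_wt for each replicate pair (pairing by index is assumed)."""
    if len(mut_rates) != len(wt_rates):
        raise ValueError("need one WT growth rate per mutant replicate")
    return [m / w for m, w in zip(mut_rates, wt_rates)]


def effect_size(rrel_values):
    """Mean of (1 - Rrel): negative values mean the mutant grows faster than WT."""
    return mean(1.0 - r for r in rrel_values)


rrel = relative_growth_rates([0.55, 0.66], [0.5, 0.6])  # each ratio is about 1.1
es = effect_size(rrel)  # about -0.1: mutant roughly 10% faster than WT
```

      Rrel itself is a ratio of growth rates and so is always positive; only the effect size, being a difference, can take either sign.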

      The “raw” relative growth rates are provided in source data files (labeled figure-wise), and there are no negative values there, as expected.

      We have now explicitly (and separately) referenced the source and statistics data files in the data analysis section in the methods, and in each figure legend. We hope this avoids confusion and makes it easier for readers to find the correct file.

      Does Figure 3 show the mean of 4 biological replicates or technical replicates? It should be stated clearly in the legend of figure 3.

      All replicates are biological replicates unless otherwise stated. This is now stated in the methods (lines 185-187) and in the figure legends.

      Do all strains (datapoint on figure 3 left panel) significantly perform better than the WT in nutrient downshift? Looking at supplemental file 3 I see this is not the case. Please mark the statistically significant points. I suggest giving each set a different symbol/shape and coloring the significant ones in red.

      We had considered indicating statistical significance in the plot, but decided not to do so because it was difficult to show the many potentially useful layers of information without cluttering the plot. One other practical difficulty was that each point in the figure represents two values: one from the upshift (Y axis) and one from the downshift (X axis). For some mutants the fitness difference was significant in only one direction, so it was not straightforward to indicate significance. Further, our main goal here was to show where strains from different deletion Sets (Figure 1) fall in this plot (i.e. which quadrant they occupy), and so we wanted to ensure that points were easily distinguished by Set. In the text we do not include statistically non-significant points in the summary of observed patterns, and refer readers to information on statistical significance provided in the supplemental file.

      Another issue is that in the statistics of figure 2 (in supplemental file 3), positive values reflect cases where the mutant performs poorly compared to the WT, while in figure 3 the negative values indicate this. Such discrepancy is not very clear. And again, how can Rrel be negative?

      As noted in response to an earlier comment, Rrel values (given in source data files) are not negative, but effect sizes (given in supplemental file with statistics) may be negative or positive since they show differences in the relative growth rate of WT and mutant. We agree that the discrepancy between the calculation of mean difference for Figs 2 and 3 was confusing. We have now fixed this: in both cases, negative mean difference values now indicate that the mutant performs better.

      Both axes say glycerol. What about galactose?

      The typo has been corrected.

      Lines 414-419: The authors state that "all but one had a growth rate that was comparable to WT (16 strains) or higher than WT (10 strains) after transitioning from rich to poor media (i.e. during a nutrient downshift, note data distribution along the x-axis in Fig 3; Supplementary file 3). In contrast, after a nutrient upshift, 11 strains showed significantly slower growth in one or both pairs of media, and only 2 showed significantly faster growth than WT (note data distribution along the y-axis in Fig 3; Supplementary file 3)".

      Looking at the Rrel values when transitioning from TB to Glycerol and vice versa suggests no direction in the effect of reducing redundancy. During downshift, four strains perform better, and three strains perform worse than the WT. During upshift, four strains perform better, and six strains perform worse. Only the downshift and upshift between TB and Gal give a strong signal.

      The authors should state clearly in the text that the effect is specific to this transition and these conditions, and is not general as the text currently implies (e.g., transitions from every rich to every poor medium and vice versa). I am convinced that the authors see an actual effect when downshifting or upshifting between TB and galactose. In that case, the conclusion is that redundancy is good or bad depending on the conditions used, and not a general theme.

      Also, this is true just for some tRNAs, so I don't think the conclusion is general regarding the question of redundancy.

      The fitness impacts of altered redundancy are best explained by a combination of multiple factors (in addition to nutrient availability): the number of tRNA genes deleted, number of tRNA gene copies remaining as a backup, availability of wobble or ME as backup, and codon usage. Thus, any one of these variables alone would provide only a partial explanation for the observed fitness effects across all strains.

      In many tRNA deletion strains – especially single gene deletions – redundancy was not significantly lowered by the deletion, as we explain in the results section. These strains were therefore not expected to show major fitness impacts or follow strong nutrient dependent trends, and this is what we observe.

      The same is true for nutrient upshift-downshift experiments, where a vast majority of strains were not expected to show a specific pattern because they do not show significant fitness impacts in general, nor do they show a strong correlation in relative fitness impacts vs. growth rate (Figure 1d). In addition, in these experiments the difference between the two media also matters. For example, comparing the maximum WT growth rate, M9 Gal is poorer than M9 Glycerol. Therefore, shifts between TB-Gal are nutritionally more drastic than TB-Gly shifts, and one would expect a larger fitness impact in the former (for strains with significantly altered redundancy). Hence, despite differences across media pairs, our broader conclusions about the impact of redundancy are generalizable as long as redundancy and nutrients are both substantially altered, e.g. due to deletion of 3 tRNA genes, deletion of tRNA+ME, or deletion of multiple rRNA operons.

      Figures are referred to inconsistently throughout the text: sometimes as "figure X", sometimes as FigX. References to the supplemental figures are also inconsistent.

      We have now corrected this.

      Line 443-444: "In fact, 10 tRNAs were significantly upregulated in the poor medium relative to the rich medium".

      This result contradicts the authors' hypothesis. If redundancy is bad in poor media because the cells have more tRNAs than they need, tRNA levels should be downregulated, not upregulated. How do the authors explain this?

      This statement referred to the WT strain, and was meant to highlight that (as noted by the reviewer) some tRNAs appear to be upregulated in poor medium, which is counterintuitive. However, as noted by reviewer 1 (see their comment on the interpretation of YAMAT-seq data), we can only infer the relative contribution of each tRNA isotype to the total tRNA pool (rather than absolute up- or down- regulation). Thus, we have removed this specific sentence, and instead we focus on the mismatch between the media-dependent changes in the composition of the tRNA pool and the fitness effects of different tRNA isotypes (lines 475-482).

      Line 445-447: "In contrast (and as expected), all tested tRNA deletion strains had lower expression of focal tRNA isotypes in the rich medium (Fig 4B, left panel), showing that the backup gene copies are not upregulated sufficiently to compensate for the loss of deleted tRNAs". It is great that the authors validated the expression in their strains. However, for accuracy, please indicate that it was done in four strains to avoid the impression that they did it in all the strains.

      We have now reworded this sentence to remind readers that we measured 4 tRNA deletion strains in this experiment.

      Finally, across the manuscript, the authors reveal that deleting some tRNAs or modifying enzymes can be deleterious in rich media or advantageous in poor media. However, I think this result and the conclusions derived from it could be more convincing if the authors would show in a subset of their strains that expressing the deleted tRNAs or modifying enzymes from a plasmid can rescue the phenotype.

      Thank you for this suggestion. For a small subset of strains, we now include data showing that complementation from a plasmid indeed rescues the deletion phenotype (Fig 2 – Fig supplement 7).

      Reviewer #3 (Public Review):

      In this manuscript, Raval et al. investigated the cost and benefit of maintaining seemingly redundant components of the translation machinery in the E. coli genome. They used systematic deletion of different components of the translation machinery including tRNA genes, tRNA modification enzymes, and ribosomal RNA genes to create a collection of mutant strains with reduced redundancy. Then they measured the effect of the reduced redundancy on cellular fitness by measuring the growth rate of each mutant strain in different growth conditions.

      This manuscript beautifully shows how maintaining multiple copies of translation machinery genes such as tRNA or ribosomal RNA is beneficial in a nutrient-rich environment, while it is costly in nutrient-poor environments. Similarly, they show how maintaining parallel pathways such as non-target tRNA which directly decodes a codon versus target tRNA plus tRNA modifying enzymes which enable wobble interactions between a tRNA and a codon have a similar effect in terms of cost and benefit.

      Further, the authors show the mechanisms that contribute to the increased or reduced fitness following a reduction in gene copy number by measuring tRNA abundance and translation capacity. This enables them to show that, on the one hand, reduced tRNA gene copy numbers result in lower tRNA abundance in rich growth media, while on the other hand, in nutrient-limiting media a higher copy number incurs an increased expression cost without increasing the translation rate.

      Overall, this work beautifully demonstrates the cost and benefits of the seemingly redundant translation machinery components in E. coli.

      Thank you for the clear summary and encouraging comments.

      However, in my opinion, this work’s conclusion should be that the seeming redundancy of the translation machinery is not redundant after all. As mentioned by the authors, it is known that tRNA gene copy number is associated with tRNA abundance (Dong et al. 1996, doi: 10.1006/jmbi.1996.0428), this effect is also nicely demonstrated by the authors in the section titled “Gene regulation cannot compensate for loss of tRNA gene copies”. Moreover, this work demonstrates how the loss of the seeming redundancy is deleterious in a nutrient-rich environment. Therefore, I believe the experiments presented in this work together with previous works should lead to the conclusion that the multiple gene copies and parallel tRNA decoding pathways are not redundant but rather essential for fast growth in rich environments.

      The point is well taken. However, as described in the introduction, here we focus on functional redundancy at the cellular level, where there are multiple ways of achieving the same translation rate. Hence we say that translation components are redundant at this level of analysis. One of the key conclusions from our work is that such redundancy is context-dependent, i.e. it is essential when rapid growth is possible, but is costly and dispensable otherwise. Therefore, we show that the definition of redundancy itself changes with environmental conditions.

      The following analogy may help convey this. There may be many ways to reach a flight at an airport: multiple entrances, multiple check-in and security counters, multiple boarding gates, etc. In a deserted airport these may seem redundant and even costly to maintain; when traffic is high, however, they are useful. Hence, even though the multiple routes are redundant from a purely architectural perspective, from a utilitarian perspective their redundancy depends on the flux of passengers.

    1. Author Response

      Reviewer 2 (Public Review):

      The paper addresses the question of how brain circuits associate stimuli onto abstract representations, and how both the neuronal activity and the synaptic connectivity change during this process. To do so, the authors make use of a feedforward network model that learns to map stimuli vectors onto two categories by means of gradient descent. They show that the model successfully learns the abstract classes in a simple and context-dependent categorisation task. The authors analyse a number of measures, like category and context selectivity to link their results to experimental findings. Moreover, they analyse the network thoroughly and unravel network and task properties that may underlie previous, seemingly contradictory experimental findings. The paper is very well written, the analyses and mathematical derivations are very thorough and the results are convincing. However, the work and its presentation would benefit from a few changes:

      1) The paper may benefit from a more thorough discussion on how the results fit into the current literature (neuroscience and machine learning) and how the findings may generalise to more complex tasks and network structures (Dale’s principle, including recurrent/feedback connections, more than two categories, more than one hidden layer, alternatives to gradient descent).

      2) While the simulations and detailed analyses in the results and methods section are very convincing, some claims should be also supported by more intuitive explanations so that a broader audience can be reached.

      3) The introduction to the context-dependent task may need to be revised because as now the difference to the simple task presented first is not immediately clear.

      4) It would be nice if their findings could be related back to the experimental literature more qualitatively. While the authors mention the contradictory findings in monkey and rat PFC vs. monkey LIP in their introduction, a thorough comparison with those findings is missing.

      We thank the reviewer for their detailed assessment and supportive words. We hope that our revision addresses their suggestions. Concerning point 4: we agree with the reviewer that a thorough comparison with experimental findings would be important, and is currently missing. A thorough comparison would require, however, a number of additional steps that we feel lie beyond the scope of this manuscript (adapt the tasks to each different experimental setup, e.g. by increasing the number of categories and changing the structure of context-dependent associations; re-analyse experimental data).

      We have thus decided to leave this major effort for future work.

    1. Author Response

      Public Evaluation Summary

      The authors aim to tackle a fundamental question with their study: whether there is a direct age-associated increase of transcriptional noise. To investigate this question, they develop tools to analyze single-cell sequencing data from mouse and human aging datasets. Ultimately, application of their novel tool (Scallop) suggests that transcriptional noise does not change with age, and that apparent changes in transcriptional noise can instead be attributed to other sources, such as subtle shifts in cell identity. This study is in principle of broad interest, but it currently lacks a definitive demonstration of the robustness of Scallop. Systematic testing of this new package would strengthen the key conclusion of the work and give users more confidence when using the tool to estimate expression noise.

      We have now attempted to further demonstrate the robustness of Scallop by performing a more systematic analysis and a side-by-side comparison to other existing methods using a set of artificially generated datasets. These analyses have resulted in the inclusion of six supplementary figures that are presented in the subsections Scallop membership score accurately identifies transcriptionally noisy cells, Ability to detect noisy cells within cell types, Effect of cellular composition, Effect of dataset size, Effect of feature expression and Effect of cell type marker expression within the Results section of the revised manuscript.

      We have also included a supplementary figure showing an in-depth analysis of a dataset in which an age-associated increase in transcriptional noise was detected using alternative methods, but whose closer dissection revealed that the difference in noise is due to a single donor and to the choice of methods. We discuss this in the subsection Distance-to-centroid methods detect transcriptionally stable cell subtypes as transcriptional noise within the Results section.

      Finally, we have revised the manuscript to clarify the main points raised by the reviewers: the definition of transcriptional noise, the reasoning behind the choice of the single-cell aging datasets and the rationale behind the use of Leiden clustering. We have also expanded the description of the method to make the definition of the membership score clearer to readers, and discussed the implications of our main findings (a lack of evidence for age-related transcriptional noise) in the broader context of theories of aging.

      Reviewer #1 (Public Review):

      In the present study, Ibanez-Sole et al evaluate transcriptional noise across aging and tissues in several publicly available mouse and human datasets. Initially, the authors compare 4 generalized approaches to quantify transcriptional noise across cell types and later implement a new approach which uses iterative clustering to assess cellular noise. Based on implementation of this approach (scallop), the authors survey noise across seven sc-seq datasets relevant for aging. Here, the authors conclude that enhanced transcriptional noise is not a hallmark of aging, rather changes in cell identity and abundances, namely immune and endothelial cells. The development of new tools to quantify transcriptional noise from sc-seq data presents appeal, as these datasets are increasing exponentially. Further, the conclusion that increased transcriptional noise is not a defined aspect of aging is clearly an important contribution; however, given the provocative nature of this claim, more comprehensive and systematic analyses should be performed. In particular, the robustness and appeal of scallop is still not sufficiently demonstrated and given the complexity (multiple tissues, species and diverse relative age ranges) of datasets analyzed, a more thorough comparison should be performed. I list a few thoughts below:

      Initially, the authors develop Decibel, which centralizes noise quantification methods. The authors provide schematics shown in Fig 1, and compare noise estimates with aging in Fig 2 - Supplement 2. Since the authors emphasize the necessary use of scallop as a "better" pipeline, more systematic comparisons to the other methods should be made side-by-side.

      We thank the reviewer for their positive assessment of the manuscript and their suggestions. We agree that side-by-side benchmarking of Scallop against the methods implemented in Decibel, as well as a more thorough analysis of the effect that different features (dataset size, cellular composition, etc.) might have on the output of Scallop, will reinforce the main points of the manuscript. To respond experimentally to these requests, we took advantage of a set of four artificial datasets previously generated by us with the R package splatter (v1.10.1; as described in Ascensión et al. [1]). In the present work, we first ran a side-by-side comparison between Scallop and two distance-to-centroid (DTC) methods on the four artificial datasets, which contain increasing degrees of transcriptional noise (the novel data are included as Figure 1 – Figure supplement 1 in the revised manuscript). Then, we compared Scallop to one DTC method regarding their ability to detect noisy cells in different cell types (Figure 1 – Figure supplement 2). Finally, we implemented four simulations to test the effect of the following features on the performance of Scallop: cellular composition (Figure 1 – Figure supplement 3), dataset size (Figure 1 – Figure supplement 4), number of genes (Figure 1 – Figure supplement 5) and marker gene expression (Figure 1 – Figure supplement 6). A summary of these results follows.

      Side-by-side comparison of Scallop vs DTC methods

      Each of the four artificial datasets used consists of 10K cells from 9 populations, named Group1 to Group9, with the following relative abundances: 25, 20, 15, 10, 10, 7, 5.5, 4, and 3.5%, respectively. The four datasets differ only in the de.prob parameter used to generate them. The de.prob parameter determines the probability that a gene is differentially expressed between subpopulations within the dataset. The greater the de.prob value, the more differentially expressed genes there are between clusters, meaning that the different cell types present in the dataset cluster more robustly. Decreasing the value of de.prob yields datasets with noisy cells, whose populations lack such a strong transcriptional signature. To study whether Scallop can capture the robustness with which cells of the same cell type cluster together, we selected four de.prob values (0.05, 0.016, 0.01 and 0.005) and measured transcriptional noise using Scallop and two DTC methods: the whole transcriptome-based Euclidean distance to the cell type mean and the invariant gene-based Euclidean distance to the tissue mean expression. These two methods were selected because GCL does not yield a transcriptional noise measure per cell, so it cannot be compared with respect to the number of noisy cells detected within a cluster. Similarly, comparing Scallop to the ERCC spike-in-based method was not possible for artificial datasets. Importantly, these analyses showed that Scallop, unlike DTC methods, was able to discern the core transcriptionally stable cells within each cell type cluster from the noisier cells that lie between clusters (Figure 1 – Figure supplement 1 of the revised manuscript).
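      For readers less familiar with DTC scores, the first of the two methods (whole transcriptome-based Euclidean distance to the cell type mean) can be sketched as below. This is a simplified illustration, not the Decibel implementation; it assumes a dense, already normalized cells x genes matrix and known cell type labels, and the function name is hypothetical.

```python
import numpy as np


def distance_to_centroid(expr: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Euclidean distance from each cell (row of expr) to the mean profile of its own cell type.

    Assumes expr is a dense, already normalized cells x genes matrix (hypothetical helper).
    """
    labels = np.asarray(labels)
    dist = np.empty(expr.shape[0])
    for label in np.unique(labels):
        in_type = labels == label
        centroid = expr[in_type].mean(axis=0)  # cell type mean expression profile
        dist[in_type] = np.linalg.norm(expr[in_type] - centroid, axis=1)
    return dist


# Toy example: two cell types with two cells each
expr = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array(["A", "A", "B", "B"])
print(distance_to_centroid(expr, labels))  # [1. 1. 1. 1.]
```

      Because each cell receives a single distance, with no information about the direction in which it deviates, a cell drifting towards a neighbouring cluster and a cell that is merely far from its centroid along some unrelated axis can receive the same score.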

      Effect of dataset features on the performance of Scallop

      We simulated five artificial datasets with the same nine cell type populations but with different relative abundances in each dataset. We used the imbalance degree (ID) to measure class imbalance in each of them and to make sure that the selected cell compositions represented a wide range of imbalance degrees (to this end, we explored ID values between 1.2 and 5.3). The ID provides a normalized summary of the extent of class imbalance in a dataset in so-called "multiclass" settings, that is, where more than two classes are present. It was specifically developed to improve on the commonly used imbalance ratio (IR), whose calculation only considers the abundances of the most and least abundant classes and which therefore gives the same summary for datasets with different numbers of minority classes. The presence of multiple minority classes is not uncommon in single-cell RNAseq datasets, as tissues might contain several rare cell types. We observed that the transcriptional noise measurements provided by Scallop were very robust to changes in imbalance degree (see Figure 1 - Supplement 3), both qualitatively and quantitatively. For instance, Group2 and Group8 were always detected as the most stable and the noisiest cell types, respectively, regardless of their relative abundance in the dataset, and their average percentage of noise varied little across ID values: it ranged between 0 and 0.14% (Group2) and between 16 and 18% (Group8).
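      The contrast between ID and IR can be illustrated with a minimal sketch of the commonly used ID definition: the distance between the observed class distribution and a perfectly balanced one, divided by the largest distance attainable with the same number m of minority classes, plus (m - 1). Using total variation as the distance is our assumption; other distances (e.g. Hellinger) give different values, so these numbers are not expected to reproduce the 1.2-5.3 range reported above. Function names are hypothetical.

```python
import numpy as np


def total_variation(p: np.ndarray, q: np.ndarray) -> float:
    """Total variation distance between two discrete distributions."""
    return 0.5 * float(np.abs(p - q).sum())


def imbalance_degree(abundances) -> float:
    """Imbalance degree using total variation as the distance (an assumed choice)."""
    p = np.asarray(abundances, dtype=float)
    p = p / p.sum()
    k = p.size
    e = np.full(k, 1.0 / k)               # perfectly balanced distribution
    m = int((p < 1.0 / k - 1e-12).sum())  # number of minority classes (tolerance guards rounding)
    if m == 0:
        return 0.0
    # Most imbalanced distribution with exactly m minority classes:
    # m empty classes, k - m - 1 classes at 1/k, and one class holding the remainder.
    iota = np.zeros(k)
    iota[m:k - 1] = 1.0 / k
    iota[k - 1] = 1.0 - (k - m - 1) / k
    return total_variation(p, e) / total_variation(iota, e) + (m - 1)


def imbalance_ratio(abundances) -> float:
    """Ratio between the most and the least abundant class."""
    p = np.asarray(abundances, dtype=float)
    return float(p.max() / p.min())


default = [25, 20, 15, 10, 10, 7, 5.5, 4, 3.5]  # Group1..Group9 relative abundances (%)
print(imbalance_degree(default), imbalance_ratio(default))

# Same IR (5.0) but a different number of minority classes, hence a different ID:
print(imbalance_ratio([50, 30, 10, 10]), imbalance_ratio([50, 20, 20, 10]))
print(imbalance_degree([50, 30, 10, 10]), imbalance_degree([50, 20, 20, 10]))
```

      The integer part of the ID encodes the number of minority classes, which is exactly the information the IR discards.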

      The effects of dataset size (number of cells) and of the number of genes were evaluated by generating versions of an artificial dataset in which cells or genes had been subsampled from an original artificial dataset (the one generated with de.prob=0.001). We tested datasets of 1,000-10,000 cells and with between 5,000 and 14,000 genes. Dataset size had nearly no impact on the transcriptional noise measurements provided by Scallop (Figure 1 - Supplement 4 of the revised manuscript): the average percentage of transcriptional noise per cell type remained within a narrow range across a ten-fold increase in dataset size. Perhaps more strikingly, removing the expression of a large fraction of genes did not substantially affect transcriptional noise measurements per cell type (Figure 1 - Supplement 5). The variation when removing half of the genes (7,000 genes) was minimal, and we did not see substantial changes in transcriptional noise measurements unless over 60% of the genes from the original dataset were removed. For example, Figure 1 - Supplement 5C shows that noise measurements vary substantially when 8,000 and 9,000 genes are removed (thereby keeping 6,000 and 5,000 genes, respectively), but only some cell types (Groups 4, 7, 8 and 9) were affected by these variations.
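      The subsampling step can be sketched as follows. This is a minimal illustration assuming a dense cells x genes matrix and uniform random sampling without replacement; the helper name and the toy matrix are hypothetical, and the exact sampling scheme used for the supplementary figures may differ.

```python
import numpy as np


def subsample(expr: np.ndarray, n_cells=None, n_genes=None, seed=0):
    """Randomly keep n_cells rows (cells) and/or n_genes columns (genes), without replacement.

    Hypothetical helper; returns the reduced matrix and the kept row and column indices.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = expr.shape
    rows = np.arange(n_rows) if n_cells is None else np.sort(rng.choice(n_rows, size=n_cells, replace=False))
    cols = np.arange(n_cols) if n_genes is None else np.sort(rng.choice(n_cols, size=n_genes, replace=False))
    return expr[np.ix_(rows, cols)], rows, cols


# Toy count matrix: 10,000 cells x 200 genes (far fewer genes than a real dataset)
expr = np.random.default_rng(1).poisson(1.0, size=(10_000, 200))
for n in (1_000, 5_000, 10_000):
    sub, rows, cols = subsample(expr, n_cells=n)
    print(n, sub.shape)
```

      Sorting the sampled indices keeps cells and genes in their original order, which makes it easy to map results on a subsample back to the full dataset.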

      To measure the effect of marker gene expression on the membership with which cells are assigned to their cell type cluster, we ran a series of simulations in which the top 10 markers for a cell type were removed from the dataset cumulatively: the first simulation lacked the expression of the Top1 marker, the second lacked the Top1 and Top2 markers, and so on. We then ran Scallop on each of the resulting datasets and observed a steady increase in the transcriptional noise associated with that cell type. This provides evidence that the strength of cell type marker expression in a cluster is directly related to its transcriptional stability (or lack of transcriptional noise). We have included the result of this experiment in the revised version of the manuscript (Figure 1 - Supplement 6).
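      The cumulative removal scheme can be sketched as a generator that yields one reduced matrix per simulation. This is an illustration under stated assumptions: markers are assumed to be ranked from strongest (Top1) to weakest, "removal" is implemented by dropping the corresponding gene columns, and all names are hypothetical.

```python
import numpy as np


def marker_removal_series(expr, gene_names, ranked_markers, max_k=10):
    """Yield (k, reduced matrix, kept gene names) with the top-k ranked markers removed, for k = 1..max_k.

    Hypothetical helper: ranked_markers is assumed to be ordered from Top1 to TopN.
    """
    gene_names = list(gene_names)
    for k in range(1, max_k + 1):
        dropped = set(ranked_markers[:k])  # cumulative: Top1, then Top1 + Top2, ...
        keep = [i for i, g in enumerate(gene_names) if g not in dropped]
        yield k, expr[:, keep], [gene_names[i] for i in keep]


genes = [f"gene{i}" for i in range(12)]
markers = genes[:10]  # pretend the first ten genes are the ranked markers of one cell type
expr = np.arange(5 * 12, dtype=float).reshape(5, 12)
for k, reduced, kept in marker_removal_series(expr, genes, markers):
    print(k, reduced.shape)
```

      Each yielded matrix would then be clustered independently, and the noise of the affected cell type compared across k.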

      In conclusion, by using artificially generated datasets where the ground truth (cell type labels, degree of noise, etc) was known, the newly provided systematic analyses showed that Scallop had a remarkably robust response to said changes in dataset features, further reinforcing the manuscript conclusions.

      For example, scallop noise estimates (Fig 2) look fairly similar to those of other Euclidean distance-based measures (Fig 2 supplement 2).

      It is true that some datasets show similar trends regardless of the transcriptional noise quantification method. For instance, the murine brain dataset by Ximerakis et al. shows no overall change in noise between the age groups across different methods. However, we do observe important differences in other examples. This is the case of the human pancreas dataset by Enge et al. and the human skin dataset by Solé-Boldo et al., where not only the magnitude but also the directionality of the trend are different depending on the method used to measure noise. In the former, three methods (Scallop, invariant gene-based Euclidean distance to average tissue expression and GCL) show an age-related increase in noise, whereas one method (whole transcriptome-based Euclidean distance to the cell type mean) shows a decrease in noise. In the latter, two methods (Scallop and GCL) yield a decrease in noise and the two DTC methods measure a mild increase in noise. These inconsistencies can now be reconciled with our proposed explanation that said ”noise” may actually be referring to substantially different biology in the diverse experimental settings.

      Are downstream observations (ex lung immune composition changes more than noise) supported from these methods as well? If so, this would strengthen the overall conclusion on noise with age, but if not, it would be relevant to understand why.

      Studying changes in cell type composition in the lung and other aged tissues would be highly pertinent. Nevertheless, we have measured changes in cell type composition using only one method that is based on Generalized Linear Models, covered in the subsection Age-related cell type enrichment of the Methods. The methods that we have compared in our study (DTC methods, ERCC-based methods, GCL, etc.) were all designed to measure transcriptional noise, but not changes in cell type composition.

      Whether the effects of cell type composition changes are bigger than changes in noise for the rest of the methods used to measure noise was probably not clear enough in the original manuscript. We found no evidence for an increase in noise associated with aging, regardless of the method used. Although not included in the manuscript, we did generate heatmaps similar to the one shown in Figure 3B for each of the noise quantification methods. However, as the heatmap on the right side (the one showing cell type enrichment) was identical in each figure, we considered them to be redundant and decided not to include them, since they did not provide any additional insight besides giving more examples of lack of evidence for transcriptional noise, this time at the cell type level. We consider that the lack of evidence was already well demonstrated in the previous analyses (Figure 2 and Figure 2 - Supplement 2).

      Similarly, the validation of scallop seems mostly based on its ability to localize noisy vs stable cells in Fig 1 supplement 1 and on its relative robustness within a dataset to input parameters (Fig 1 supplement 2). A more systematic analysis should be performed to robustly establish this method, for example, noisy cell clustering comparisons across the 7 datasets used. In addition, Levy et al. 2020 implemented a pathway-based approach for validation. Specifically, surrogate genes were derived from GCL values, with KEGG preservation used as an output. Similar additional analyses should be performed for scallop.

      We believe that this legitimate concern is now addressed by the newly included data, in particular by the systematic comparison between Scallop and DTC methods on three artificially generated datasets with different degrees of transcriptional noise provided in Figure 1 - Supplement 2. The ability of Scallop to detect cells that are particularly noisy within a cell type, or cells that lie between cell types, may represent its biggest advantage with respect to other methods. DTC methods fail to discern between stable and noisy cells within cell types. Also, in our analysis, DTC methods were unable to distinguish between cell types that have a marked transcriptional program (whose cells systematically cluster together) and those that have a less clear transcriptomic identity (some of whose cells are assigned to other cell types across bootstrap iterations). In contrast, Scallop was able to distinguish between the two cases on the same datasets.
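      The idea behind a bootstrap-based membership score, as opposed to a distance, can be sketched as follows: a cell that always lands in the same cluster across iterations scores 1, while a cell that flips between clusters scores lower. This is a simplified, hypothetical illustration rather than Scallop's implementation; it assumes that cluster labels have already been matched to a common reference across iterations and that every cell is clustered in every iteration.

```python
import numpy as np
from collections import Counter


def membership_scores(label_runs):
    """Per-cell consensus label and the fraction of iterations that agree with it.

    label_runs: cells x iterations array of cluster labels, assumed to be already
    matched to a common reference labelling across iterations (a simplification).
    """
    label_runs = np.asarray(label_runs)
    consensus = []
    scores = np.empty(label_runs.shape[0])
    for i, row in enumerate(label_runs):
        label, count = Counter(row.tolist()).most_common(1)[0]  # most frequent label for this cell
        consensus.append(label)
        scores[i] = count / row.size
    return np.array(consensus), scores


runs = [["A", "A", "A", "A"],   # stable cell: always assigned to cluster A
        ["A", "B", "A", "B"],   # cell split evenly between two clusters
        ["B", "B", "A", "B"]]   # mostly B, occasionally A
labels, scores = membership_scores(runs)
print(labels, scores)
```

      Unlike a distance to a centroid, a low score here directly reflects instability of cluster assignment, which is the property used to separate stable cells from cells lying between cell types.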

      The conclusion that immune and endothelial cell transcriptional shifts associate more with age than noise are quite compelling, but seem entirely restricted to the mouse and human lung datasets. It would be interesting to know if pan-tissues these same cell types enrich age-related effects or whether this phenomenon is localized.

      We agree with the reviewer that it would be very interesting to see whether a change in cell type composition (and particularly, an increase in abundance of immune cell types) is observed in aged tissues other than the lung. Qualitative cell type composition changes in the aging lung have been described in the literature [5]. Specifically, the higher abundance of immune cell types was observed in a single-nucleus RNAseq dataset of cardiopulmonary cells in Macaca fascicularis [6]. However, we believe that trying to answer the question whether this phenomenon holds in other tissues would require a systematic analysis of several datasets for each tissue with a sufficient number of donors/individuals in each of them. This is because our approach to measure age-associated cell type enrichment using generalized linear models relies heavily on having multiple biological replicates for each age group. Unfortunately, this is not the case for most published single-cell RNAseq datasets of aging. In any case, we have toned down the last sentence in the subsection Changes in the abundance of the immune and endothelial cell repertoires characterize the human aging lung by making it more clear that our claim regarding changes in the cellular composition of aged tissues is based on lung datasets (the text in italics represents what was added in the revised version of the manuscript):

      "Even though the evidence for changes in tissue composition are based on a single tissue, we hypothesize that these facts may have influenced previous analyses of transcriptional noise associated with aging."

      As discussed in the original manuscript, there is evidence published by other groups pointing to pan-tissue changes in cellular composition with age, which will undoubtedly influence analyses that did not account for cellular composition changes in the datasets they compared. Cellular composition is in fact a very important aspect that has been largely overlooked: only one [7] of the seven articles that had measured transcriptional noise in aging (the datasets used in Figure 2) had attempted to remove its effect, by subsampling cells to balance compositions between age groups prior to their noise analysis. In any case, we do not believe this is the only phenomenon underlying the purported increase in transcriptional noise associated with age. Each dataset will most probably have different issues that the original authors misread as an increase in noise or a loss of cellular identity in a particular organ or tissue. As an additional example of such phenomena, we have now included a re-analysis of the data by Enge et al. [3] on "noisy" β-cells in the aged human pancreas (Figure 5–Figure supplement 2 of the revised manuscript). In this case, rather than observing an age-dependent pattern, we found that the 21-year-old donor presents much lower transcriptional noise values than the rest of the donors, whereas there is no significant difference between the 22-year-old donor and the rest of the donors. We conclude that the statistically significant differences between the "young" and "old" age categories can be attributed to the abnormal noise values, of uncertain origin, obtained for the 21-year-old donor. Identifying all causes of apparent transcriptional noise in other organs and tissues would be too lengthy, and is beyond the scope of the present manuscript.
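      One straightforward way to balance compositions between age groups by subsampling is sketched below: for each cell type, every age group is downsampled without replacement to the smallest count observed for that cell type across age groups. This is a hypothetical helper for illustration; the exact procedure used in [7] may differ.

```python
import numpy as np


def balance_compositions(cell_types, age_groups, seed=0):
    """Indices of a subsample in which each cell type has equally many cells in every age group."""
    rng = np.random.default_rng(seed)
    cell_types = np.asarray(cell_types)
    age_groups = np.asarray(age_groups)
    groups = np.unique(age_groups)
    keep = []
    for cell_type in np.unique(cell_types):
        per_group = [np.flatnonzero((cell_types == cell_type) & (age_groups == g)) for g in groups]
        n = min(len(idx) for idx in per_group)  # smallest count across age groups
        if n == 0:
            continue  # cell type absent from some age group: drop it entirely
        for idx in per_group:
            keep.extend(rng.choice(idx, size=n, replace=False).tolist())
    return np.sort(np.array(keep, dtype=int))


# Toy example: T cells 5 young / 3 old, B cells 2 young / 4 old
cell_types = ["T"] * 5 + ["B"] * 2 + ["T"] * 3 + ["B"] * 4
age_groups = ["young"] * 7 + ["old"] * 7
idx = balance_compositions(cell_types, age_groups)
print(len(idx))
```

      After balancing, any remaining difference in noise between age groups cannot be driven by one group simply containing more cells of an intrinsically noisier cell type, although it comes at the cost of discarding cells.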

      Related to these points, there does not seem to be a specific rationale for why these datasets (the seven used in total, or the lung for the deep dive) were selected. Clearly, many mouse and human sc-RNA-seq datasets exist with large variations in age, so expanding the datasets analyzed and/or providing sufficient rationale as to why these ones were chosen for noise analyses would be helpful. For example, querying "aging" across sc-seq datasets in Single cell portal yields 79 available datasets: https://singlecell.broadinstitute.org/single_cell?type=study&page=1&terms=aging&facets=organism_age%3A0%7C103%7Cyears.

      We now realize that the reasoning behind our selection of aging datasets was not sufficiently clear in the original manuscript, and we thank the reviewer for pointing out this omission. We have made a more explicit reference to Appendices 2, 3, 4 and 6 in the revised manuscript. The seven selected scRNAseq datasets are those where transcriptional noise had originally been measured by the authors, using the computational methods that we later implemented in Decibel. Our aim was to first recapitulate previous reports of transcriptional noise using our novel method (Scallop). Thus, we downloaded all publicly available scRNAseq datasets of aged tissues where transcriptional noise had explicitly been measured. Some of them had reported an increase in transcriptional noise only in some cell types (for instance, the human aged pancreas dataset by Enge et al. [3]), whereas others found an increase in most cell types [7]. Appendix 2 summarizes the main features of those seven datasets (tissue, organism and number of cells) and indicates whether an increase in transcriptional noise was observed in the original article where each was published. Additionally, the "scope" column indicates where that increase was found (in which cell types), and the "Method" column briefly describes the computational method used to measure transcriptional noise in that article. Appendix 3 provides information on the final datasets that were used in our analysis (Figure 2). Not every sample from the original datasets was included, so the inclusion criteria are specified there, as well as the number of cells, individuals and age of each cohort. Appendix 4 shows the abnormal count distribution of two samples that were discarded from the Kimmel lung dataset. As for the selection of lung for the deep dive, we chose it because it was the organ with the most datasets available, for both mouse and human. Appendix 6 provides information on the number of cells and donors per age cohort in the human lung datasets included in this study.

      We have included the following sentence in the "Increased transcriptional noise is not a universal hallmark of aging" subsection of the Results:

      "We provide a summary of the main characteristics of each dataset, as well as the findings regarding transcriptional noise obtained in each of the original studies, whether changes in transcriptional noise were restricted to particular cell types, and the computational method used to measure noise (see Appendix 2)."

      The analysis showing that noise is indistinguishable from cell fate shifts is compelling, but again relies on one specific example where alternative surfactant genes are used as markers. The same question arises as to whether this observation holds for other cell types in other organs. For example, the human cell atlas contains dozens of tissues with large variations in age (https://www.science.org/doi/10.1126/science.abl4290).

      We sympathize with this comment, but hope that the reviewer will agree that providing an additional example of a different phenomenon originally reported as "transcriptional noise" (in this case in the aged human pancreas; see Figure 5 – Figure supplement 2), but actually reflecting something else, may be sufficient to warn interested readers. In our opinion, diverse phenomena are likely to underlie the purported increases in transcriptional noise, and a re-analysis should be made case by case. We can only hope that researchers in the field re-analyze the available aging datasets in this new light.

      Reviewer #2 (Public Review):

      In this manuscript, Ibanez-Sole et al. focus on an important open question in ageing research: "how does transcriptional noise increase at the cellular level?". They developed two python toolkits, one for comparison of previously described methods to measure transcriptional noise, Decibel, and another one implementing a new method of variability measure based on cluster memberships, Scallop. Using published datasets and comparing multiple methods, they suggest that increased transcriptional noise is not a fundamental property of ageing, but instead, previous reports might have been driven by age-related changes in cell type compositions.

      I would like to congratulate the authors on openly providing all code and data associated with the manuscript. The authors did not restrict their paper to one dataset or one approach but instead provided a comprehensive analysis of diverse biology across murine and human tissues.

      While the results support their main conclusions, the lack of robustness/sensitivity measures for the methods used makes it difficult to judge the biology. The authors use real data to compare methods, but using synthetic data with known artificial 'variability' across cell clusters could first establish the methods, which would make the results more convincing and easier to interpret. Despite the comprehensive analysis of biological data, a detailed prior description of how the methods behave with respect to, e.g., the number of cells in each cell type cluster, the number of cell types in the dataset, and % feature expression would make the paper more convincing. Once the details of the method are provided, the python toolkit can be widely used, not limited to the ageing research community. I am also concerned that a definition of 'transcriptional noise' (e.g. genome-wide noise, transcriptional dysregulation in cell-type-specific genes, noise in certain pathways) and its interpretation with regard to the biology of ageing is missing. Differences between methods could be explained by the different biology they capture. Moreover, the interpretation of a lack of different types of variability may not be the same for the biology of ageing.

      Increased transcriptional noise is compatible with genomic instability, loss of proteostasis and epigenetic regulation. Showing a lack of consistent transcriptional noise can challenge the widespread assumptions about how these hallmarks affect the organism. Overall, I found the paper very interesting and central to the field of ageing biology. However, I believe it requires a more detailed description of the methods and interpretations in the context of biology and theories of ageing.

      We thank the reviewer for their positive assessment of the manuscript and their suggestions. We respond to each of the specific comments below.

      Major comments

      1) The concept of transcriptional noise is central to the manuscript; however, what the authors consider as transcriptional noise and why is not clear. Genome-wide vs. function or cell-type specific noise could have different implications for the biology of ageing. In line with this, a discussion of the findings in the context of theories of ageing is necessary to understand its implications.

      We thank the reviewer for pointing out the lack of clarity in this key point. The use of the "transcriptional noise" term in the literature is quite heterogeneous, and we agree that the lack of a consensus definition may be confusing to the reader. For this reason, we adopted in the introduction the definition by Raser and O’Shea [8] as "the measured level of variation in gene expression among cells supposed to be identical", i.e. the sum of both intrinsic and extrinsic noise as previously defined by Swain and colleagues [9, 10]. In our opinion, this is generally what the literature of age-associated transcriptional noise is referring to.
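      To make the intrinsic/extrinsic split concrete, the sketch below uses the dual-reporter decomposition of Elowitz et al. [9] and Swain et al. [10], which needs two identically regulated reporters measured in the same cell. A single scRNAseq snapshot has no such paired measurement, so only the total (their sum) is accessible. This is a minimal illustration, not part of Decibel or Scallop; the function name `dual_reporter_noise` and the simulated data are our own assumptions.

```python
import numpy as np


def dual_reporter_noise(x, y):
    """Split total expression noise into intrinsic and extrinsic parts.

    Hypothetical helper (not from Decibel/Scallop). x, y: per-cell expression
    of two identically regulated reporters measured in the same cells.
    Returns squared coefficients of variation (eta^2): intrinsic, extrinsic, total.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    norm = x.mean() * y.mean()
    # Intrinsic: differences between the two reporters within each cell
    eta_int2 = np.mean((x - y) ** 2) / (2 * norm)
    # Extrinsic: fluctuations shared by both reporters across cells
    eta_ext2 = (np.mean(x * y) - norm) / norm
    # Total: approximately the CV^2 of either reporter alone
    eta_tot2 = (np.mean(x ** 2 + y ** 2) / 2 - norm) / norm
    return eta_int2, eta_ext2, eta_tot2


# Assumed toy model: a shared per-cell factor (extrinsic) plus Poisson sampling (intrinsic)
rng = np.random.default_rng(0)
shared = rng.gamma(shape=10.0, scale=5.0, size=5000)
x = rng.poisson(shared)
y = rng.poisson(shared)
eta_int2, eta_ext2, eta_tot2 = dual_reporter_noise(x, y)
print(f"intrinsic={eta_int2:.3f} extrinsic={eta_ext2:.3f} total={eta_tot2:.3f}")
```

      The identity total = intrinsic + extrinsic holds exactly for the sample moments, which is why a measurement that cannot separate the two (as in a destructive scRNAseq experiment) still captures their sum.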

      With Scallop, we aimed to translate this concept to the context of single-cell RNAseq datasets, where clusters obtained using a community detection algorithm are typically annotated as distinct cell types.

      Therefore, we aimed to measure transcriptional noise, here defined as "lack of membership to cell type clusters". When a clustering algorithm is run iteratively, if a cell is not unambiguously assigned to the same cluster, we consider it to be noisy. Conversely, when a cell consistently clusters with the same group of cells, we consider it to be stable. The membership score we use as a measure of stability is the frequency with which any given cell was assigned to the same cluster across all iterations.

      We have included in the Results section an explicit reference to the Methods subsection that explains how Scallop works in detail, so that the readers can easily find that information:

      "A detailed description of the three steps of the method (bootstrapping, cluster relabeling and computation of the membership score) is provided in the Scallop subsection in the Methods."

      Additionally, we realized that the formula for the membership score might be easier to understand if we renamed freq_score as freq_score(c), to make it clear that each cell is assigned a score. We have also used n and m instead of i and j in this notation, to avoid confusion with the notation used in the previous section, where i and j represented the i-th and j-th bootstrap iterations. Finally, we have included a short paragraph clarifying what each component of the formula refers to. Below we show the formula and text included in the Methods section of the revised manuscript:

      "Where $|c_n|$ is the number of times cell $c$ was assigned to the $n$-th cluster, and $\sum_{m \in \text{clusters}} |c_m|$ is the sum of all assignments made on cell $c$, which equals the number of times cell $c$ was clustered across bootstrap iterations."

      Thus, to address this reviewer’s concerns, we have now included this exact definition of how we measure noise, plus a statement making clear that we refer to the sum of intrinsic and extrinsic noise, with no distinction between them.
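      The relabeling and scoring steps can be sketched in Python as follows. This is not Scallop's code. Several details are our assumptions: matching each bootstrap cluster to the reference cluster it overlaps most, using the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`); marking cells left out of an iteration with -1; and reading freq_score(c) as $\max_n |c_n| / \sum_m |c_m|$, i.e. the share of a cell's clusterings that fall in its most frequent cluster. Both function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def relabel_to_reference(reference, bootstrap):
    """Hypothetical helper: map one bootstrap clustering onto reference labels.

    Assumption: labels are matched by maximum overlap (Hungarian algorithm).
    reference, bootstrap: non-negative integer labels for the same cells.
    Unmatched bootstrap clusters get fresh labels above the reference ones.
    """
    reference = np.asarray(reference)
    bootstrap = np.asarray(bootstrap)
    ref_ids, boot_ids = np.unique(reference), np.unique(bootstrap)
    # overlap[i, j]: cells in bootstrap cluster i that are also in reference cluster j
    overlap = np.array([[np.sum((bootstrap == b) & (reference == r)) for r in ref_ids]
                        for b in boot_ids])
    # Minimizing the negated overlap maximizes total agreement
    rows, cols = linear_sum_assignment(-overlap)
    mapping = {int(boot_ids[i]): int(ref_ids[j]) for i, j in zip(rows, cols)}
    next_id = int(ref_ids.max()) + 1
    for b in boot_ids:
        if int(b) not in mapping:
            mapping[int(b)] = next_id
            next_id += 1
    return np.array([mapping[int(b)] for b in bootstrap])


def freq_score(assignments):
    """Hypothetical membership score per cell: max_n |c_n| / sum_m |c_m|.

    assignments: (n_iterations, n_cells) relabeled cluster labels.
    Assumption: a negative value marks an iteration in which the cell was not clustered.
    """
    assignments = np.asarray(assignments)
    scores = np.full(assignments.shape[1], np.nan)
    for c in range(assignments.shape[1]):
        labels = assignments[:, c]
        labels = labels[labels >= 0]  # iterations in which cell c was clustered
        if labels.size:
            counts = np.bincount(labels)  # counts[n] = |c_n|
            scores[c] = counts.max() / counts.sum()
    return scores


print(relabel_to_reference([0, 0, 1, 1, 2, 2], [5, 5, 3, 3, 3, 7]))  # -> [0 0 1 1 1 2]
runs = [[0, 1, 1], [0, 1, -1], [0, 0, 1], [0, 1, 1]]
print(freq_score(runs))  # cell 1 fell in cluster 1 in 3 of its 4 clusterings
```

      Relabeling is needed because cluster IDs are arbitrary in each run: the same group of cells may be called cluster 3 in one iteration and cluster 5 in another, so counts can only be accumulated once labels are expressed in a common reference.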

      Similarly, we had discussed our findings in the framework of different theories of aging, including their potential relationship to some of the established hallmarks of aging (genomic instability, epigenetic deregulation and loss of proteostasis), as well as to more recent theories of aging such as cell type imbalance in aged organs [11] and inter-tissue convergence [12]. However, it is now clear to us that this was not enough, so we have expanded these paragraphs to convey the implications of our work more clearly. More specifically:

      "Our results suggest that transcriptional noise is not a bona fide hallmark of aging. Instead, we posit that previous analyses of noise in aging scRNAseq datasets have been confounded by a number of factors, including both computational methods used for analysis as well as other biology-driven sources of variability."

      2) While I found the suggested method, Scallop, quite exciting and valuable, I would suggest including a number of performance/robustness measures (primarily based on simulations) on how sensitive the method is to the number of cells in each cell type (cellular composition), misannotations, % feature expression (number of 0s) etc.:

      We have analyzed the effect of cellular composition and the percentage of feature expression using artificially generated datasets (see Figure 1 - Supplements 3 and 5, respectively, and the section "Effect of dataset features on the performance of Scallop" in the response to reviewer #1). Although studying the effect of misannotations on downstream analysis is important, we believe that Scallop was designed in a way that avoids such effects, since membership is measured with respect to each cluster (and not to each cell type label). That is, a reference clustering is obtained at the beginning of the pipeline and memberships are computed using that output as a reference, which means that the Scallop noise values attributed to each cell are not affected by the original labeling of the dataset.

      The output of these analyses reinforced our original conclusions, and it is now included in the Results section:

      "In order to characterize and validate our method for transcriptional noise quantification, we conducted three types of analyses. First, we used artificially generated datasets containing various degrees of transcriptional noise to compare the performance of Scallop and DTC methods side-by-side, regarding their ability to measure transcriptional noise and detect noisy cells within cell types. Next, we ran simulations using artificial datasets in order to study the effect of a number of dataset features on the performance of Scallop: cellular composition, dataset size, number of genes and marker expression. Finally, we graphically evaluated the output of Scallop on a dataset of human T cells, we analyzed its robustness to its input parameters, and we studied the relationship between membership and robust marker expression, using a PBMC dataset."

      2.1) Most importantly, knowing that cell-type composition changes with age, it is important to know how sensitive community detection is to the number of cells in each cell type. While the average can be robust, I wonder if the size of the cell-type cluster affects membership (voting).

      We have included an analysis of a set of artificial datasets with different cellular compositions to evaluate the performance of Scallop in the presence of different degrees of class imbalance (see Figure 1 - Supplement 3). We explain the output of this analysis, which supports the algorithm’s robustness, in the Results section:

      "Next, we ran a series of simulations on artificially generated datasets to evaluate the performance of Scallop in the presence of different levels of class imbalance, dataset size, number of genes, and different degrees of expression of cell type markers. Our analysis showed that Scallop was remarkably robust to changes in cellular composition (see Figure 1 - Supplement 3). Both the average percentage of noise and the distribution remained unchanged for a wide range of class imbalance degrees. Similarly, altering the dataset size (number of cells) and the number of genes of an artificial dataset did not cause any major changes in the transcriptional noise values attributed to each cell type (see Figure 1 - Supplements 4 and 5). Additionally, we conducted an analysis where we identified the 10 most differentially expressed gene markers for a cell type and measured the transcriptional noise associated with that cell type as we removed the expression of those genes from the dataset (Figure 1 - Supplement 5). Transcriptional noise steadily increased as we removed the effect of the top marker genes that defined the cell type under study (see Figure 1 - Supplement 5B). This experiment provides further evidence of how strong marker expression is related to robust cell type identity and how the lack of it results in transcriptional noise."

      3) Although the Leiden algorithm is widely used by many single-cell clustering methods, since the proposed methodology is heavily dependent on clustering, I suggest including a description of the Leiden algorithm.

      We agree that understanding how community detection algorithms in general, and Leiden in particular, work is crucial to understanding the core of the paper, so we have included a brief introduction to these methods in the Methods section, at the beginning of the Scallop subsection:

Leiden is a graph-based community detection algorithm that was designed to improve on the popular Louvain method [13]. Graph-based community detection methods take a graph representation of a dataset as input. In the context of single-cell RNAseq data, shared nearest neighbor (SNN) graphs are commonly used. These are graphs whose nodes represent individual cells and whose edges connect pairs of cells that are among each other's K nearest neighbors according to some distance metric. The aim of community detection algorithms like Leiden is to find groups of nodes that are densely connected to each other, by optimizing modularity. For a graph with C communities, the modularity (Q) is computed by taking, for each community (group of cells) $c$, the difference between the actual number of edges in that community ($e_c$) and the expected number of edges in that community ($K_c^2/2m$, where $K_c$ is the sum of the degrees of the nodes in community $c$ and $m$ is the total number of edges in the graph):

$$Q = \frac{1}{2m}\sum_{c=1}^{C}\left(e_c - r\,\frac{K_c^2}{2m}\right)$$

Where r is a resolution parameter (r > 0) that controls the number of communities: a higher resolution yields more communities, whereas a lower resolution yields fewer. Since maximizing the modularity of a graph is an NP-hard problem, heuristics are used, and Leiden has been shown to outperform Louvain in this task in terms of both quality and speed [14]. However, users can choose to run the Louvain method instead by setting the parameter clustering="louvain" when initializing the Bootstrap object.
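      To make the graph construction and the objective concrete, the sketch below builds an unweighted mutual K-nearest-neighbor graph. This is a simplified reading of the SNN description above; practical SNN graphs are often weighted by neighborhood overlap. It then evaluates the resolution-dependent modularity of a given partition, written as $Q = \frac{1}{2m}\sum_c \left(\sum_{i,j \in c} A_{ij} - r\,K_c^2/(2m)\right)$. Each internal edge is counted from both endpoints, so that $r = 1$ recovers Newman's modularity. The sketch does not implement the Leiden optimization itself; in practice one would call an existing implementation such as `scanpy.tl.leiden`, whose `resolution` argument plays the role of r. The function names are our own.

```python
import numpy as np


def mutual_knn_adjacency(X, k):
    """Hypothetical helper: edge i-j when i and j are each among the other's k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]
    knn = np.zeros(d.shape, dtype=bool)
    knn[np.repeat(np.arange(len(X)), k), nn.ravel()] = True
    return (knn & knn.T).astype(float)  # keep mutual neighbors only (unweighted simplification)


def modularity(A, communities, r=1.0):
    """Q = 1/(2m) * sum_c (sum_{i,j in c} A_ij - r * K_c**2 / (2m)) for an undirected graph."""
    A = np.asarray(A, dtype=float)
    communities = np.asarray(communities)
    degrees = A.sum(axis=1)
    two_m = degrees.sum()  # every edge contributes to two degrees
    q = 0.0
    for c in np.unique(communities):
        members = communities == c
        internal = A[np.ix_(members, members)].sum()  # internal edges, counted from both ends
        K_c = degrees[members].sum()  # sum of degrees in community c
        q += internal - r * K_c ** 2 / two_m
    return q / two_m


# Two well-separated pairs of cells
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
A = mutual_knn_adjacency(X, k=1)
print(modularity(A, [0, 0, 1, 1]))  # -> 0.5
print(modularity(A, [0, 0, 0, 0]))  # -> 0.0
```

      Raising r penalizes large communities more strongly, which is why higher resolutions tend to split the data into more clusters.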

      3.1) Most importantly, the authors comment that they found stronger expression of cell-type specific markers in the cells with high membership values - is it already a product of the Leiden algorithm that it weighs highly variable (thus cell-type specific) features higher - resulting in better prediction of cell-types for cells with strong cell-marker expression? It is important to make a description of transcriptional noise at this stage as it could be genome-wide or more specific to cell-type markers. Can authors provide any support that their method can capture both?

      We agree with the reviewer that finding a stronger expression of cell-type markers in cells with high membership values is indeed something we expected. The graph representation of the dataset taken as input by Leiden is built after running highly variable gene detection and PCA. The neighbors of each cell are detected based on the expression of genes that are highly variable, as the reviewer pointed out, so genes that are differentially expressed between cells are more likely to contribute to the clusters found by Leiden.
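      As an illustration of the preprocessing described above, here is a simplified sketch: library-size normalization, a variance-based highly variable gene rule, and PCA via SVD. These choices (scaling to 10,000 counts, ranking by raw variance of log-expression, the function name `hvg_pca`) are assumptions. Tools such as Scanpy or Seurat use more elaborate dispersion models. The point it shows is that genes separating cell types have high variance across cells, so they both enter the HVG set and dominate the leading components used to build the neighbor graph.

```python
import numpy as np


def hvg_pca(counts, n_top_genes=50, n_components=10):
    """Hypothetical helper: log-normalize, keep the most variable genes, project onto PCs.

    counts: (n_cells, n_genes) raw count matrix.
    Returns (pcs, hvg_idx): cell embeddings and indices of the selected genes.
    """
    counts = np.asarray(counts, dtype=float)
    # Assumed normalization: scale each cell to 10,000 counts, then log1p
    lib = counts.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1.0  # avoid division by zero for empty cells
    logx = np.log1p(counts / lib * 1e4)
    # Assumed HVG rule: highest variance of log-expression
    hvg_idx = np.argsort(logx.var(axis=0))[::-1][:n_top_genes]
    sub = logx[:, hvg_idx]
    sub = sub - sub.mean(axis=0)  # center genes before PCA
    # PCA via singular value decomposition of the centered matrix
    u, s, _ = np.linalg.svd(sub, full_matrices=False)
    k = min(n_components, s.size)
    return u[:, :k] * s[:k], hvg_idx


rng = np.random.default_rng(0)
counts = rng.poisson(20, size=(40, 100))
counts[:20, :5] += 200  # genes 0-4 act as markers of the first 20 cells
pcs, hvg_idx = hvg_pca(counts, n_top_genes=20, n_components=5)
print(sorted(hvg_idx[:5].tolist()))
```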

      Whether Scallop measures genome-wide or cell type-specific noise (or a mixture of both) is a very interesting question. Clusters in single-cell RNA sequencing datasets are often mainly driven by the presence/absence of a few cell type markers, rather than by changes in the expression levels of broader sets of genes. Moreover, it has been shown that single-cell RNAseq datasets generally preserve the same population structure even after data binarization [15]. This is a consequence of the sparsity of single-cell RNAseq datasets. In our case, any difference in expression between one cluster and the rest of the cells in the dataset (be it the expression of a gene that was not detected in the rest of the cells or a higher expression of a gene whose presence is weaker in other clusters) will certainly have an impact on the output of every downstream analysis, from clustering to dimensionality reduction. The influence of the expression of cell type-specific markers on Scallop membership has been demonstrated in several analyses. First, in the simulation where we removed the 10 most defining markers for a particular cell type, transcriptional noise measurements increased (Figure 1 - Supplement 6 of the revised manuscript). Also, Figure 5 provides evidence that the differential expression of a handful of genes (in this case, genes coding for surfactant proteins) can have an impact on the clustering solutions obtained for a set of human alveolar macrophages, and this in turn influences the membership scores obtained with Scallop. In essence, Scallop merely provides a measure of the robustness of clustering at the single-cell level, so any type of transcriptional noise might have an impact on Scallop memberships, provided it is strong enough to influence the output of the clustering algorithm used. In other words, the fact that Scallop membership captures a mixture of both types of noise (genome-wide and that associated with cell type-specific markers) is a consequence of the influence that both types of noise have on clustering.
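      The marker-removal simulation mentioned above can be outlined as below. This is a hypothetical sketch. Ranking genes by the difference in mean expression between the cell type and all other cells stands in for whatever differential expression test was actually used, and both helper names are ours. Re-running Scallop on each modified matrix, the step that yields the noise values, is not shown.

```python
import numpy as np


def top_markers(X, labels, cell_type, n=10):
    """Hypothetical helper: the n genes with the largest mean-expression gain in `cell_type`.

    Assumption: a simple mean difference replaces a formal differential expression test.
    """
    X = np.asarray(X, dtype=float)
    in_type = np.asarray(labels) == cell_type
    gain = X[in_type].mean(axis=0) - X[~in_type].mean(axis=0)
    return np.argsort(gain)[::-1][:n]


def remove_markers_progressively(X, markers):
    """Yield (k, X_k), where X_k has the expression of the top-k markers set to zero."""
    X = np.asarray(X, dtype=float)
    for k in range(1, len(markers) + 1):
        X_k = X.copy()
        X_k[:, markers[:k]] = 0.0  # erase the k strongest markers in every cell
        yield k, X_k


# Toy data: cell type 0 is defined by elevated expression of genes 0-4
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(60, 30)).astype(float)
labels = np.repeat([0, 1, 2], 20)
X[labels == 0, :5] += 10
markers = top_markers(X, labels, cell_type=0, n=5)
for k, X_k in remove_markers_progressively(X, markers):
    gap = X_k[labels == 0].mean() - X_k[labels != 0].mean()
    print(k, round(gap, 3))  # cell type 0 becomes less distinct as markers are removed
```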

      4) The authors conclude that Scallop outperforms other methods through the analysis of biological data, where there is no positive and negative control. I suggest creating synthetic datasets (which could be based on real data), introducing different levels of noise artificially (considering biological constraints like max/min expression levels) and then testing the performance where the truth about each dataset is known. Otherwise, the definitions of noisy and stable cells, regardless of the method, are arbitrary.

      Our initial focus was on biological datasets, where no positive or negative controls for transcriptional noise could be used, but we agree on the need to include an analysis of artificially generated datasets. We analyzed artificial datasets with known degrees of transcriptional noise in order to evaluate the performance of Scallop in a setting where the ground truth is known beforehand. We modeled transcriptional noise by tuning the de.prob parameter, which determines the probability that a gene will be differentially expressed between clusters. The creation of these datasets is explained in detail in the Methods section of the revised manuscript, specifically in the subsections "Performance of Scallop and two DTC methods on four artificial datasets with increasing transcriptional noise" and "Ability to detect noisy cells within cell types".
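      The idea of controlling noise through the probability of differential expression can be illustrated with a toy simulation. The de.prob name matches a parameter of the Splatter R simulator, but this Python sketch does not reproduce Splatter's model. Every gene is simply up-regulated in a group with probability `de_prob`, on top of Poisson sampling. Our assumption is that lower values yield less distinct groups, i.e. more "noise" in the sense used here. The sketch also computes a whole-transcriptome distance to each cell type's centroid, one of the DTC scores compared in the Results, and flags the top and bottom 10% of cells. All function names, distributions and fold changes are hypothetical.

```python
import numpy as np


def simulate_counts(group_sizes, n_genes, de_prob, fold_change=4.0, seed=0):
    """Hypothetical toy simulator (not Splatter): each gene is up-regulated
    in each group with probability de_prob."""
    rng = np.random.default_rng(seed)
    base = rng.gamma(shape=2.0, scale=5.0, size=n_genes)  # assumed baseline mean per gene
    blocks, labels = [], []
    for g, n in enumerate(group_sizes):
        is_de = rng.random(n_genes) < de_prob  # genes differentially expressed in group g
        means = base * np.where(is_de, fold_change, 1.0)
        blocks.append(rng.poisson(means, size=(n, n_genes)))
        labels += [g] * n
    return np.vstack(blocks).astype(float), np.array(labels)


def distance_to_centroid(X, labels):
    """DTC-style score: Euclidean distance of each cell to its cell type's mean profile."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    dist = np.empty(len(X))
    for t in np.unique(labels):
        idx = labels == t
        dist[idx] = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
    return dist


def flag_extremes(scores, frac=0.10):
    """Masks for the top `frac` highest-scoring (noisiest) and lowest-scoring (most stable) cells."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = np.quantile(scores, [frac, 1 - frac])
    return scores >= hi, scores <= lo


for p in (0.3, 0.05):
    X, y = simulate_counts([100, 100, 100], n_genes=200, de_prob=p)
    X = np.log1p(X)
    noisy, stable = flag_extremes(distance_to_centroid(X, y))
    # The gap between group means shrinks as de_prob decreases
    gap = np.linalg.norm(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    print(f"de_prob={p}: mean gap={gap:.2f}, noisy={noisy.sum()}, stable={stable.sum()}")
```

      For a membership score such as Scallop's, where high values mean stable, the roles of the two masks would be swapped.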

      We have now included the following section in the Results:

      "We compared the output of Scallop and two DTC methods (the whole transcriptome-based Euclidean distance to average cell type expression and the invariant gene-based Euclidean distance to average tissue expression) on four artificially generated datasets containing various levels of transcriptional noise. The analysis showed that Scallop, unlike DTC methods, was able to discern between the core transcriptionally stable cells within each cell type cluster and the noisier cells that lie in between clusters (see Figure 1 - Supplement 1). We then compared one of the DTC methods to Scallop regarding their ability to detect noisy cells within each of the cell types, by plotting the top 10% noisiest and top 10% most stable cells (see Figure 1 - Supplement 2A). Analyzing the distribution of noise values for each cell type separately revealed that Scallop can distinguish between clusters that mainly consist of transcriptionally stable cells and noisier clusters that do not have such a distinct transcriptional signature (Figure 1 - Supplement 2B)."

      Reviewer #3 (Public Review):

      In this manuscript, Ibáñez-Solé et al aim to clarify the answer to a very basic and important question that has gained a lot of attention in the past ∼5 years due to the fast-increasing pace of research in the aging field and the development/optimization of single-cell gene expression quantification techniques: how does noise in gene expression change during the course of cellular/tissue aging? As the authors clearly describe, there have been multiple datasets available in the literature but one could not say the same for the number of available analysis pipelines, especially a pipeline that quantifies membership of single cells to their assigned cell type cluster. To address these needs, Ibáñez-Solé et al developed: 1. a toolkit (named Decibel) to implement the common methods for the quantification of age-related noise in scRNAseq data; and 2. a method (named Scallop) for obtaining membership information for single cells regarding their assigned cell type cluster. Their analyses showed that previously-published aging datasets had large variability between tissues and datasets, and importantly the authors’ results show that noise-increase in aging could not be claimed as a universal phenotype (as previously suggested by various studies).

      We thank the reviewer for their positive assessment of the manuscript and their suggestions.

      Comments:

      1) In two relevant papers (doi.org/10.1038/s41467-017-00752-9 and doi.org/10.1016/j.isci.2018.08.011), previous work had already shown what haploid/diploid genetic backgrounds could show in terms of intercellular/intracellular noise. Due to the direct nature of age/noise quantification in these papers, one cannot blame any computational pipeline-related issues for the "unconventional" results. The authors should cite and sufficiently discuss the noise-related results of these papers in their Discussion section. These two papers collectively show how the specific gene, its protein half-life and ploidy can lead to similar/different noise outcomes.

      We agree that we failed to mention and sufficiently discuss the effects of measuring transcriptional noise from data generated via destructive experimentation, where no longitudinal analyses are possible. As mentioned in our responses to the other reviewers, the body of literature on transcriptional noise is quite broad and based on heterogeneous assumptions. We have focused our efforts on measuring actual noise in scRNAseq aging datasets, which by definition involve sampling different cells and thus rely on assumptions at the population level. We believe our results provide a different and interesting perspective on transcriptional noise and aging, but we agree with this reviewer on the need to discuss our findings in the context of other attempts to measure transcriptional noise more directly. We have now included a brief discussion of the work by Sarnoski et al. and Liu et al. This point is explained in more detail later in the letter.

      2) While the authors correctly put a lot of emphasis on studying the same cell type or tissue for a faithful interpretation of noise-related results, they ignore another important factor: tracking the same cell over time instead of calculating noise from single-cell populations at supposedly-different age points. Obviously, scRNAseq cannot analyze the same cell twice, but inability to assess noise-in-aging in the same cell over time is still an important concern. Noise could/does affect the generation durations and therefore neighboring cells in the same cluster may not have experienced the same amount of mitotic aging, for example. Also, perhaps a cell has already entered senescence at early age in the same tissue. This caveat should be properly discussed.

      The distinction between intrinsic and extrinsic noise, and the impossibility of discerning between the two in destructive experiments, is a relevant point that we have now included in the Discussion (the newly added text is shown in italics):

      "Transcriptional noise could be related to genomic instability [18], epigenetic deregulation [19, 20] or loss of proteostasis [21], all established hallmarks of aging. Some authors consider transcriptional noise to be a hallmark of aging in and of itself [22]. In any case, the origin of transcriptional noise is unclear, as it could arise from many different sources. Most importantly, it is not possible to distinguish between intrinsic and extrinsic noise from a snapshot of cellular states, i.e., one cannot tell whether the observed differences between cells in a single-cell RNA experiment reflect time-dependent variations in gene expression or differences between cells across a population [23]. Interestingly, recent work by Liu et al. measuring intrinsic noise in S. cerevisiae showed that aging is associated with a steady decrease in noise, with a sudden increase in soon-to-die cells. Another longitudinal study found an increase in extrinsic noise and no change in intrinsic noise in diploid yeast [16]."

      Regarding the caveat that cells from individuals in the Young groups may already show signs of aging, we can only agree that this is correct: some sampled cells will already show signs of cellular damage in the absence of chronological aging. However, this applies to every study of aging that samples cells destructively, and it is generally assumed by the field that this is a discrete phenomenon that does not affect the overall results in a meaningful way.

      3) Another weakness of this study is that the authors did not show the source/cause of decreasing/stable/increasing noise during aging. Understanding the source of loss of cell type identity is also important but this manuscript was about noise in aging, so it would have been nice if there could be some attempts to explain why noise is having this/that trend in differentially aged cell types in specific tissues.

      The reviewer raises a very important point here that we would like to discuss in detail. The papers that we have re-analyzed generally assume that an increase in transcriptional noise and a loss in cell type identity are equivalent terms. However, as this reviewer points out, you could theoretically have cells that lose their cell type identity without a concomitant increase in transcriptional noise, for instance by a sharp decrease in a limited number of marker genes that collectively define that cell within a given cell type/cluster. Thus, transcriptional noise can certainly arise from different sources, and several mechanisms have been proposed to explain its presence in the context of cellular aging. We agree with the reviewer that discussing how transcriptional noise could be related to aging is of interest to the readers. However, as pointed out in the responses to similar concerns by the other reviewers, our main finding is that we don’t detect meaningful and reliable increases in transcriptional noise associated with cell aging. Instead, what we see is a number of different technical and biological issues/phenomena that have been interpreted as transcriptional noise. We hope this reviewer will agree that the manuscript now presents a full and robust story and that finding the causes of up/down "noise" trends in the different datasets may be more appropriately tackled by follow-up studies.

      4) In the discussion section, the authors say that "Most importantly, Scallop measures transcriptional noise by membership to cell type-specific clusters which is a re-definition of the original formulation of noise by Raser and O’Shea." It is not clear what the authors refer to by "the original formulation of noise by Raser and O’Shea". Intrinsic/extrinsic noise formulations?? Please be more specific.

      We thank the reviewer for pointing this out, since we agree that the sentence needed to be reformulated for the sake of clarity. By the definition of Raser and O’Shea we meant "the measured level of variation in gene expression among cells supposed to be identical", which does not distinguish between intrinsic and extrinsic noise. Since their definition predates the development of single-cell technologies, we meant to convey our attempt to bring this classic concept into the context of single-cell RNAseq. Nowadays, cell clusters produced by a community detection algorithm are given cell type annotations depending on their expression of known cell type markers. What Scallop aims to measure is the degree of membership of each individual cell to its cluster, as evidence of its transcriptional stability. To make this point clearer, we have now rewritten the paragraph as follows:

      Most importantly, Scallop measures transcriptional noise by membership to cell type-specific clusters which is a re-definition of the original formulation of noise by Raser and O’Shea: measurable variation among cells that should share the same transcriptome. This is in stark contrast to measurements of noise including other phenomena (as demonstrated in Figure 5) by the distance-to-centroid methods prevalent in the literature.

      References

      [1] M. Alex Ascensión, Olga Ibáñez-Solé, Iñaki Inza, Ander Izeta, and Marcos J Araúzo-Bravo. Triku: A feature selection method based on nearest neighbors for single-cell data. GigaScience, 11, 2022. doi: 10.1093/gigascience/giac017.

      [2] M. Ximerakis, S. L. Lipnick, B. T. Innes, S. K. Simmons, X. Adiconis, D. Dionne, B. A. Mayweather, L. Nguyen, Z. Niziolek, C. Ozek, V. L. Butty, R. Isserlin, S. M. Buchanan, S. S. Levine, A. Regev, G. D. Bader, J. Z. Levin, and L. L. Rubin. Single-cell transcriptomic profiling of the aging mouse brain. Nat Neurosci, 22(10), 2019. doi: 10.1038/s41593-019-0491-3.

      [3] M. Enge, H. E. Arda, M. Mignardi, J. Beausang, R. Bottino, S. K. Kim, and S. R. Quake. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns. Cell, 171(2), 2017. doi: 10.1016/j.cell.2017.09.004.

      [4] L. Solé-Boldo, G. Raddatz, S. Schütz, et al. Single-cell transcriptomes of the human skin reveal age-related loss of fibroblast priming. Commun Biol, 3:188, 2020. doi: 10.1038/s42003-020-0922-4.

      [5] Jaime L. Schneider, Jared H. Rowe, Carolina Garcia-de Alba, Carla F. Kim, Arlene H. Sharpe, and Marcia C. Haigis. The aging lung: Physiology, disease, and immunity. Cell, 184(8):1990–2019, 2021. doi: 10.1016/j.cell.2021.03.005.

      [6] Shuai Ma, Shuhui Sun, Jiaming Li, Yanling Fan, Jing Qu, Liang Sun, Si Wang, Yiyuan Zhang, Shanshan Yang, Zunpeng Liu, et al. Single-cell transcriptomic atlas of primate cardiopulmonary aging. Cell Research, 31(4):415–432, 2020. doi: 10.1038/s41422-020-00412-6.

      [7] I. Angelidis, L. M. Simon, I. E. Fernandez, et al. An atlas of the aging lung mapped by single cell transcriptomics and deep tissue proteomics. Nature Communications, 2019. doi: 10.1038/s41467-019-08831-9.

      [8] Jonathan M. Raser and Erin K. O’Shea. Noise in gene expression: origins, consequences, and control. Science, 309(5743):2010–2013, 2005. doi: 10.1126/science.1105891.

      [9] Michael B. Elowitz, Arnold J. Levine, Eric D. Siggia, and Peter S. Swain. Stochastic gene expression in a single cell. Science, 297:1183–1186, 2002. doi: 10.1126/science.1070919.

      [10] Peter S. Swain, Michael B. Elowitz, and Eric D. Siggia. Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc Natl Acad Sci U S A, 99:12795–12800, 2002. doi: 10.1073/pnas.162041399.

      [11] Alex Cagan, Adrian Baez-Ortega, Natalia Brzozowska, Federico Abascal, Tim H. H. Coorens, Mathijs A. Sanders, Andrew R. J. Lawson, Luke M. R. Harvey, Shriram Bhosle, David Jones, Raul E. Alcantara, Timothy M. Butler, Yvette Hooks, Kirsty Roberts, Elizabeth Anderson, Sharna Lunn, Edmund Flach, Simon Spiro, Inez Januszczak, Ethan Wrigglesworth, Hannah Jenkins, Tilly Dallas, Nic Masters, Matthew W. Perkins, Robert Deaville, Megan Druce, Ruzhica Bogeska, Michael D. Milsom, Björn Neumann, Frank Gorman, Fernando Constantino-Casas, Laura Peachey, Diana Bochynska, Ewan St. John Smith, Moritz Gerstung, Peter J. Campbell, Elizabeth P. Murchison, Michael R. Stratton, and Iñigo Martincorena. Somatic mutation rates scale with lifespan across mammals. Nature, 604:517–524, 2022. doi: 10.1038/s41586-022-04618-z.

      [12] Hamit Izgi, Dingding Han, Ulas Isildak, Shuyun Huang, Ece Kocabiyik, Philipp Khaitovich, Mehmet Somel, and Handan Melike Dönertas. Inter-tissue convergence of gene expression during ageing suggests age-related loss of tissue and cellular identity. eLife, 11, 2022. doi: 10.7554/eLife.68048.

      [13] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008. doi: 10.1088/1742-5468/2008/10/p10008.

      [14] V. A. Traag, L. Waltman, and N. J. van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9, 2019. doi: 10.1038/s41598-019-41695-z.

      [15] Peng Qiu. Embracing the dropouts in single-cell RNA-seq analysis. Nature Communications, 11(1), 2020. doi: 10.1038/s41467-020-14976-9.

      [16] Ethan A. Sarnoski, Ruijie Song, Ege Ertekin, Noelle Koonce, and Murat Acar. Fundamental characteristics of single-cell aging in diploid yeast. iScience, 7:96–109, 2018. doi: 10.1016/j.isci.2018.08.011.

      [17] Ping Liu, Ruijie Song, Gregory L. Elison, Weilin Peng, and Murat Acar. Noise reduction as an emergent property of single-cell aging. Nature Communications, 8(1), 2017. doi: 10.1038/s41467-017-00752-9.

      [18] Jan Vijg. From DNA damage to mutations: All roads lead to aging. Ageing Res Rev, 68:101316, 2021. doi: 10.1016/j.arr.2021.101316.

      [19] Yuancheng Lu, Benedikt Brommer, Xiao Tian, Anitha Krishnan, Margarita Meer, Chen Wang, Daniel L. Vera, Qiurui Zeng, Doudou Yu, Michael S. Bonkowski, Jae-Hyun Yang, Songlin Zhou, Emma M. Hoffmann, Margarete M. Karg, Michael B. Schultz, Alice E. Kane, Noah Davidsohn, Ekaterina Korobkina, Karolina Chwalek, Luis A. Rajman, George M. Church, Konrad Hochedlinger, Vadim N. Gladyshev, Steve Horvath, Morgan E. Levine, Meredith S. Gregory-Ksander, Bruce R. Ksander, Zhigang He, and David A. Sinclair. Reprogramming to recover youthful epigenetic information and restore vision. Nature, 588(7836):124–129, 2020. doi: 10.1038/s41586-020-2975-4.

      [20] Giorgio Oliviero, Sergey Kovalchuk, Adelina Rogowska-Wrzesinska, Veit Schwämmle, and Ole N. Jensen. Distinct and diverse chromatin proteomes of ageing mouse organs reveal protein signatures that correlate with physiological functions. eLife, 11:e73524, 2022. doi: 10.7554/eLife.73524.

      [21] Jingyi Li, Yuxuan Zheng, Pengze Yan, Moshi Song, Si Wang, Liang Sun, Zunpeng Liu, Shuai Ma, Juan Carlos Izpisua Belmonte, Piu Chan, Qi Zhou, Weiqi Zhang, Guang-Hui Liu, Fuchou Tang, and Jing Qu. A single-cell transcriptomic atlas of primate pancreatic islet aging. Natl Sci Rev, 8(2):nwaa127, 2020. doi: 10.1093/nsr/nwaa127.

      [22] Alexander R. Mendenhall, George M. Martin, Matt Kaeberlein, and Rozalyn M. Anderson. Cell-to-cell variation in gene expression and the aging process. Geroscience, 43(1):181–196, 2021. doi: 10.1007/s11357-021-00339-9.

      [23] Lucy Ham, Marcel Jackson, and Michael PH Stumpf. Pathway dynamics can delineate the sources of transcriptional noise in gene expression. eLife, 10, 2021. doi: 10.7554/elife.69324.

    1. Author Response

      Reviewer #1 (Public Review):

      This work identifies distinct contributions of direct (D1+) and indirect (Adora+, D2+) amygdalostriatal medium spiny neurons (MSNs) to fear learning and plasticity. The authors combined calcium imaging in freely moving mice with an auditory fear learning assay to reveal tone-, foot-shock- and behavior (movement)-evoked activity of the two MSN populations. D1+ cells show plastic changes driven by fear learning, reaching their maximum tone responsiveness (PSTH) at fear retrieval, whereas Adora+ cell activation remained constant. Furthermore, using optogenetic silencing, they showed that the two MSN groups contribute differently to the retrieval of fear memory. Both cell types receive topographically organized insular cortical inputs, which undergo learning-induced long-term synaptic changes in opposite directions: postsynaptic LTP at D1+ cells and presynaptic LTD at Adora+ cells. These synaptic changes partly explain the distinct behavioral contributions of the two cell types in fear learning.

      This study focuses on a so far neglected member of the 'extended' amygdalar circuitry, the amygdalostriatal transition zone. The data are well presented, the experiments follow a logical order and build on each other, and the paper is easy to read and follow.

      However, some information on the connectivity (and function) of the AStr presented in recent and earlier papers is missing from, or contradicts, the present work. One possible explanation is that the targeted striatal regions vary between experiments. It is therefore difficult to judge when the AStr is being examined and when another part of the caudal (tail) striatum is. As these striatal regions are involved in different neuronal networks, their functional roles could also be distinct. Without precisely defining and consistently targeting the intended striatal region, it is difficult to interpret the findings of the present study (though they are relevant and important).

      We thank this reviewer for his/her overall positive evaluation of our paper.

      We agree with the criticism that in the first submission we did not stringently define the anatomical region of the amygdala-striatal transition zone (AStria). We validated our previous data and performed new anatomical experiments studying the expression of Cre in the D1RCre and AdoraCre mouse lines used here (see Figure 1D; Figure 1 - figure supplement 1; Figure 3 - figure supplement 1). Based on these, we now refer to the region targeted in our study as the "ventral tail striatum" (vTS), as opposed to the more narrowly defined and more ventrally located "AStria". Accordingly, we have replaced "AStria" with "vTS" throughout the paper.

      We have also improved our introduction to the posterior striatum (p. 4 bottom, p. 5 top), and we briefly discuss the targeting of the vTS (as opposed to the AStria)(p. 19 top).

      Reviewer #2 (Public Review):

      Kintscher et al. present a nice study on the responses of Adora2a- and D1R-expressing cells in the tail of the striatum/amygdala transition zone during auditory fear conditioning. Overall, the conclusions are that:

      (1) D1R cells show plasticity in activity patterns during the task, with the emergence of tone/movement co-modulated cells;
      (2) Adora2a cells show fewer such changes;
      (3) gain of function of activity has little effect, whereas
      (4) loss of function of activity in each cell class has moderate effects on the learned behavior (i.e. freezing to the CS).

      There is a nice section on rabies tracing, which maps inputs to both cell types. This motivates an analysis of insular cortex inputs onto both cell types, revealing that (5) CS/US pairing alters insular inputs to both cell types.

      Overall the paper is well done and the conclusions are believable. Furthermore, this brain area is understudied yet potentially very important.

      The analysis of the fluorescence transients is heavy-handed. This creates potential for error and could obscure what appear to be large differences that could be extracted more easily. In some instances the data are interpreted too optimistically. In particular, the silencing experiments are taken to implicate plasticity of the neurons rather than the need for their activity.

      We thank the reviewer for his/her positive evaluation of our paper. For the revision, we have re-analyzed the Ca-imaging data, and we have made changes in the text to avoid a too optimistic interpretation of our data.

    1. Author Response

      Reviewer #2 (Public Review):

      Wild and colleagues develop a barcoding approach, termed WILD-seq, that combines tumor cell barcoding with single-cell transcriptional analysis to concurrently examine clonal tumor cell dynamics and cell state changes during drug treatment. They examine two triple-negative breast cancer (TNBC) cell lines in vivo in response to JQ1 and taxanes. These experiments yield several meaningful conclusions.

      First, they demonstrate that clonal dynamics are fundamentally distinct depending on context and microenvironment, with significant differences between cell culture, NSG mice and immunocompetent mice. Second, they show that bulk expression in treatment-refractory tumors reflects clonal outgrowth of subpopulations in pretreatment tumors whose gene expression patterns resemble those of the relapsed tumor. Finally, they identify mechanisms of in vivo taxane resistance, including EMT and high NRF2 expression. The latter yields tumors that show collateral sensitivity to L-asparaginase, with subsequent resistance mediated by high levels of asparagine synthetase.

      This study is a technical tour de force. The authors engage deeply with the complexity of cell barcoding, bottlenecking, Hamming analysis, single-cell expression analysis and microenvironmental cell analysis. The idea that bulk tumor expression states demarcate drug-resistant clonal populations in pretreatment tumors is not a new concept, but it finds critical validation with this approach. Moreover, the use of this approach to examine collateral sensitivity and to identify new strategies to target taxane resistance is compelling.

      I support this work but suggest that the comparisons of primary and relapsed tumors be extended further, as well as the analysis of the nature of the taxane collateral sensitivity.

      Major comments:

      1) The authors suggest that the bulk expression analysis in relapsed tumors mirrors clonal populations in pretreatment tumors (which, while requiring barcoding to validate, somewhat obviates the need for barcoding to identify mechanisms of drug resistance). In cases like EMT, it has been argued that mesenchymal tumor cells survive therapy, but then undergo MET in the relapsed state. Thus, in the long term, tumors may revert to pre-treatment clonal states. It would be interesting to see whether that is the case here - and whether the informative nature of bulk gene expression in the drug resistant tumor is lost over time.

      This is an interesting point. We don’t have any direct evidence of any of the tumour cell lineages in our model undergoing EMT or MET from our work, but it is entirely possible that the tumour cells dynamically transition between states over longer time frames that we haven’t captured in our experiments to date. It is also possible that there are intermediate states that we have not captured by sampling at end-point. WILD-seq presents an excellent method for such studies but these are beyond the scope of the current paper.

      For such experiments, it would be essential to use barcoded cells to track clonal lineage. Otherwise, it is impossible to determine whether changes in the EMT status of a tumour cell population were driven by a change in the transcriptome/cell state or by a shift in clonal abundance. We have added discussion of these points to the Discussion section of the manuscript.

      With respect to the necessity of barcoding over bulk approaches for identifying treatment resistance mechanisms, lineage-based analysis helps to prioritise pathways that change in the resistance setting. Such pathways might otherwise be overlooked because they rank lower in bulk differential expression analysis. While not specifically addressed here, being able to differentiate between a pre-existing resistance phenotype and an adaptive mechanism of resistance may also inform the choice of dosing schedule for agents targeting resistant clones.

      2) Collateral sensitivity can refer either to the outgrowth of clones that show enhanced sensitivity to distinct therapies or to the therapeutic induction of cell states that respond differently to other drugs. To confirm that L-asparaginase sensitivity results from the specific outgrowth of NRF2-high clones, it would be meaningful to show two things: that these clones are lost upon L-asparaginase-only treatment, and that L-asparaginase pretreatment promotes the long-term efficacy of taxanes.

      We agree that this is a critical question, and we had already started to address it while the manuscript was under review. The Nrf2-high clones are present at low abundance in vehicle-treated tumours and are at the edge of our detection threshold. Thus, accurately measuring their depletion by L-asparaginase-only treatment in tumours derived from our heterogeneous WILD-seq clonal pools is very challenging. To address this question, we instead chose to isolate individual resistant clones and directly test their response to L-asparaginase.

      We were able to isolate two of the Nrf2-high clones (751 and 1240) by growing clones from single cells. After expansion in vitro, these were implanted as pure monoclonal populations, and the resulting tumours were treated with L-asparaginase. These new data, presented in Fig 7g, demonstrate that tumours derived from these clones respond significantly to L-asparaginase-only treatment, in contrast to tumours derived from our WILD-seq pools. This suggests that this cell state is a pre-existing intrinsic property of these clones and is not induced by docetaxel treatment.

    1. Author Response

      Reviewer #1 (Public Review):

      It has been shown that selenium protects against the development of epilepsy and behavioral comorbidities, as pointed out by the authors. This paper attempts to show that it does so even when administered after chronic seizures have started. While this is clinically relevant, as noted by the authors, the paper does not seem to be a major advance beyond the prior study. The antiseizure effect is also not very convincing because the effect size is small and the variance high. The behavioral data are more convincing, but similar data were in the previous paper, so they are not very novel.

      Thank you for reviewing our paper. Previous work has shown that sodium selenate, not selenium, is anti-epileptogenic. It can delay the appearance of seizures and mitigate behavioural comorbidities if given immediately after the epileptogenic brain insult but before spontaneous recurrent seizures appear (i.e. before epilepsy develops). The novelty of our current work is that we treat once epilepsy has developed; that is, the treatment is disease-modifying.

      This is the first time a pharmacological agent has been shown to be disease-modifying in established epilepsy. It produces an enduring reduction in seizures, even after treatment withdrawal, and mitigates the behavioural comorbidities commonly associated with chronic epilepsy. These are potentially ground-breaking findings for the epilepsy field, as at present the only disease-modifying therapy for established chronic epilepsy is epilepsy surgery.

    1. Author Response

      Reviewer #1 (Public Review):

      Building upon previous evidence that auditory cortex VIP interneurons are activated by non-classical stimuli such as reward and punishment, Szadai et al. extended the investigation to multiple cortical regions. Three-dimensional acousto-optical two-photon microscopy, combined with the 3D chessboard scanning method, allowed high-speed signal acquisition from numerous VIP interneurons in a large brain volume. Additionally, the activity of VIP interneurons in deep cortical regions was recorded using fiber photometry. With these two imaging methods, the authors were able to extract and analyze VIP cell signals from different cortical regions.

      Study of VIP interneuron activity during an auditory go/no-go task revealed that more than half of the recorded cortical VIP interneurons responded to both reward and punishment with high reliability. Fiber photometry data revealed similar observations; however, the temporal dynamics of reinforcement-related responses in mPFC were slower than in the auditory cortex. The authors performed a detailed analysis of individual cell activity dynamics, which revealed five categories of VIP cells based on their temporal profiles. Further, animals with higher performance on the discrimination task showed stronger VIP responses to 'go' trials, possibly suggesting a role for VIP interneurons in discrimination learning.

      The authors found that the reinforcement-related responses of VIP interneurons in visual cortex were not correlated with their sensory tuning. This raises the interesting idea that VIP interneurons take part in both local and global processing. These observations draw attention to the possible involvement of VIP interneurons in reinforcement-associated global signaling that would regulate local connectivity and information processing, leading to learning.

      The state-of-the-art imaging technique allowed the authors to image VIP interneurons from several cortical regions. Advanced analyses revealed the nuances, similarities and differences in VIP activity across regions. The authors' conclusions about reinforcement-related activity of VIP interneurons are well supported by the results; however, some claims and interpretations require more attention and clarification.

      We thank Reviewer #1 for the positive general comments.

      Reviewer #2 (Public Review):

      In recent years the activity of cortical VIP+ interneurons in relation to learning and sensory processing has raised great interest and has been intensely investigated. The ability of VIP+ interneurons in the auditory cortex to respond to both reward and punishment was already reported a few years ago by some of the authors (Pi et al., 2013, Nature). However, this work importantly adds to their previous study demonstrating a largely similar and synchronous response of a large fraction of these interneurons across the neocortex to salient stimuli of different valence during the performance of an auditory discrimination task.

      An additional strength of this study is the analysis and identification of the general patterns of VIP+ interneuron responses associated with specific behaviors across the different layers of the neocortex.

      Interestingly, using cluster analysis, the authors also identified five classes of VIP+ interneurons based on the dynamics of their responses. These classes were unequally distributed across cortical areas.

      This is a well-performed study that took advantage of a cutting-edge imaging approach with high recording speed and a good signal-to-noise ratio. The experiments are well performed, and the data are properly analyzed and nicely illustrated.

      However, one shortcoming of this paper, in my opinion, is the "case report" structure of the data: for each neocortical area, the activity of VIP+ interneurons was analyzed in only one animal. This limits the assessment of the stability of the response/recruitment of these interneurons. I appreciate the high number of recorded VIP+ interneurons per area/animal, and I understand that it would be excessively laborious to perform 3D random-access two-photon microscopy in several mice for each cortical area. On the other hand, it would be important to have some knowledge of the general variability of the responses of these neurons across animals.

      In conclusion, despite the findings described in this manuscript being generally sound, additional experiments are recommended to further substantiate the conclusions.

      Thank you for pointing out this potential misunderstanding. We had already stated the number of animals from which recordings were obtained (n=22 in total), and we now repeat this information at several points to avoid confusion. The data recorded with the two-photon microscope are from 16 animals, and fiber photometry was performed on a separate group of 6 animals. Each animal was recorded in one (14 mice) or two areas (8 mice: 2 AOD, 6 photometry).

      We aimed to acquire data from at least 3 recordings per area:

      - primary somatosensory cortex: 4
      - primary and secondary motor cortices: 6
      - lateral and medial parietal cortices: 4
      - primary visual cortex: 3
      - auditory and medial prefrontal cortices: 6 each

      In the revised manuscript, this information can be found at the beginning of the Results section and in the figure legends:

      “To probe the behavioral function of VIP interneurons, we trained head-fixed mice (n=22 in total, n=16 for 2-photon microscopy and n=6 for fiber photometry) on a simple auditory discrimination task (Figure 1A).”

      “Among the 811 neurons imaged in 18 imaging sessions from 16 mice,”

      “Ca2+ responses of individual VIP interneurons recorded separately from 18 different cortical regions from 16 mice using fast 3D AO imaging were averaged for Hit (thick green), FA (thick red), Miss (dark blue), and CR (light blue). Fiber photometry data were recorded simultaneously from mPFC and ACx regions and are shown in gray boxes. Functional map (Kirkcaldie, 2012) used with the permission of the author. Speaker symbols represent the average time of tone onset, and gray triangles mark the reinforcement onset for Hit and FA. Averages of Miss and CR trials were aligned according to the expected reinforcement delivery calculated on the basis of the average reaction time. mPFC: medial prefrontal cortex (n=6 mice), ACx: auditory cortex (n=6), S1Hl/S1Tr/S1Bf/S1Sh: primary somatosensory cortex, hindlimb/trunk/barrel field/shoulder region (n=4), M1/M2: primary/secondary motor cortex (n=6), Mpta/Lpta: medial/lateral parietal cortex (n=4), V1: primary visual cortex (n=3).”

      “This approach allowed us to simultaneously measure bulk calcium-dependent signals from VIP interneurons located in the right medial prefrontal (mPFC) and left auditory cortices (ACx) by implanting two 400 µm optical fibers at these locations (n=6 sessions from n=6 mice, Figure 1–figure supplement 1C).”

      “Raster plot of the trial-to-trial activation of the responsive VIP neurons in Hit and FA trials during the two-photon imaging sessions (n=18 sessions, n=16 mice, n=746 cells).”

      Subregional labels (for example, in Figure 2) are intended only as additional information to orient readers, even though they were precisely defined on the basis of the coordinates. All analyses of regional differences were conducted at the level of the main functional areas of the dorsal cortex (motor, somatosensory, parietal, and visual). Despite some location-dependent heterogeneity in the late response phase (Figures 2G and H), these main dorsal cortical regions were all similar in their responsiveness to reinforcers and auditory cues.

      Reviewer #3 (Public Review):

      In this study, Szadai et al. show reliable, relatively synchronous activation of VIP neurons across different areas of dorsal cortex in response to reward and punishment in mice performing an auditory discrimination task. The authors use both relatively fast two-photon imaging and, for some deeper areas, fiber photometry. They cluster neurons according to their temporal response profiles and show that these profiles differ across areas and cortical depths. Task performance, running behavior and arousal are all related to VIP response magnitude, as has been previously shown.

      Methodologically, this paper is strong: the described imaging technique allows for fairly fast sampling rates, they sample VIP cells from many different areas and the analyses are sophisticated and touch on the most relevant points. The figures are of high quality.

      However, as the manuscript is now, the presentation could be clearer, the methods more complete and it is not clear whether their conclusions are entirely supported by the data.

      The main issue is that reinforcement and arousal are hard to distinguish in this study. It is well known that VIP activity is correlated with arousal, and it is fairly clear that the reinforcers used in this study (air puffs to the eye and water rewards) cause arousal. The reinforcer responses they observe in VIP neurons throughout all areas may merely reflect the increases in arousal caused by these behaviorally salient events. They do discuss this caveat (albeit not fully convincingly), and in their abstract they even state that the arousal state was not predictive of reinforcer responses. However, their data clearly show the tight relationship of the VIP reinforcer responses to both arousal (as measured by pupil diameter) and the running speed of the animal. Both of these variables are well known to be tightly coupled to VIP activity.

      Although barely mentioned, the authors do appear to sometimes present uncued reward (Figure S2F). If responses were noticeably different from the same events in the task context (as actual reinforcers) this could at least hint towards the reinforcement signal being distinct from mere arousal. However, this data is only mentioned in one supplementary figure in a different context (comparison with PV cells) and neither directly compared to cued reward, nor is this discussed at all. Were uncued air puffs also presented? How do the responses compare to cued air puffs/punishment?

      Our original approach to distinguish between reinforcement- and arousal-related responses aimed:

      1) to show that VIP cells with both low and high correlation coefficients with arousal produce large signals upon reinforcement presentation (Figure 3B),

      2) to show that the large difference between low and high arousal changes was reflected only to a limited extent in VIP activity (Figures 3C and D). This is highlighted in Figure R1, where we also added bars showing ∆P/P in the high and low pupil change conditions: the difference in ∆P/P is ~5-fold, while it is only ~1.5-fold for ∆F/F. This disproportion suggests that a large part of the signal below the dashed blue line is independent of arousal. We have added these modifications to the new version of Figure 3 for clarity.

      Figure R1 = Figure 3C-D with modification. Comparison of pupil changes and corresponding calcium averages.
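      The disproportion argument can be made explicit with a simple worked example. Assume, purely for illustration, that VIP ∆F/F depends linearly on pupil change, ∆F/F = a + b·∆P/P, where a is an arousal-independent component. The two approximate ratios quoted above (~5-fold for ∆P/P, ~1.5-fold for ∆F/F) then fix how much of the low-arousal signal the arousal term can account for. The function name and the linear model are assumptions, not the analysis used for Figure 3. A saturating or otherwise nonlinear relationship would give a different split.

```python
def arousal_independent_fraction(pupil_ratio, signal_ratio):
    """Share of the low-arousal dF/F not explained by pupil change.

    Assumes a linear model dF/F = a + b * dP/P (illustrative only), where `a`
    is an arousal-independent component and `b` scales with pupil change.
    pupil_ratio:  dP/P(high) / dP/P(low)
    signal_ratio: dF/F(high) / dF/F(low)
    Returns a / dF/F(low).
    """
    if pupil_ratio == 1:
        raise ValueError("pupil_ratio must differ from 1")
    # From F_low = a + b*P_low and F_high = a + b*pupil_ratio*P_low = signal_ratio*F_low:
    # b*P_low / F_low = (signal_ratio - 1) / (pupil_ratio - 1)
    arousal_part = (signal_ratio - 1) / (pupil_ratio - 1)
    return 1 - arousal_part


print(arousal_independent_fraction(5.0, 1.5))  # 0.875 with the rounded ratios
```

Under this toy model, about 87.5% of the low-arousal ∆F/F would be arousal-independent. If ∆F/F scaled in proportion to ∆P/P (equal ratios), the fraction would be zero.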

      We collected further evidence to support our claims. In Figure 3–figure supplement 2 we depicted Hit and FA trials in which the reinforcement did not elevate the arousal level any further. Many of these trials were associated with locomotion prior to the reinforcement, but it was also common for the animals to remain still during the whole trial. Trials with increased locomotion upon reinforcement presentation were excluded. Reinforcement-related calcium signals were still present under these conditions, indicating that these signals are not simple reflections of arousal. Moreover, we systematically estimated the distinct contributions of arousal, locomotion, and reinforcers with a generalized linear model (Figure 3–figure supplement 2D). This model also confirmed our view of reinforcement-related coding.

      We now say in the results:

      “Finally, to assess the motor- and reinforcement-related contributions to VIP interneuronal activity, we built a generalized linear model using the behavior and imaging data of the SS and Mtr recordings (Figure 3–figure supplement 2D, n=3 mice). This model was able to explain 18.8 ± 11.1% of the variance of the VIP population calcium signal, and highlighted that arousal was the best predictor, followed by reward, punishment, locomotion velocity, and auditory cue (weights = 0.055, 0.031, 0.028, 0.020, 0.018 respectively; all predictors, except the auditory cue in the case of one animal, contributed significantly, p<0.001). These observations indicate that running and arousal changes alone cannot fully explain the recruitment of VIP interneurons by reinforcers.”
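      For readers less familiar with this kind of analysis, the following minimal sketch shows how predictor weights and variance explained can be obtained from a linear (Gaussian, identity-link) GLM. It is not the analysis code behind Figure 3–figure supplement 2D. The predictor design, any event kernels or lags, the normalization and the significance testing used there are not specified here. The function name `fit_linear_glm`, the design and the synthetic data are therefore assumptions for illustration only. Predictors are z-scored so that the fitted weights are comparable in magnitude, and variance explained is the R² of the fit.

```python
import numpy as np


def fit_linear_glm(X, y):
    """Fit y ~ X by ordinary least squares (Gaussian GLM, identity link).

    X: array of shape (n_samples, n_predictors), e.g. arousal, reward,
       punishment, velocity and cue regressors (hypothetical design).
    y: array of shape (n_samples,), e.g. a population calcium trace.
    Returns (weights for z-scored predictors, fraction of variance explained).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # z-score predictors so that weights are on a common scale
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    # centring y removes the need for an explicit intercept column
    yc = y - y.mean()
    weights, *_ = np.linalg.lstsq(Xz, yc, rcond=None)
    residual = yc - Xz @ weights
    r2 = 1.0 - residual.var() / yc.var()
    return weights, r2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    X = rng.normal(size=(n, 5))                   # synthetic predictors
    true_w = np.array([2.0, 1.0, 0.5, 0.0, 0.0])  # arbitrary ground truth
    y = X @ true_w + rng.normal(size=n)           # add unit-variance noise
    w, r2 = fit_linear_glm(X, y)
    print(np.round(w, 2), round(r2, 2))
```

On this synthetic data, the recovered weights rank the predictors by their true contribution, and R² reflects how much of the simulated signal they explain. Real calcium data would typically also call for event-locked kernels and cross-validated variance explained, which this sketch omits.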

      We apologize for not describing the rationale for, and the results of, the uncued reward experiments. Briefly, while recording reinforcement-related signals in auditory cortex during our task, we realized that cue delivery, and the resulting purely sensory response, could alter the measurement of reward-related responses. Hence, to disentangle reward- and sensory-related responses, we presented the animals with simple, uncued rewards and observed a similar and robust recruitment of VIP interneurons. Based on the same rationale, we made similar measurements for PV neurons.

      We now say in the results:

      “We did not further analyze the FA responses in auditory cortex, as those responses also had a sensory component linked to the white noise-like sound created by the air puff delivery. Because cue delivery could confound the measurement of reward-mediated responses from VIP interneurons in auditory cortex (see also methods), we delivered random rewards in separate sessions. Water droplet delivery recruited VIP interneurons in both auditory and medial prefrontal cortex in a similar fashion to water delivery during the discrimination task (Figure 2–figure supplement 1G). Consistent with our single-cell results, the PV-expressing neuronal population in ACx did not show any significant change in activity upon similar random reward delivery (Figure 2–figure supplement 1G).”

      We agree with the reviewer that the difference between cued and uncued responses is an important point. However, the goal of this manuscript is to study how reward and punishment are represented by VIP interneurons in cortex.

      The imaging method appears well suited for their task; however, the improvements listed in Table S1 make the method appear far superior to existing methods in many respects. Published or preprinted papers with two-photon imaging of VIP populations (e.g., from the Scanziani lab (Keller et al.), Carandini lab (Dipoppa et al.), deVries lab (Millman et al.), and Adesnik lab (Mossing et al.)), which use the much more common resonant scanning, seem able to image 4-7 layers at 4-8 Hz with good enough SNR and a potentially larger neuronal yield of approximately 100-200 VIP cells, depending on the field of view. While not every single cell in a volume would be captured by these studies, the main advantage of the technique used here appears to be the superior temporal resolution.

      We thank the reviewer for the positive comment, and we agree that the interpretation must be improved. We agree that the imaging methods in the papers listed above have good SNR and were appropriate for addressing the scientific questions at hand. As the reviewer points out, 3D-AOD imaging allows fast 3D measurements that cannot be achieved otherwise. We used these advantages to address the critical question of layer specificity in the response of VIP interneurons to reinforcer presentation (Figure 2–figure supplement 1F, but see also the new Figure 1–figure supplement 1B). For a comparison and quantification of the advantages of AOD microscopy over other imaging methods, the reviewer and readers can refer to the Methods section (3D AO microscopy), Table S1, and Szalay et al., 2016. We agree with the reviewer that one of the main advantages is the superior temporal resolution. The second main advantage is the improved SNR, which arises because the entire measurement time is spent on regions of interest; unnecessary background areas are not measured. More specifically, even for 2D imaging, SNR is improved by a factor of:

      $$\text{SNR gain} = \left(\frac{\text{area of the entire frame}}{\text{area of the recorded VIP cells}}\right)^{1/2}$$

      which is about $100^{1/2} = 10$, as VIP interneurons represent about 1% of the brain. We used this second advantage of AO scanning when we determined the activation ratio (e.g., see Figure 2D).
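      As a rough check of this factor, the sketch below assumes shot-noise-limited detection and a fixed total acquisition time, so that concentrating the dwell time on ROIs multiplies the photons collected per ROI pixel by the area ratio and the SNR by its square root. The function name and inputs are illustrative assumptions, not part of the authors' software.

```python
import math


def snr_gain(frame_area: float, roi_area: float) -> float:
    """Shot-noise-limited SNR gain from dwelling only on ROIs.

    Assumption: the same total time is redistributed from the whole frame onto
    the ROIs, so each ROI pixel collects frame_area / roi_area times more
    photons, and for Poisson (shot) noise SNR grows with sqrt(photon count).
    """
    if frame_area <= 0 or roi_area <= 0 or roi_area > frame_area:
        raise ValueError("areas must be positive and roi_area <= frame_area")
    return math.sqrt(frame_area / roi_area)


# VIP somata taken to cover ~1% of the imaged area, as stated in the text.
print(snr_gain(frame_area=100.0, roi_area=1.0))  # 10.0
```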

      Because resolving single or a few action potentials is challenging in behaving mice labelled with the GCaMP6 sensor, any improvement in SNR lowers the detection threshold. The higher SNR achieved here lowered the detection threshold, which also explains the relatively high activation ratio in our work.

      In the case of asynchronous activity patterns, individual small neuropil structures contribute negligibly to somatic signals because the volume of a soma is large relative to that of any given small neuropil structure; this minimizes the error in the ∆F/F calculation of somatic responses. However, reinforcement, arousal, and running can generate highly synchronous neuronal activity, which can synchronize neuropil activity around a given soma and thereby systematically modulate the somatic ∆F/F responses. To avoid this error, we used a high-NA objective that adequately resolves the neuropil and combined it with motion correction. The high NA also reduced the total scanning volume to about 689 µm × 639 µm × 580 µm and therefore limited the maximum number of VIP cells that could be recorded. It is also possible to use a low-NA objective with a much larger FOV and scanning volume and record over 1000 VIP cells, but the axial extent of the PSF is inversely proportional to the square of the objective's NA, so neuropil resolution would be at least partially lost. In summary, using the high-NA Olympus objective, we maximized the 2P resolution, which, in combination with offline motion artifact elimination, allowed precise recording of somatic signals without neuropil contamination and therefore yielded accurate activation ratio values.
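      To make the NA trade-off concrete, here is a minimal sketch of the approximate scaling stated above (axial PSF extent proportional to 1/NA²). It assumes the same wavelength and immersion medium for both objectives. The NA values are hypothetical examples, not the objectives used in this study.

```python
def relative_axial_extent(na_reference: float, na_other: float) -> float:
    """Axial PSF extent with objective `na_other`, relative to `na_reference`.

    Uses the approximate scaling  axial extent ∝ 1 / NA**2; wavelength and
    refractive index are assumed identical for both objectives.
    """
    if na_reference <= 0 or na_other <= 0:
        raise ValueError("NA must be positive")
    return (na_reference / na_other) ** 2


# Hypothetical NA values for illustration only.
high_na, low_na = 1.0, 0.5
print(relative_axial_extent(high_na, low_na))  # 4.0: low-NA PSF ~4x longer in z
```

      Halving the NA therefore stretches the focal volume roughly fourfold along z, which is why a larger field of view comes at the cost of more out-of-focus neuropil signal mixing into each soma.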

      Even though this is not mentioned at all, it certainly appears possible that the acousto-optical scanning emits audible noise. If so, it would be good to know the frequency range and level of this background noise, whether there are auditory responses to the scanning itself, and whether it interferes in any way with the animals' performance in the auditory task. If this is not the case, this should probably simply be mentioned for non-experts.

      While the name “acousto-optical deflector” may suggest acoustic noise, these devices are driven in the 55-120 MHz range, roughly three orders of magnitude above the upper frequency limit of mouse hearing: mice can’t hear them. Moreover, we developed water-cooled AODs ten years ago, so fans are not required either; AOD-based scanning can therefore be performed without any acoustic noise emission. In contrast, galvo, resonant, and piezo scanners operate in the kHz frequency range, which is in the middle of the hearing range of mice. Moreover, these scanners can’t be operated in a vacuum, and the scanner is just a few tens of centimeters away from the mouse, so their acoustic noise can’t be canceled and can only be partially suppressed with white noise. We thank the reviewer for the helpful comment and have added a sentence about the absence of acoustic noise during acousto-optical scanning:

      “The deflectors are driven in the 55-120 MHz frequency range; therefore, the noise emitted does not interfere with the auditory cues, as mice can’t hear it. This, in combination with water cooling of the deflectors, makes AOD-based scanning the quietest technology for in vivo imaging.”
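      The orders-of-magnitude comparison in this response can be checked with a few lines of arithmetic. The ~100 kHz upper limit of mouse hearing used here is an approximate value assumed for illustration; it is not stated in the response.

```python
import math

# Assumed approximate upper limit of mouse hearing (not stated in the response).
MOUSE_UPPER_HEARING_HZ = 100e3
# AOD drive-frequency range quoted in the response.
AOD_DRIVE_RANGE_HZ = (55e6, 120e6)


def orders_above(freq_hz, reference_hz=MOUSE_UPPER_HEARING_HZ):
    """Base-10 orders of magnitude by which freq_hz exceeds reference_hz."""
    return math.log10(freq_hz / reference_hz)


for f in AOD_DRIVE_RANGE_HZ:
    print(f"{f / 1e6:.0f} MHz: {orders_above(f):.2f} orders of magnitude above ~100 kHz")
```

      Under this assumption, the drive range sits about 2.7 to 3.1 orders of magnitude above the top of the mouse hearing range.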

      The authors show a strong correlation between task performance (hit rate) and the response to the auditory cue on hit trials. Were there any other significant correlations involving VIP cells' responses on other trial types? Was the reinforcer response correlated with behavioral variables at all?

      We have not found any remarkable correlations between VIP cell activity and behavioral variables except the one mentioned above.

      For example, the correlation between discrimination rate (hit rate/FA rate) and ∆F/F_tone in Hit trials was not significant (R² = 0.03, F = 0.49, p = 0.69), nor were the correlations between Hit rate and ∆F/F_tone in FA trials (R² = 0.19, F = 3.8, p = 0.07) or between discrimination rate and ∆F/F_tone in FA trials (R² = 0.07, F = 1.1, p = 0.31).
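      For readers less familiar with these statistics, the sketch below shows how R² and F relate for a single-predictor linear regression such as the ones reported above. It runs on toy data, not on the study's measurements, and the function name is our own.

```python
import math


def simple_regression_stats(x, y):
    """Return (R^2, F) for an ordinary least-squares fit y ~ a + b*x.

    With one predictor, R^2 is the squared Pearson correlation and
    F = R^2 / (1 - R^2) * (n - 2), compared against an F(1, n - 2) distribution.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    r2 = sxy * sxy / (sxx * syy)
    f_stat = math.inf if r2 >= 1.0 else r2 / (1.0 - r2) * (n - 2)
    return r2, f_stat


# Toy data (not from the study): R^2 = 0.64 and F = 32/9 with (1, 2) df.
r2, f_stat = simple_regression_stats([1, 2, 3, 4], [1, 3, 2, 4])
print(round(r2, 2), round(f_stat, 2))
```

      A p-value can then be obtained from the upper tail of the F(1, n − 2) distribution, for example with `scipy.stats.f.sf(f_stat, 1, n - 2)` if SciPy is available.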

    1. Author Response

      Reviewer #1 (Public Review):

      “Even though the methodology was already introduced, it should be described in some detail. Most importantly, AlphaFold's measures of accuracy have been part of the loss function during training/testing. What about the measure of protein-protein interaction accuracy? Was it also in the loss function?”

      We thank the reviewer for this insightful comment. The metrics used for evaluating predicted structure quality, such as the predicted local distance difference test (pLDDT) score and the predicted TM-score (pTM), both proposed in the AlphaFold 2 publication (Ref. 27), and the interface score (iScore) proposed in the AF2Complex publication (Ref. 23), are not explicitly used as the loss function in training the main deep learning model for structure prediction. Instead, the main loss function of AF2 is the Frame Aligned Point Error (FAPE) loss, which measures errors in the predicted atomic coordinates within local coordinate frames spanned by vectors connecting the backbone heavy atoms of individual residues. However, FAPE is closely related to the prediction of TM-scores and iScores: both are derived from an additional module that predicts alignment errors (PAEs) as viewed from each residue’s local frame. This PAE module was trained separately, as described in the AF2 publication (Ref. 27). According to DeepMind, training of the deep learning models for AlphaFold-Multimer (Ref. 25; AF version 2.2.0 and above) involved relatively minor changes to the loss function, made mainly to reduce severe clashes, which were not uncommon when earlier versions of AF2 modeled large complexes.
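      To illustrate what FAPE measures, here is a simplified, unbatched sketch in plain Python. It follows our reading of the AlphaFold 2 supplementary description: atoms are compared after being expressed in each residue's local frame, distances are clamped at 10 Å and divided by a 10 Å length scale, and a small ε is added for numerical stability. Treat these constants as assumptions. Frame construction from backbone atoms, side-chain terms, and batching are omitted; frames are passed in directly as (rotation, translation) pairs, and all names are illustrative.

```python
import math


def to_local(rotation, translation, point):
    """Express a global point in a local frame T = (R, t): returns R^T (x - t)."""
    d = [point[k] - translation[k] for k in range(3)]
    # (R^T d)_k = sum_m R[m][k] * d[m]
    return [sum(rotation[m][k] * d[m] for m in range(3)) for k in range(3)]


def fape(pred_frames, true_frames, pred_xyz, true_xyz,
         d_clamp=10.0, length_scale=10.0, eps=1e-4):
    """Simplified Frame Aligned Point Error (distances in Å).

    For every frame i and atom j, compare atom j expressed in frame i for the
    prediction and for the ground truth, clamp each distance at d_clamp,
    average over all (i, j) pairs, and divide by length_scale.
    """
    total, count = 0.0, 0
    for (r_p, t_p), (r_t, t_t) in zip(pred_frames, true_frames):
        for x_p, x_t in zip(pred_xyz, true_xyz):
            local_p = to_local(r_p, t_p, x_p)
            local_t = to_local(r_t, t_t, x_t)
            sq = sum((a - b) ** 2 for a, b in zip(local_p, local_t))
            total += min(d_clamp, math.sqrt(sq + eps))
            count += 1
    return total / (count * length_scale)


# Toy check: one identity frame, two atoms; the prediction is shifted 1 Å in x.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
origin = [0.0, 0.0, 0.0]
true_atoms = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0]]
pred_atoms = [[x + 1.0, y, z] for x, y, z in true_atoms]
print(fape([(I3, origin)], [(I3, origin)], pred_atoms, true_atoms))  # close to 0.1
```

      Because both structures are compared in local frames, applying the same rotation and translation to the whole predicted structure (frames and atoms) leaves the value unchanged; only relative, frame-local errors are penalized.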

      We added in the Methods section, line 337,

      “The iScore metric was derived from the predicted alignment errors, which give an estimated distance of interface residue j from its position in the experimental structure, as viewed from the local frame of interface residue i [23,27]. To better estimate confidence, the contribution of each interface residue to the interface score is calculated using local frames not located within the same protein chain, i.e., with residues i and j belonging to different chains.”
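      The inter-chain restriction described in this passage can be illustrated with a hypothetical helper that averages a predicted-alignment-error matrix only over interface residue pairs lying on different chains. This is not the iScore formula from Ref. 23, which combines the errors differently; it only shows the masking step, and all names are illustrative.

```python
def mean_interchain_pae(pae, chain_ids, interface):
    """Mean predicted alignment error over interface pairs (i, j) on different chains.

    pae[i][j]: predicted error (Å) in residue j's position when the predicted and
    true structures are aligned on residue i's local frame.
    chain_ids[i]: chain label of residue i.
    interface: indices of residues treated as interface residues.
    """
    values = [pae[i][j]
              for i in interface
              for j in interface
              if chain_ids[i] != chain_ids[j]]  # skip same-chain frame/residue pairs
    if not values:
        raise ValueError("no inter-chain interface pairs")
    return sum(values) / len(values)


# Toy 4-residue complex: residues 0-1 on chain A, residues 2-3 on chain B.
pae = [[0, 1, 2, 3],
       [1, 0, 4, 5],
       [6, 7, 0, 1],
       [8, 9, 1, 0]]
print(mean_interchain_pae(pae, ["A", "A", "B", "B"], interface=[0, 1, 2, 3]))  # 5.5
```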

      “Figure 1a (upper panel, PpiD) includes quite a few promising hits but only the first, third, and 12th were considered. How were these chosen? For example, why not consider the second? The lower panel (YfgM) also shows many promising hits but only the first was chosen. Why not more? Likewise, only two of the top hits in Figure 4 are considered. What about the rest? For example, why taking into account the second best hit while skipping the first?”

      These are important questions, and similar issues were raised by all three reviewers (R2.1 by reviewer 2 and R3.2 by reviewer 3). We emphasize that our approach predicts physical interactions between proteins, not the biological consequences of such interactions. The most interesting predictions are those relevant to biological function, about which the computational method cannot make a judgement. Given the space limitations of the manuscript, we therefore selected from the top predictions those likely to provide mechanistic insights into biological function, for example, those that might inspire new hypotheses about molecular mechanisms. In practice, our selection was guided by the existing literature and experimental evidence. Because such information is limited, we could focus only on the very few predictions with both strong computational and experimental support. Most top predictions, including the ones the reviewers questioned, were not pursued further because we cannot at present say anything about the functional consequences of these predicted interactions, even though the proteins may interact physically. One main contribution of this computational screening approach is to provide short lists that accelerate the search for functionally important protein-protein interactions. In this contribution, we therefore provide some examples found among the top 20 hits ranked from ~1500 possible pairs for a given query protein.

      In this revision, we added from line 85,

      “Note that our computational predictions concern physical interactions between the pairs of proteins subjected to screening, not their biological roles, even if they are predicted to interact physically. Moreover, the predicted physical interactions may not be relevant in the cellular environment due to various factors not considered in modeling, e.g., competition from other proteins with stronger binding affinities, post-translational modifications, etc. Thus, many protein-protein interactions predicted by this pipeline may not have biological relevance. Nevertheless, because cognate protein-protein interactions required for function are more likely to be detected than interactions between randomly selected proteins, biologically interesting protein-protein interactions are enriched at the top of the screening results ranked by iScore. Thus, the screening procedure may provide valuable, even critical, clues for subsequent investigation. In this study, assisted by existing experimental evidence, we select from the high-confidence computational predictions those most likely to have significant biological implications, and then predict the structures of larger complexes if more than two proteins are involved according to our predictions or to literature information. The interactions that we ignored are either of unknown biological significance, physically interacting but biologically irrelevant, or simply false positives.”

      “Authors argue that the unstructured part of OmpA, which wraps around SurA, is to be trusted, which may be the case. But a more likely explanation is that it is an artefact, in agreement with the very low confidence assigned by AlphaFold.”

      While we do not dispute that the predicted structure of the SurA/OmpA complex may contain artifacts, there are several reasons why our predictions may be insightful, as explained in the manuscript. First, experimental studies (references 41, 42, 45) have established that the SurA/OmpA complex is highly dynamic and unlikely to form a stable structural complex like those seen in typical crystal structures. The low confidence score from AF2Complex is therefore expected, as it reflects uncertainty due to the existence of many possible conformations. Second, a loose wrapping of OmpA around SurA makes physical sense, as it reduces the energetic cost of dissociating OmpA from SurA when SurA approaches BAM for its delivery. Our point is a qualitative assessment, rather than a claim of a specific complex model as in a typical structure prediction scenario. To be cautious, as the reviewer suggested, we added a sentence to the Discussion (from line 309):

      “Despite the low confidence due to weak interactions, the predicted structures delineate a picture for how SurA prevents OmpA from aggregating. Moreover, since it transports OmpA with a relatively small number of intermolecular contacts, the free energy required to dissociate OmpA from SurA is small. Notwithstanding these considerations, we caution that artifacts likely exist in these predicted structural models.”

      “Figure 5. How does this predicted structure compare with the known structure of the complex? In particular, how similar are the predicted and known structures of the individual subunits, and how similar are the predicted docking poses to the known ones?”

      The BAM complex has been studied extensively, with over one hundred experimental structures of its individual subunits or of the full complex. A thorough structural comparison would therefore be the subject of a review and is beyond the scope of this study. In our computational models, the structures of the individual subunits and of the full BAM complex closely mimic their known experimental structures, which is expected because some of these structures were likely employed in training the deep learning models and/or in the structure predictions. We added a comparison with the highest-resolution crystal structure to the revised manuscript (after line 225):

      “Because BAM has been extensively studied structurally [7,47], we focus on describing its interaction with SurA, though the predicted BAM complex model closely mimics a known crystal structure of the complex determined at 2.9 Å resolution (PDB 5D0O, [48]). The alignment of the two complex structures yields a very high TM-score of 0.94.”
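      For context on the TM-score of 0.94 quoted above, here is a sketch of the TM-score formula for one fixed superposition, using the standard length-dependent scale d0 = 1.24·(L − 15)^(1/3) − 1.8. Programs such as TM-align additionally search over superpositions to maximize the score; that search is omitted here, and the floor on d0 for short chains is an assumed convention.

```python
def tm_score_from_distances(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (Å) between aligned residue pairs after superposition.
    target_length: number of residues in the reference structure; unaligned
    residues simply contribute nothing to the sum.
    """
    if target_length <= 15:
        raise ValueError("d0 formula assumes target_length > 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)  # floor for short chains (assumed convention)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A perfect match of all 100 residues scores 1.0; aligning only half of them
# perfectly scores 0.5.
print(tm_score_from_distances([0.0] * 100, 100))
print(tm_score_from_distances([0.0] * 50, 100))
```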

      “Authors should make the results easily accessible to all. Maybe as Cytoscape and CyToStruct sessions for easy visualization.”

      Cytoscape and its add-on CyToStruct are very useful tools for visualizing large networks. In our case, however, we present only a handful of complexes, not a massive protein-protein interaction network like those resulting from all-against-all screening at proteome scale. A diagram such as Fig. 7 is sufficient for our visualization purposes. Moreover, we provide the atomic coordinates in the standard PDB format for readers who wish to examine the respective structures in detail. In the future, if we have the opportunity to expand PPI screening to a large number of targets, Cytoscape and its add-ons will be handy for displaying the resulting large network.

      “Finally, AlphaFold was trained and tested mostly with water-soluble protein. Thus, application to outer membrane proteins is a bit risky. Maybe authors can comment on this.”

      While it is true that most experimental structures used for training AlphaFold models are of water-soluble proteins, many membrane protein structures are also available for training: over 10,000 structures of membrane proteins have been deposited in the Protein Data Bank, although there is redundancy among these structures and some domains lie outside the transmembrane regions. These structures are likely sufficient for machine-learning approaches such as AlphaFold 2 to learn the sequence and structural patterns unique to transmembrane proteins. This view is supported by our empirical experience, because transmembrane regions of membrane proteins are typically among the regions with high confidence scores, e.g., in the complex models for the transmembrane molecular system CcmI presented in our AF2Complex work (Ref. 23). One of these computational models (of CcmA2B2CD) was recently confirmed to be of high quality by cryo-EM models (Li et al., Nature Communications 13:6422, 2022), with a TM-score of 0.89. We note that this was a non-trivial prediction, as this structure was not present in the PDB and had long been sought by experimentalists. This view also agrees with the conclusion of a recently published study on AF2 models of transmembrane proteins (Hegedűs et al., Cell. Mol. Life Sci. 79:73, 2022).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors present a study of figure-ground segregation in different species. Figure-ground segregation is an important mechanism for the establishment of an accurate 3D model of the environment. The authors examine whether figure-ground segregation occurs in mice in a similar manner to that reported in primates and compare results to two other species (tree shrews and mouse lemurs). They use both behavioral measures and electrophysiology/two-photon imaging to show that mice and tree shrews do not use opponent motion signals to segregate the visual scene into objects and background, whereas mouse lemurs and macaque monkeys do. This information is of great importance for understanding to what extent the rodent visual system is a good model for primate vision, and the use of multiple species is highly revealing for understanding the development of figure-ground segregation through evolution.

      The behavioral data is of high quality. I would add one caveat: it seems unfair to report that the tree shrews could not generalize the opponent motion stimulus as it seems they struggled to learn it in the first place. Their performance was below 60% on the training data and they weren't trained for many sessions in comparison to the mice. Perhaps with more training the tree-shrews might have attained higher performance on the textures and this would allow a more sensitive test of generalization. The authors should qualify their statements about the treeshrews to reflect this issue.

      The reviewer is correct in this assertion. For context, we performed the mouse experiments first and were hoping to see texture-invariant performance but instead realized that the mice were resorting to memorizing patterns. With this in mind, when expanding to treeshrews we wanted to prevent this type of learning to really test whether texture invariant recognition was possible, thus we increased the number of orientations tested to 5, resulting in 10 possible textures that would have to be memorized in contrast to the 4 that had to be memorized for the mice. We now clarify this in the text:

      “We reversed the number of train/test patterns compared to what was used for the mice (Fig. 2i1) because we reasoned that animals might be more likely to generalize if given more patterns for training. We had performed the mouse experiments initially, noticed the memorization approach, and were trying to avoid this behavior in treeshrews. This also means that the naturalistic train condition presented to treeshrews was harder than that for mice (5 orientations for treeshrews vs. 2 orientations for mice in the training set).”

      Reviewer #2 (Public Review):

      Luongo et al. investigated the behavioural ability of 4 different species (macaque, mouse lemur, tree shrew and mouse) to segment figures defined by opponent motion, as well as different visual features from the background. With carefully designed experiments they convincingly make the point that figures that are not defined by textural elements (orientation or phase offsets, thus visible in a still frame) but purely by motion contrast, could not be detected by nonprimate species. Interestingly it appears to be particularly motion contrast, since pure motion - figures moving on a static background - could be discriminated better, at least by mice. This is highly interesting and surprising -- especially for a tree shrew, a diurnal, arboreal mammal, very closely related to primates and with a highly evolved visual system. It is also an important difference to take into account considering the multitude of studies on the mouse visual system in recent years.

      The authors additionally present neuronal activity in mice, from three different visual cortical areas recorded with both electrophysiology and imaging. Their conclusions are mostly supported by the data, but some aspects of the recordings and data analysis need to be clarified and extended.

      The main issues are outlined below roughly in order of importance:

      1) The most worrying aspect is that, if I interpret their figures correctly, their recordings do not seem very stable, and this may account for many of the differences across the visual conditions. The authors do not report in which order the different stimuli were shown; their supplemental movie, however, makes it seem as though they were not recorded fully interleaved, but potentially in a block design, with all cross1 positions recorded first, before switching to cross2 positions and then on to iso... If I interpret Figure 6a correctly, each line is the same neuron and the gray scale shows the average response rate for each condition. Many of these neurons, however, show a large change in activity between the cross1 and the cross2 blocks, much larger than the within-block variability that should be due to figure location and orientation tuning. If this interpretation is correct, either there were significant brain state changes between the blocks (the mice are on a ball, but the authors don't report whether and how much the animals were moving), or their recordings could be unstable over time. It would be good to know whether similar dramatic changes in overall activity level between blocks also occur in their imaging data.

      The same might be true for differences in the maps between conditions in Figure 4. If the recordings were indeed in blocks and some cells stopped responding, this could explain the low map similarities. For example, Cell 1 for the cross stimuli seems to be a simple ON cell, almost like their idealized cell in 3d. However, even though the exact texture in the RF and large parts of the surround are identical for Cross1 and Iso2, as well as for Cross2 and Iso1, at a large fraction of the locations, the cell's responses in both iso conditions appear to be only noise, or at least extremely noise-dominated. Why would the cell not respond in a phase- or luminance-dependent manner here?

      This could either be due to very high surround suppression in the iso condition (which cannot be judged within condition normalization) or because the cell simply responded much weaker due to recording instability or brain state changes. Without any evidence of significant visual responses, enough spikes in each condition and a stable recording across all blocks, this data is not really interpretable. Instability or generally lower firing rates could easily also explain differences in their decoding accuracy.

      Similarly, it is very hard to judge the quality of their imaging data. They show no example field of views or calcium response traces and never directly compare this data to their electrophysiology data. It is mentioned that the imaging data is noisy and qualitatively similar, but some quantification could help convince the reader. Even if noisy, it is puzzling that the decoding accuracy should be so much worse with the imaging data: Even with ten times more included neurons, accuracy still does not even reach 30% of that of the ephys data. This could point to very poor data quality.

      We address the issue of stability of selectivity in our response to all reviewers above. Note that we wavered on whether to include the imaging data at all given the much better decoding accuracies from the electrophysiology data, and decided to include it for two main reasons:

      1) It qualitatively gives a very similar result, namely that there is a texture-dependent ability to resolve the position of given figures, suggesting that the rodent visual system is indeed better equipped to represent figure locations for the cross and iso stimuli than for the nat stimulus.

      2) The correspondence on subsequent days between single cells and their corresponding spatial preference responses suggests that this is a stable and consistent preference represented by these neurons.

      The following text has been added to the Methods section:

      Matching cells across days. Cells were tracked across days by first re-targeting to the same plane by eye such that the mean fluorescence image on a given day was matched to that on the previous day, with online visual feedback provided by a custom software plugin for Scanbox. […] This result points to the consistency of the spatial responses in the visual cortex as a substrate for inferring figure position.

      2) There is no information on the recorded units given. Were they spike sorted? Did they try to distinguish fast spiking and regular spiking units? What layers were they recorded from? It is well known that there are large laminar differences in the strength of figure ground modulation, as well as orientation tuned surround suppression. If most of their data would be from layer 5, perhaps a lack of clear figure modulation might not be that surprising. This could perhaps also be seen when comparing their electrophysiology data to the imaging data which is reportedly from layer 2/3, where most neurons show larger figure modulation/tuned surround suppression effects. There is, however, no report or discussion of differences in modulation between recording modalities.

      We used Kilosort (Pachitariu et al., 2016) for spike sorting of the data. The output of the automatic template-matching algorithm from Kilosort was visualized on Phy and then curated manually.

      We did not compute current source density. The 64 contacts on our probe spanned 1 mm, so we recorded cells throughout all cortical layers. We did not focus on a specific layer, as we did not find strong modulation by figure/ground or border ownership in any of our cells. We did not distinguish between fast-spiking and regular-spiking units.

      3) There is an apparent discrepancy between Figure 5d and i. How can their modulation index be around -0.1 for cross (Figure 5d) - which would correspond to on average ~20% weaker responses to a figure than to background, when their PSTH (5i) shows an almost 50% increase of figure over ground. This positive figure modulation has also been widely reported in the literature (Schnabel, Kirchberger, Keller). Are there different populations of cells going into these analyses?

      There was a mismatch in the cells used for plotting the F/G modulation index and the time course, because we had previously applied different selection criteria. We have now applied the same criteria and replotted Figure 5d, e, g, h.
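      The reviewer's conversion between the modulation index and response strength can be reproduced assuming the common definition MI = (F − G)/(F + G), where F and G are the responses to figure and ground; the manuscript's exact definition may differ. Under that assumption, F/G = (1 + MI)/(1 − MI), and the two helpers below (names are our own) convert in either direction.

```python
def ratio_from_mi(mi):
    """Figure/ground response ratio F/G implied by MI = (F - G) / (F + G)."""
    if not -1.0 < mi < 1.0:
        raise ValueError("MI must lie strictly between -1 and 1")
    return (1.0 + mi) / (1.0 - mi)


def mi_from_ratio(ratio):
    """Inverse mapping: MI for a given positive F/G ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (ratio - 1.0) / (ratio + 1.0)


print(round(ratio_from_mi(-0.1), 3))  # 0.818: figure response ~18% below ground
print(round(mi_from_ratio(1.5), 3))   # 0.2: a 50% figure enhancement
```

      Under this definition, the MI of about −0.1 and the ~50% figure enhancement in the PSTH point in opposite directions, which is consistent with the cell-selection mismatch identified in the response.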

      4) In a similar vein, it is not immediately clear why the average map correlation would be bigger for random cell pairs (~0.2, Fig 3g) than for the different conditions of the same cell (~0, Fig 5b). Could this be due to differences in recording modality (imaging in 3g and ephys in 5b)?

      We suspect that the reviewer is correct: the difference in recording modality likely accounts for these differences. The spatial mixing of signals inherent to calcium imaging can be problematic for the study of figure-ground and border-ownership signals. Thus, the non-zero mean observed in Fig. 3g is likely due to neuropil contamination, whereas Fig. 5 is based purely on electrophysiology data and has no such confound.
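      A small simulation, under purely hypothetical assumptions, illustrates why a signal shared between neighboring cells (such as neuropil contamination) shifts the mean map correlation above zero. Two cells with unrelated position maps over 128 positions (the number of figure positions used in these experiments) each receive an additive common component of weight w. With unit variances, the expected correlation is w²/(1 + w²). This is not a model of the recorded data, and the function names are our own.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def simulate_map_pair(n_positions=128, shared_weight=1.0, seed=0):
    """Correlation between two cells' position maps that share a common signal.

    Each map = independent cell-specific noise + shared_weight * a common
    component (a stand-in for neuropil contamination).
    """
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(n_positions)]
    map_a = [rng.gauss(0.0, 1.0) + shared_weight * s for s in shared]
    map_b = [rng.gauss(0.0, 1.0) + shared_weight * s for s in shared]
    return pearson(map_a, map_b)


print(round(simulate_map_pair(shared_weight=0.0), 2))  # near 0 (sampling noise ~±0.1)
print(round(simulate_map_pair(shared_weight=1.0), 2))  # near 0.5
```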

      5) The maps in Figure 4 should show the location of the RF, because they cannot be interpreted without knowledge of the RF center and size. For example cell 4 in the iso 1 condition could be a border cell, or could respond to the center of the figure. It is impossible to deduce without knowledge of the location of the RF.

      We have added the following clarification to the figure legend for Fig. 4a:

      “Overlaid on these example stimuli are grids representing the 128 possible figure positions and a green ellipse representing the ON receptive field. Note that this receptive field is the Gaussian fit from the sparse noise experiment.”

      We have also added the following clarification to the figure legend for Fig. 4b:

      “Please note that for all of these experiments the population receptive field was centered on the grid of positions.”

      6) It could help the reader to discuss the interpretation of the map correlations in Fig 5 a and b in more detail. My guess is that negatively correlated maps (within cross or iso condition) could come from highly orientation tuned neurons, whereas higher correlation values point to more generally figure/contextually modulated cells (within this condition). While the distribution is far from bimodal, this does not rule out a population of nicely figured modulated cells at the high end of the distribution. It might not be necessary at the level of V1 that the figure modulation be consistent across all textures. It would not be surprising, if orientation contrast-defined, phase contrast-defined and motion contrast-defined figures could be signalled to higher areas by discrete populations of V1 or even LM cells.

      We agree that the reviewer’s interpretation of the neural findings is possible. However, at least from the behavior, it seems unlikely that a representation of motion contrast-defined figures is generated anywhere in the rodent brain.

      7) Some of the behavioural results warrant a little more explanation or discussion, as well. In Figure 2h, the mice seem significantly better on the static version of the iso task than on the moving one. If statistically significant, this should be discussed. Is this because the static frame was maximally phase offset? Then the figure would indeed be more visible (larger phase contrast in more frames) than in the moving condition.

      Yes, indeed, in Figure 2h, the static frame was chosen with maximal positional displacement, and thus the figure can likely be seen better. We have added this clarification to the figure legend for Fig. 2h.

      Figure 2 and extended Figure 1c: why is the mouse lemur performing so poorly on average? It also appears to have biggest problems with the cross stimulus early on in training.

      The behavioral experiments in the mouse lemur were carried out as part of an international collaboration and with substantially fewer exploratory experiments than for the mouse, treeshrew, and macaque. For the mouse lemur, we simply used a training regimen that we knew had worked efficiently for treeshrews, without any optimization of the procedure. We would therefore caution against over-interpreting the exact learning rates of the mouse lemurs and instead focus on the qualitative result that they could generalize to the Nat condition. This was a marked departure from the rodents and shrews and is the main finding we would like to convey. We suspect that with future optimization of behavioral shaping, both training times and performance could be improved.

      Tree shrews seem not to be able to memorize the textures as well as the mice do. Is this because of less deprivation/motivation? Or because of the bigger set of textures in training? This would make memorization harder and could thus lower their overall performance. The comparative aspects are very interesting but the absolute differences in performance could be discussed in more detail or explained better.

      Reviewer 1 raised a similar concern; please see our response above.

      8) In Figure 7b, why wouldn't the explanation for the linear decodability in cross also hold for iso? There are phase offsets at the borders that simple cells should readily be able to resolve, just as in the case of orientation discontinuities. Could they make a surround phase model, similar to their surround orientation model, that could more readily capture the iso discontinuities?

      The reviewer is likely correct in their assertion that one could further hand-tune the model to account for the observed diversity in responses (namely, Cross > Iso > Nat for figure position decoding). We went directly to a DNN to model the data, since we thought this would be more powerful, given that the DNN features were not tuned to explain our neural data per se.

    1. Author Response

      Reviewer #1 (Public Review):

      This study used a multidimensional stimulus-response mapping task to determine how monkeys learn and update complex rules. The subjects had to use either the color or shape of a compound stimulus as the discriminative dimension that instructed them to select a target in different spatial locations on the task screen. Learning occurred across cued block shifts, when an old mapping became irrelevant and a new rule had to be discovered. Because potential target locations associated with each rule were grouped into two sets that alternated, and only a subset of possible mappings between stimulus dimensions and response sets were used, the monkeys could discover information about the task structure to guide their block-by-block learning. By comparing behavioral models based on incremental learning (Q-learning), Bayesian inference, or a combination of the two, the authors show evidence for a hybrid strategy in which animals use inference to change among response sets (axes), and incremental learning to acquire new mappings within these sets.

      Overall, I think the study is thorough and compelling. The task is cleverly designed, the modeling is rigorous, and the manuscript is clear and well-written. Importantly there are large enough distinctions in the behavior generated by different models to make the authors' conclusions convincing. They make a strong case that animals can adopt mixed inference/updating strategies to solve a rule-based task. My only minor question is about the degree to which this result generalizes beyond the particulars of this task.

      Thanks for these kind comments. Regarding generalization, we agree with the reviewer and did not intend to make any claim about how the particular result generalizes beyond this task. Indeed, the specific result could depend on the training protocol even within the same task. We now discuss this explicitly in the manuscript, lines 800-810. However, we do take the view that even if the way the monkey’s behavior played out in this setting is a lucky accident, that may still reveal something fundamental about learning processes in the brain.

      Reviewer #2 (Public Review):

      The authors trained two monkeys to perform a task that involved sequential (blocked) but unsignalled rules for discriminating the colour and shape of a visual stimulus, by responding with a saccade to one of four locations. In rules 1 and 3, the monkeys made shape (rule 1) or colour (rule 3) discriminations using the same response targets (upper left / lower right). In rule 2, the monkeys made colour judgments using a unique response axis (lower left / upper right). The authors report behaviour, with a focus on time to relearn the rules after an (unsignalled) switch for each rule, discrimination sensitivity for partially ambiguous stimuli, and the effect of congruency. They compare the ability of models based on Q-learning, Bayesian inference, and a hybrid to capture the results.

      The two major behavioural observations are (1) that monkeys re-learn faster following a switch to rule 2 (which occurs on 50% of blocks and involves a unique response axis), and (2) that monkeys are more sensitive to partially ambiguous stimuli when the response axis is unique, even for a matched feature (colour). These data are presented clearly and convincingly and, as far as I can tell, they are analysed appropriately. The former finding is not very surprising as rule 2 occurs most frequently and follows each instance of rule 1 or 3 (which is why the ideal observer model successfully predicts that the monkeys will switch by default to rule 2 following an error on rules 1 or 3) but it is nevertheless reassuring that this behaviour is observed in the animals. It additionally clearly confirms that monkeys track the latent state that denotes an uncued rule.

      The latter finding is more interesting and seems to have two potential explanations: (i) sensitivity is enhanced on rule 2 because it occurs more frequently; (ii) sensitivity is enhanced on rule 2 because it has a unique response axis (and thus involves less resource sharing/conflict in the output pathway).

      The authors do not directly distinguish between these hypotheses per se but their modelling exercise shows that both results (and some additional constraints) can be captured by a hybrid model that combines Bayesian inference and Q learning, but not by models based on either principle alone. A Q-learning model fails to capture the latent state inference and/or the rule 2 advantage. The Bayesian inference model captures the rapid switches to rule 2 (which are more probable following errors on rule 1 and rule 3) but predicts matched discrimination performance for partially ambiguous stimuli on colour rules 2 and 3. This is because although knowing the most likely rule increases the probability of a correct response overall it does not increase discriminability and thus boosts the more ambiguous stimuli. I wondered whether it might be possible to explain this result with the addition of an attention-like mechanism that depends on the top-down inference about the rule. For example, greater certainty about the rule might increase the gain of discrimination (psychometric slope) in a more general way.

      We agree with the reviewer that our logic in ruling out pure inference models assumes that other factors affecting performance, like attention or motivation, are equivalent between blocks. In principle, if there were large and sustained differences in these factors between Rule 2 vs Rule 1 or 3 blocks, that might offer a different explanation for the effect. We now mention this caveat in the manuscript. In terms of actually leveraging this into a full account of the behavior, however, we are not sure how to instantiate the reviewer’s particular idea, since (as we show in Fig. 3a,b,c, and Fig. S4a,b,c) the difference in psychometric slopes lasts at least 200 trials into the rule, even when (in the hybrid learning model) the feature weights have converged (Figure 4 – figure supplement 2). It’s hard to see why elevated uncertainty about the rule would persist this long in anything resembling an informed ideal observer model.

      The authors propose a hybrid model in which there is an implicit assumption that the response axis defines the rule. The model infers the latent state like an ideal observer but learns the stimulus-response mappings by trial and error. This means that the monkeys are obliged to constantly re-learn the response mappings along the shared response axis (for rules 1/3) but they remain fixed for rule 2 because it has a unique response axis. This model can capture the two major effects, and for free captures the relative performance on congruent and incongruent trials (those trials where the required action is the same, or different, for given stimuli across rules) on different blocks.

      I found the authors' account to be plausible, but it seemed like there might be other possible explanations for the findings. In particular, having read the paper, I remained unclear as to whether it was the sharing of the response axis per se that drove the cost on rule 3 relative to 2, or whether it was only because of the assumption that response axis = rule that was built into the authors' hybrid model. It would have been interesting to know, for example, whether a similar advantage for ambiguous stimuli on rule 2 occurred under circumstances where the rule blocks occurred randomly and with equal frequency (i.e. where there was response axis sharing but no higher probability); or even whether, if the rule was explicitly signalled from trial to trial, the rule 2 advantage would persist in the absence of any latent state inference at all (this seems plausible; one pointer for theories of resource sharing is this recent review: https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(21)00148-0?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS1364661321001480%3Fshowall%3Dtrue). No doubt these questions are beyond the scope of the current project, but nevertheless it felt to me that the authors' model remained a bit tentative for the moment.

      Thanks for these interesting thoughts. It is true that the imbalanced pattern of sharing (of response axes, and actually also features) across the three rules has important consequences for learning/inference under our model (and indeed other latent state inference models such as the informed ideal observer). It is an intriguing idea that these features of the design might cause interference per se, for instance even without the need to do inference or learning because the rules are fully signaled. We agree this (and the other case the reviewer mentioned) is an interesting direction for future work. We have added this to the discussion, lines 800-812.

    1. Author Response

      Reviewer #1 (Public Review):

      In order to study odor response dynamics in the olfactory peripheral organ, Kim et al. employ extracellular sensillum recordings from the locust antenna to measure responses to a set of 4 odors at different concentrations. Using spike sorting to assign odor responses to single olfactory sensory neurons (OSNs), the authors demonstrate that OSNs exhibit four distinct response motifs comprising two types of excitation, namely fast and delayed excitatory responses, as well as inhibitory responses in the form of offset responses and inhibition. Notably, OSNs can switch between these four motifs depending on the odor applied. This finding is highly interesting and facilitates odor classification, as demonstrated by computational modeling in this study. Furthermore, the authors demonstrate that each response motif follows a different adaptation profile, which further increases the coding space. The authors conclude, and provide evidence with their model, that the experimentally observed response dynamics also facilitate determining the distance to the odor source. The obtained results are novel and demonstrate a new dimension of odor response properties at the peripheral level. However, given that the authors used a very limited set of chemically similar odors, and considering that the broad tuning and wiring of OSNs in the locust is special and follows different rules compared to the olfactory circuitry of OSNs in other insects (i.e. locust OSNs do not converge onto a single glomerulus but target multiple glomeruli), I wonder whether the observed distinct response motifs are a general phenomenon or rather a special case. I therefore recommend that the authors discuss their findings in the light of these key issues before general conclusions with regard to odor coding rules are drawn. Do these response motifs also occur for highly ecologically relevant odors, such as PAN, where a rather specialized olfactory circuit would be assumed? The MS would benefit if those questions were addressed as well. In addition, the computational modeling approach is written in specialized terms and is therefore difficult to grasp for readers lacking modeling expertise.

      We thank the reviewer for this very positive and helpful assessment of our work. We agree with suggestions to expand our discussion of (1) olfactory circuitry following OSNs and of (2) responses to highly ecologically relevant odors. We have also extensively revised the description of our computational modeling approach to make it understandable to non-specialists.

      In brief:

      (1) The results we present here address only peripheral activity – we do not record or model responses of follower neurons. Because our conclusions do not depend to any extent upon the architecture of the locust's olfactory system, we would prefer to limit necessarily speculative discussion or analyses of these factors. We agree these factors provide interesting context for our work, so we have now expanded our discussion to include: “In other species, how exactly ORN response patterns are utilized downstream may depend on species-specific variations in connectivity between ORNs and the antennal lobe and its glomeruli” (lines 490-492). More investigation is needed to address this important question. Nevertheless, our study shows ORN response motifs provide useful information, and conveying this information to downstream circuits augments coding space.

      (2) We share the reviewer’s concern that our odor set should include ecologically relevant odors. Indeed, it was for this reason that our odor set includes components of the locust diet, wheat grass: 1-Octanol, 1-Hexanol, and Cyclohexanol. As above, though, we are reluctant to speculate on the responses of downstream circuits. But to acknowledge the reviewer’s important point, we have added the following text to our discussion in lines 401-405: “For these studies we used odorants known to be ecologically relevant to locusts, including several found in the head space of wheat grass. Future experiments with larger sets of odorants, including blends or locust pheromones like 4-vinylanisole (4VA) and phenylacetonitrile (PAN), may help clarify the logic of motif switching.”

      Reviewer #2 (Public Review):

      This manuscript provides additional data about how smell is encoded by insects. The study includes both new experimental measurements and simulations. At present, there are questions about whether simulations are appropriately performed to support experimental measurements.

      The main experimental finding reported here is that the same olfactory receptor neurons (ORNs) can respond with different temporal dynamics to different odorants. This finding is of interest. However, it is very important to discuss whether the differences in temporal dynamics can be explained by differences in how each odorant is carried by air, as has been described here: https://pubmed.ncbi.nlm.nih.gov/23575828/.

      We agree this phenomenon is of great interest, and we have now expanded our discussion section to address it.

      In the cited paper (see also Su et al, 2011), PID response characteristics were indeed quite different for different odors, reflecting “fast” and “slow” intrinsic odor dynamics. We are aware of these studies and shared the reviewer’s concern, and for this reason we also made PID recordings during odor presentations. These recordings show our odor set included only “fast” odorants (please see the figure below). We also note that, across our extensive dataset, all odors could elicit all four response motifs. These observations rule out the possibility that differences in how odorants are carried by air underlie the different temporal dynamics we observed in OSN responses.

      We now discuss this important point in the text, as follows: “Earlier work established that the intrinsic dynamic properties of odorants, described as “fast” or “slow,” can contribute to variations in the timing of ORN responses (Su et al., 2011; Martelli et al., 2013). However, our experiments ruled out the possibility that intrinsic odorant dynamics underlie the response motifs we describe here. First, across our extensive dataset, all odors could elicit all four response motifs; second, photoionization detector recordings of our odor presentations all revealed “fast” dynamics (not shown). It seems likely that “slow” odors would elicit concentration-dependent elaborations in the response motifs we observed. In future work it will be interesting to investigate ways intrinsic odor dynamics interact with ORN response motifs. We predict such interactions would further increase ORN response dimensionality” (lines 370-380).

      There are several questions that need to be addressed regarding the simulations part of the manuscript.

      1) There is a mismatch between the number of ORNs used in the model and in the insect system studied.

      The exact number of ORNs in the locust is not known, but estimates range from 45,000 to 113,000 per antenna (Leitch & Laurent 1996; Perez-Orive et al 2002; Galizia & Sachse 2010). We chose to model a smaller but still large set of ORNs (10,000), which we believe is a reasonable compromise between the ideal size (the true number of ORNs in the locust) and the limits imposed by practical computational efficiency. Indeed, almost all computational models are unavoidably scaled-down versions of the biological organisms.

      2) The demonstration in Figure 5 that motif switching improves odor classification includes motif switching for a given odorant, which is not observed experimentally.

      We regret that our description of the experiment presented in Figure 5 was confusing, and we have revised it extensively to clarify this. In brief, the simulation shown in Figure 5 was not, as the reviewer understood, an attempt to model motif switching that occurs when a given odorant is presented repeatedly; rather, it shows how responses to two different, similar odors (Odor 1 and Odor 2) become increasingly different from each other as the probability of motif switching increases.

      We have now revised the text to clarify this point as follows: “With our model we could independently vary odor-elicited response motifs and response magnitudes (Figure 4E), allowing us to evaluate the extent to which motif switching benefitted odor classification in a way that cannot be tested in vivo. Thus, we simulated a realistically large number of ORNs (10,000) and compared the relative success of classifying two different odors (Odor 1 and Odor 2) with three different versions of our model in which we systematically varied the probability of motif switching. Model Version 1: the probability of switching response motif when switching from Odor 1 to Odor 2 was 0%; Version 2: 10%; Version 3: 50%. We found that the model versions that simulated higher motif switching probability made it easier to distinguish these two similar odors.” (lines 191-195, 206-209).

      We have also revised the figure caption as follows: “Computational model shows response motif switching substantially improves odor classification. A) Simulated ORN spiking illustrates different motif switching probabilities. Odors 1 and 2 are similar (see Methods). Each ORN response is sorted by motifs elicited by Odor 1. Raster plots show the responses to Odor 2 become increasingly different from responses to Odor 1 as motif switching probability increases. B) ORN odor-elicited response trajectories in reduced PCA space show motif switching increases the separation between responses to similar Odors 1 and 2; response to Odor 1 (blue) is the same in each panel; response to Odor 2 (red) changes with switching probability. C) Odor classification success as a function of odor similarity and motif switching probability for 1s (top) and 4s (bottom) stimulus pulses; even low switching probabilities improve classification performance; darker shading indicates lower classification accuracy. Odor similarity is quantified by angles (degrees) between odor vectors (see Methods)” (lines 231-239).
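      To make the switching-probability manipulation in Model Versions 1-3 concrete, the following is a minimal sketch of that one parameter only; it is not the authors' model and omits spike generation, PCA, and classification. It assumes, hypothetically, that each simulated ORN is assigned one of the four response motifs for Odor 1 at random and that, with probability `switch_prob`, it takes one of the other three motifs for Odor 2. The function names `assign_motifs` and `fraction_switched` are illustrative.

```python
import random

# The four response motifs described in the text (labels are illustrative).
MOTIFS = ("fast_excitation", "delayed_excitation", "offset", "inhibition")


def assign_motifs(n_orns, switch_prob, seed=0):
    """Hypothetical helper: draw a motif per ORN for Odor 1, then switch each
    ORN's motif for Odor 2 with probability switch_prob. Assumption: a
    switching ORN takes one of the other three motifs uniformly at random."""
    rng = random.Random(seed)
    odor1 = [rng.choice(MOTIFS) for _ in range(n_orns)]
    odor2 = []
    for motif in odor1:
        if rng.random() < switch_prob:
            others = [m for m in MOTIFS if m != motif]
            odor2.append(rng.choice(others))
        else:
            odor2.append(motif)
    return odor1, odor2


def fraction_switched(odor1, odor2):
    """Fraction of ORNs whose motif differs between the two odors."""
    return sum(a != b for a, b in zip(odor1, odor2)) / len(odor1)


if __name__ == "__main__":
    # Switching probabilities of 0%, 10% and 50%, as in Model Versions 1-3.
    for p in (0.0, 0.1, 0.5):
        o1, o2 = assign_motifs(10_000, p)
        print(f"switch_prob={p:.1f}: fraction switched={fraction_switched(o1, o2):.3f}")
```

      In this sketch the fraction of ORNs with a different motif for Odor 2 simply tracks `switch_prob`; the separation of population responses that Figure 5 quantifies would require the full spiking model.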

      3) The methodology for estimating neural temporal dynamics needs to be corrected to apply to the natural stimuli used here.

      We agree and thank the reviewer for raising this important point. To appropriately account for natural correlations present in the stimuli we used in experiments, we have now completely redone our analysis, substantially revised Figure 6, and rewritten the Methods section titled “Temporal filters using linear non-linear models.” Using methods appropriate for strongly correlated and natural odorant stimuli delivered experimentally, we obtained results consistent with those in the previous version of our manuscript.
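      For readers less familiar with linear-nonlinear (LN) models, here is a minimal sketch of how such a model turns a stimulus time series into a predicted firing rate: the stimulus is convolved with a causal temporal filter, and the result is passed through a static nonlinearity. The filter values, the softplus nonlinearity, and the `gain`/`offset` parameters are illustrative assumptions rather than the filters fitted in the study, and the sketch does not implement the correction for correlated natural stimuli described above.

```python
import math


def causal_filter(stimulus, kernel):
    """Linear stage: out[t] = sum_k kernel[k] * stimulus[t - k], using past samples only."""
    out = []
    for t in range(len(stimulus)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            if t - k >= 0:
                acc += weight * stimulus[t - k]
        out.append(acc)
    return out


def softplus(x):
    """Smooth rectifier used here as an assumed static nonlinearity.
    Written as max(x, 0) + log1p(exp(-|x|)) to avoid overflow for large x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ln_predict(stimulus, kernel, gain=1.0, offset=0.0):
    """Predicted (non-negative) firing rate from a linear filter followed by a nonlinearity."""
    drive = causal_filter(stimulus, kernel)
    return [gain * softplus(d + offset) for d in drive]


if __name__ == "__main__":
    # Hypothetical odor pulse (1 = odor on) and a hypothetical biphasic filter.
    stimulus = [0, 0, 1, 1, 1, 0, 0, 0]
    kernel = [0.8, 0.4, -0.3]
    print([round(r, 2) for r in ln_predict(stimulus, kernel)])
```

      The two-stage split is what makes LN models convenient: the filter captures response timing, while the nonlinearity keeps predicted rates non-negative and can saturate.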

      Reviewer #3 (Public Review):

      In this contribution, the authors align an extensive analysis of in vivo recordings of olfactory receptor neuron (ORN) responses to odors in the locust with a data-driven mathematical model of ORN population coding. Their results provide novel insights into the temporal dynamics of peripheral encoding of time-varying and naturalistic olfactory input.

      The manuscript presents three central experimental results: 1) ORN odor responses can be grouped into 4 distinct response motifs (response profiles). This has partly been known with respect to the typical excitatory phasic-tonic motif and odor offset responses. The exhaustive data set here is, however, unprecedented. 2) Individual ORNs can switch their response motif, e.g. from excitatory to inhibitory responses. To my knowledge, this is entirely new, highly interesting, and has strong implications. For one, it implies an increased coding space and odor separability, which is supported by the authors' model study. It also bears implications for our understanding of processing in the antennal lobe, where projection neurons were shown to exhibit this property, but this has largely been attributed to network processing within the AL. The authors discuss ephaptic interactions as a possible underlying mechanism. 3) ORNs not only show classical within- and across-pulse adaptation, where the response amplitude decreases, but also the novel result that the offset response can increase across repeated pulses with short inter-stimulus intervals. The data-driven model reproduces the experimental observations, and a population model confirms the assumed increase in coding space. In the temporal domain, the authors then perform simulations that mimic realistic stimulus statistics with stochastic arrival of odor packets of variably short duration. The model with a trained linear filter and a non-linear transfer function faithfully predicts the experimental firing rates.

      These results, based on an exhaustive set of experimental data, provide a novel view of peripheral odor coding in insects and they will have a particularly strong impact on biologically realistic computational (spiking) circuit models of sensory processing and sensory-to-motor transformations during odor source navigation in naturalistic simulated odor environments where conclusive data and analysis of ORN signaling has thus far been lacking.

      We thank the reviewer for this very thoughtful and positive assessment of our work.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors introduced a new genetic manipulation strategy in mice to reveal the functional development of retrotrapezoid nucleus (RTN) neurons; the RTN is known as an important brainstem region for central chemoreception, and its dysfunction is related to congenital central hypoventilation syndrome (CCHS) neuropathology. They used a conditional mutation of Phox2b within Atoh1-derived cells (Atoh1Cre/Phox2bΔ8 mice) and examined a) respiratory rhythm; b) ventilatory responses to hypercapnia and hypoxia; and c) the number of RTN chemosensitive neurons. They found that 1) neonatal mice with mutant Phox2b expression showed suppressed breathing responses to hypoxia and hypercapnia; 2) adult mutant mice presented an irregular breathing pattern, partial recovery of the ventilatory response to hypoxia and complete recovery of the response to hypercapnia; 3) anatomical data showed a reduced number of neurons activated by hypercapnia and reduced Phox2b immunoreactivity in the RTN. They concluded that conditional expression of the Phox2b mutation in Atoh1-derived cells affected the development of RTN neurons, and suggested that the Atoh1/Phox2b system in the RTN is essential for the activation of breathing under hypoxic and hypercapnic conditions. They propose that their findings provide new evidence for mechanisms related to CCHS neuropathology. The conclusions of this paper are well supported by the data, but careful discussion seems to be required for comparison with the results of various previous studies that used different genetic strategies to study RTN development.

      We would like to thank the reviewer for the comments on our manuscript. In the present version, we made several corrections, as suggested by the reviewers, to facilitate interpretation and strengthen the manuscript.

      Reviewer #2 (Public Review):

      Mutations in the Phox2B gene can lead to congenital central hypoventilation syndrome with variable presentations. Two distinct classes of causative mutations have been found in the human population. The first group consists of mutations that result in trinucleotide, polyalanine repeat expansions, referred to as PARM. The second group consists of non-polyalanine repeat expansion mutations (NPARM), which include missense, nonsense, and frameshift mutations. Each group (and even specific mutations) presents with differing clinical phenotype severity, with NPARM mutations typically being more severe. As Phox2B is expressed across a multitude of cell types across the life of an individual, there remains much to be understood as to the cell-specific effects of various Phox2B mutations on phenotype. To add to our understanding, the authors utilized a conditional Phox2bΔ8 allele that, upon recombination, replaces Exon 3 and UTR with a mutated exon and IRES GFP reporter. This approach allows for an inducible NPARM mutation and reporter expression in a targeted cell type. The authors focused on Atoh1-expressing cells using an Atoh1-expressing Cre recombinase line (Atoh1_Cre). Atoh1 has also been shown to be co-expressed in the RTN and in the para- and inter-trigeminal regions of the pons. After inducing the Phox2B mutation in one allele, the authors examined respiratory features in both adult and neonate mice under room air, hypercapnia (7%) and hypoxia (8%). The Atoh1_Cre; Phox2bΔ8 adult mice showed a significant body weight difference. With their plethysmography approach, neonate mice breathing room air showed few differences, apart from a potential difference in tidal volume. In addition, adult mice show irregular breathing. Both adult and neonate mice may show chemosensory deficits. A potential hypercapnic deficit likely resolves in the adult, but a compromised hypoxic reflex may remain. Notably, Atoh1_Cre; Phox2bΔ8 mice showed reduced cfos expression in the RTN after hypercapnic stimulation and reduced Phox2B immunoreactivity.

      The premise of the paper is to examine how a distinct mutation in a specific cellular context may contribute to clinical outcomes. The potential phenotypes are interesting and illuminate how differing mutations may drive different phenotypes or phenotype severity. While the RTN is likely a key mediator of the reported phenotypes, the conclusions drawn by the authors cannot be fully supported with the data presented.

      We would like to thank the reviewer for the comments. In the present version, we have made all the suggested changes and performed additional experiments to strengthen the work. We are very enthusiastic about the new version of the manuscript, and we believe it opens new questions that could be addressed in the future.

      The authors assign all phenotypes to RTN function. However, there are other documented and potentially undocumented areas of Atoh1 and Phox2b overlap that could impact breathing either directly or indirectly through metabolism and stress responses (PMID 8184995). As noted above, para-trigeminal neurons, including those in the ITR, also co-express Atoh1 and Phox2B and are captured in the Atoh1_Cre; Phox2bΔ8 mouse model. The inter-trigeminal region is associated with apneic reflexes and jaw opening (PMID: 19914183). Thus, perturbations to this center may underlie the increased irregularity seen in adult life. A potential role in chemosensory function cannot be entirely ruled out either. While Rose et al. assert that the RTN and para- and inter-trigeminal neurons are the only ones co-expressing Atoh1 and Phox2B (using antibodies), the persistent cumulative GFP-labeled fate map offered by the Atoh1_Cre; Phox2bΔ8 model would allow the authors to rule in or rule out any other uncharacterized overlapping populations. Such a fate map may also help to explain why the adult mice are significantly underweight. The weight phenotype may stem from metabolic dysregulation, changes in behavior, or feeding. Changes in metabolism may drive secondary changes in breathing and chemosensory reflexes that play a role in the reported phenotypes. Ultimately, the relative roles of para-trigeminal and RTN neurons in these phenotypes should be dissected out.

      Yes, we ran a new series of experiments and found that the number of Phox2b+ neurons in the pons, as well as the number of TH cells in the A1, A2, A6, and C1 regions, was not affected by the mutation. Unfortunately, we were unable to quantify the number of Phox2b-expressing neurons in the paratrigeminal region.

      Neither the adult nor the neonate plethysmography data were collected in line with current best practices. Adult whole-body plethysmography is best carried out in a temperature-controlled chamber held at thermoneutrality. This minimizes any thermoregulatory and metabolic effects on respiratory drive. Concurrent measurement of one or more metabolic parameters, such as VO2 or VCO2, is required to determine whether baseline breathing and chemosensory reflex phenotypes may be affected by changes in metabolism or persistent metabolic imbalances (acidosis or alkalosis). Whole-body measurements in neonates do not allow for accurate assessment of tidal volume; head-out or face-mask pneumotachography is more accurate (PMID: 25017785).

      We totally agree with the reviewer. In fact, the previous version contained some inaccurate information and misconceptions. We now correctly describe how the respiratory parameters were measured in both neonate and adult mice. Additionally, we performed head-out plethysmography in a subset of neonates (control and mutant) and added these data to the Results section. We also measured VO2 and VE/VO2 in neonates and adults.

      Reviewer #3 (Public Review):

      The work by Ferreira and colleagues set out to define the functional consequences of a PHOX2B (Phox2bdelta8) mutation, belonging to the group of non-polyalanine repeat expansions, when restricted to Atoh1-expressing cells. In doing so, the authors generated a new mouse model (Atoh1Cre,Phox2bdelta8 mice) for the study of the central respiratory chemoreceptor circuit. Ferreira et al. found that these conditional mutants present with largely unaffected breathing parameters in postnatal life. However, neonatal breathing irregularities, normally observable in control neonates, are not corrected with the maturation of the conditional mutants. Furthermore, the authors found that conditional Atoh1Cre,Phox2bdelta8 neonates fail to display ventilatory responses to hypoxic (low O2 content in air) and hypercapnic (high CO2 content in air) challenges. The authors show that Atoh1Cre,Phox2bdelta8 adult mice appear to "recover" the capacity to respond to hypercapnic, but not hypoxic, challenges. Lastly, the authors found reduced numbers of Phox2b+ cells in the "area" where the retrotrapezoid nucleus, a key center in the respiratory chemoreceptor circuit, is normally located.

      Strengths:

      The most exciting aspect of this work is the modelling of the Phox2bdelta8 mutation in one element of the central neuronal circuit mediating respiratory reflexes, namely the retrotrapezoid nucleus. To date, mutations in the PHOX2B gene have been found in most patients diagnosed with central congenital hypoventilation syndrome (CCHS), a disease characterized by hypoventilation and absence of chemoreflexes in the neonatal period, which in severe cases can lead to respiratory arrest during sleep. Two distinct types of PHOX2B mutations have been identified in CCHS patients: i) polyalanine repeat expansions, and ii) non-polyalanine repeat expansions. Non-polyalanine repeat expansions tend to be more prevalent in severe cases of CCHS. Thus, the characterization of the Phox2bdelta8 mutation could allow for a better understanding of the etiology of CCHS.

      Weaknesses:

      Whereas the most exciting part of this work is the modelling of the Phox2bdelta8 mutation in retrotrapezoid neurons using conditional mutagenesis driven by Atoh1 (i.e. Atoh1Cre,Phox2bdelta8 mice), the weakness of this study is the lack of a clear physiological, developmental, and anatomical distinction between this approach and similar studies already reported elsewhere, for instance the use of Atoh1Cre,Phox2bflox/flox and P2b::CreBAC1;Atoh1lox/lox mice (Ruffault et al., 2015, DOI: 10.7554/eLife.07051), Egr2cre;P2b27Alacki (Ramanantsoa et al., 2011, DOI: 10.1523/JNEUROSCI.1721-11.2011), Atoh1Phox2bCKO mice (Huang et al., 2017, DOI: 10.1016/j.neuron.2012.06.027) and Egr2cre;Lbx1FS (Hernandez-Miranda et al., 2018, DOI: 10.1073/pnas.1813520115).

      Several conclusions presented in this work are not directly supported by the provided data. For instance, the conclusions regarding the reduction in the number of retrotrapezoid neurons in Atoh1Cre,Phox2bdelta8 mice, or the reduction in fos+ activated retrotrapezoid neurons after CO2 exposure, are not fully supported, as the identity of retrotrapezoid neurons was not thoroughly determined. Furthermore, the authors conclude from their plethysmograph (respiratory recordings) data that Atoh1Cre,Phox2bdelta8 neonatal mice display impaired ventilatory responses to hypoxia (low O2 in air) and hypercapnia (high CO2 in air), but that these mutant animals recover the capacity to respond to hypercapnia, but not to hypoxia, in adult life. This is a bit of an overstatement, as their plethysmograph recordings show that adult Atoh1Cre,Phox2bdelta8 mice do respond to low O2 in air: these mice accelerate respiration and increase tidal volume and minute ventilation in the same fashion as control mice. Rather, what the presented data show is that adult Atoh1Cre,Phox2bdelta8 mice do not sustain the ventilatory response as efficiently as control mice.

      We would like to thank the reviewer for the comments on the strengths and weaknesses of our study. In the present version, we have made significant changes throughout the manuscript, as suggested by the editor and reviewers. In addition, we performed new sets of experiments to strengthen our work. We are very enthusiastic about the current version, and we believe it will open new questions that need to be addressed in future studies.

    1. Author Response

      Reviewer #1 (Public Review):

      Wosniack et al. analyze larval trajectories from behavioral experiments, build a phenomenological model, and efficiently combine the two to dissect the behavioral strategies that Drosophila larvae use during foraging. The paper touches upon several factors that influence foraging: from food quality and distribution to genetic polymorphism and, finally, the contribution of sensory cues. While the first two are well explored and characterized in the paper, the contribution of different sensory modalities is less investigated. They study how homogeneous food substrates or food distributed in patches influence foraging strategies. They find a modular organization of behavioral strategies that is dependent on food characteristics: food quality modulates crawling speed, turning and pausing, while increases in the time spent inside the patches result from biasing turning towards the patch center when the larvae are at the food-no food interface. Furthermore, using anosmic animals, they determine that olfaction is differentially involved in foraging decisions depending on the type of food substrate that the larvae are exploring. Finally, they perform this analysis in rover and sitter larvae to determine the effect of the foraging gene polymorphism on these behaviors and show that its expression (where sitter larvae are slower, turn less and pause more compared to rover larvae) is dependent on the food distribution. They propose that larvae adapt the extent of their exploration to the quality of food. This detailed analysis of the elements that constitute behavioral strategies sets the basis for identifying genes involved in foraging and the neural substrates of the different behavioral modules, and ultimately for understanding the neural circuit mechanisms involved.

      The paper efficiently combines analysis of larval trajectories from experiments with computational modeling and identifies the behavioral elements that contribute to foraging. Using anosmic larvae, the authors show that olfaction has an important role when foraging on yeast substrates but not on sugar-rich substrates. They propose that taste could contribute more on sugar and apple juice substrates; however, they do not test this hypothesis. Did the authors try or consider testing the Gr43a mutant on these substrates? Determining to what extent taste contributes to the different strategies would complete the picture of how sensory cues contribute to foraging decisions, which the authors started to address by tackling the contribution of olfaction to foraging on the different substrates. Also, on patchy substrates, is the border completely smooth, or could the larvae also sense the border as a rough edge? Could other modalities be involved?

      The idea of testing the anosmic animals was to understand to what extent volatile sensory cues influence the search outside the patch. We did not intend to make a complete analysis of the role of different sensory modalities for the foraging adaptation. In particular, investigating taste is complicated since it is not very well known how yeast taste is sensed. Several yeast metabolites have been shown to activate subsets of taste receptor neurons but the work has mostly been done in adult flies. There is a clearer picture regarding sugars where Gr43a is known to be a sucrose and fructose receptor. To understand the role of taste for foraging, we should do a series of experiments which go beyond the scope of this paper.

      But we agree it is an interesting question and have added a new section in the discussion. See line 634: “An experiment using the gustatory sweet sensor Gr43a mutant on sucrose, which is not volatile and does not produce an odor, could help discern the contribution of taste at the border of the patch (Fujishiro et al., 1984; Marella et al., 2006; Miyamoto et al., 2013; Wang et al., 2004; Mishra et al., 2013). For yeast, the lack of smell completely changed the response of the larvae, which did not show differences inside and outside the patch for most foraging parameters (Figure 4B, C, E, G). In this instance, taste was not sufficient to retain larvae inside the yeast patch (compare Figure 3H with Figure 4F) even though several gustatory receptors have been shown to be activated by yeast metabolites (Wisotsky et al., 2011; Ganguly et al., 2017; Croset et al., 2016).”

      Regarding the edge sensation, the revised version includes two control experiments where we have tested the impact of the edges in the absence of nutrients. In the first control experiment, we prepared wells for food patches like in the “sucrose” and “apple juice” conditions, but we filled them with agar. In the second experiment, to control for the “yeast” condition, we made patches with gel. The results are presented in Figure 3-figure supplement 2 and they show that in both cases, in the absence of nutrients, the edge does not have a significant influence on the turning rate towards the center.

      The revised version also mentions mechanosensation:

      Line 337: “We observed that inward turns occur more often than outward turns at the border of the patch for the three substrates (Figure 3B, inward turns are shown in black). To control for possible mechanosensory effects due to the border edges, we prepared new arenas with patches that contained no nutrients, either using the same agar that composed the rest of the arena, or using ultrasound gel (Methods). Larvae at the agar-agar or the agar-gel border did not show any changes in their preference to turn towards the patch center, confirming that the behavioral change observed in response to food is specific (Figure 3-figure supplement 2).”

      Line 646: “However, when larvae are crawling, they leave a print of their denticle attachment on the agar, that could inform them about their previous location and help returning to the food.”

      In Figure 3C the crawling speed is lower in yeast and apple juice experiments both inside and outside of patches (and in both rovers and sitters) compared to sucrose experiments. Do the authors have an explanation for this? Also, as they note, surprisingly the turn bias persisted when the larvae exited the patches. Are these two related? Do larvae turn more frequently?

      The speed outside the patches of yeast and apple juice is indeed lower than outside sucrose. We now mention this in the main text and propose an explanation:

      Line 313: “Outside yeast and apple juice patches, the crawling speed increased but did not return to levels similar to the agar-only condition, suggesting that the behavior of larvae that exit the patch is influenced by the recent food experience or that larvae might still be sensing the food (Figure 3-figure supplement 1E). In line with this, in yeast the number of turns outside the patch was higher than inside the patch.”

      The authors describe and discuss handedness in larval turning. While this in itself is an interesting characterisation, it does not appear to be thoroughly addressed in the context of its influence on foraging behavior. The authors conclude that the presence of patches induces turning bias that overrides handedness. It would be interesting to determine whether there are differences in turn size and/or reorientation frequency depending if the larvae are turning on the preferred side versus the non-preferred side.

      Thank you for pointing this out; the sentence was somewhat misleading. We corrected it and added a quantification of the percentage of larvae whose handedness changes when comparing behaviour inside and outside the patch (Figure 3-figure supplement 1F). This is generally around 20%, so larvae mostly adjust their turning angles rather than their handedness.

      Line 354: “This is accomplished by turning towards the patch center while maintaining the handedness (Figure 3J and Figure 3-figure supplement 1F) and represents an important mechanism to remain inside the food.”

      During different types of taxis, larvae modulate crawling speed, run duration, turn rate, turn size and turn direction to avoid unfavourable conditions and approach favourable ones. This is true across different types of sensory gradients. Some of these strategies are also described in this paper. The authors make a link between behaviour at the patch-no patch interface and taxis behaviour. It would be interesting to further develop the comparison between the behavioural elements described here and those in navigational strategies in sensory gradients. The commonalities and possible modular organisation of both could point to the existence of neural circuits for the different behavioural modules that are recruited differentially depending on the sensory context, the motivational state, or a combination of both (and based on different types of sensory information).

      Thank you for the comment. We have added a new section in the discussion. Line 651: “One of the strengths of our phenomenological model is that it incorporates a modular organization of foraging that could reflect how the crawl and turn modules are controlled. First, we modelled a stochastic search where no information regarding food is available outside of the current location, because food is absent or because the larvae cannot sense it. This corresponds to an autonomous search behavior implemented by circuits located in the ventral nerve cord without input from the brain (Berni et al., 2012; Sims et al., 2019). Second, we have incorporated a goal-directed navigation that allows larvae to return to the food. Our phenomenological model includes a distance-dependent probability to turn inwards that mimics the effect of chemotaxis (when present), as well as that of any other possible mechanism that contributes to the turning probability. As a consequence, we observed that simulated larvae, even when the resources are split into eight patches, could stay inside the food patch for longer periods, in line with experimental observations (Figure 5 and Figure 6). The model could be improved by setting the turning properties outside the patch to match experimental observations as closely as possible. To this end, we could consider studies of larvae crawling in different attractive gradients, where the changes in turning probability and angle, including weathervaning, have been investigated in relation to precise spatio-temporal information about odorants (Louis et al., 2008; Gomez-Marin et al., 2011; Davies et al., 2015). It would also be helpful to have information about other attractive gradients, like taste, to know whether a common set of mechanisms is used regardless of the sensory modality. Using this information, our model could be used to investigate how crawling speed and turning properties are controlled via descending pathways from the brain (Tastekin et al., 2018; Jovanic et al., 2019). Finally, in the presence of nutrients, our model adjusts movements to stay on the food patch. The concerted decrease in turning rate and crawling speed and the increase in the number of pauses suggest that a neuromodulatory depression of movement (Marder, 2012) could be relevant in this phase. It would be interesting to investigate more generally how neuromodulators influence the decision to remain or explore new food resources in relation to the resources available and the larval motivational state.”

      Reviewer #3 (Public Review):

      The authors of the paper study foraging strategy in crawling Drosophila larvae. They utilize single-larva tracking in isotropic and patchy food nutrition environments, detailed quantitative analysis of the animals' behavioral states and transitions, and a random-walk-style Monte Carlo simulation setting. They investigate how specific components of behavior are modulated for the animal to locate suitable food resources.

      Strengths:

      • The main results of the paper, laying out how crawling speed, turn/pause rates, and turn direction bias work together to lead larvae to the food they need, are interesting, nicely presented, and important for ultimately understanding how foraging really works in detail, here at the behavioral level, and somewhere down the road at the circuit and/or molecular levels too.

      • Comparing rovers and sitters throughout the experimental parts of the paper was a really nice idea, with interesting results, and it is well motivated in the introduction.

      • The handedness of individuals is a nice finding as well; I think this is the first time it has been published for larval Drosophila.

      • Simulations that use empirical results as probability distributions make for a nice environment for testing ideas about larva behavior.

      • Creating the patchy food environments was a great idea, as it puts the larva behavior in a more realistic setting, but still controlled enough to be analyzed clearly.

      Weaknesses:

      • For an animal that tends to have a very high variance in its behavior, the number of larvae used in each experiment seems pretty low to me. As a result, some of the secondary claims are perhaps not as well supported when they rely on "not significant" statistical test results.

      • The introduction is generally good, but could perhaps better motivate why fly larva foraging should be of interest to a more general audience.

      We answered the question about the number of larvae used in our experiments in the required revisions above.

      We have added a section in the introduction to explain the relevance and generality of our work:

      Line 45: “These models postulate that animals will use different strategies depending on the distribution of the resources. In environments where resources are abundant, animals will search and exploit them by performing short movements in random directions, in patterns well approximated by Brownian random walks. When resources are sparse, and foragers have incomplete knowledge about their location, a more diffusive strategy is needed, with an alternation between short-range and long-range movements, which can be modelled as a Lévy random walk. Analysis of animal movements in the wild has demonstrated that environmental context can induce a switch from Lévy to Brownian movement patterns (Humphries et al., 2010), but the mechanisms behind the implementation of such a behavior (e.g., cognitive capacity, memory) often remain elusive (Budaev et al., 2019). Understanding the motor mechanisms that regulate the execution of different movement strategies and the transitions between them could provide insight into how the nervous system can drive the search for resources in complex and ever-changing environments. The Drosophila larva is an excellent model to study this question, because the movement of single animals can be tracked for long periods of time in a controlled environment.”

      • The execution of the simulations seems reasonable, but perhaps don't add a lot to this particular paper, especially given how much of the manuscript they take up.

      We now specifically highlight the unique contributions of the model that go beyond the performed experiments, especially in terms of making experimental predictions. See our answer to the specific point in the required revisions above.

      Overall, the primary results of the paper do achieve the stated goals and set the stage nicely for further studies into the underlying mechanisms of foraging in larvae.

      For those studying foraging, especially in flies/larvae but probably other animals as well, this should be an important paper that highlights the utility of individual animal tracking with high resolution, analyzing specific components of behavior, and creating simulation environments as playgrounds for investigating the impact of those components.

    1. Author Response

      Reviewer #2 (Public Review):

      This fascinating study describes a possible effect of cancer-generated microvesicles on fibroblasts. Microvesicles from a particularly metastatic line promote more contractile and proliferative fibroblasts, and there is a key role for at least one microvesicle factor - the crosslinking enzyme Transglutaminase-2. A wide range of studies help identify and elucidate these effects, but a few aspects remain unclear.

      1) MV- carries more of the crosslinking enzyme TGM2 but also less of the matrix-degrading MMP14, so the ECM is more stable either way. The authors should describe any other factors that would have a similar effect to these. The authors should address: do other genes change with TGM2 knockdown; does MMP14 change? If the latter changes, does it have a more important role than TGM2?

      We included a more thorough investigation into the proteomics data to determine what other factors in the MVs may induce fibroblast activation or matrix remodeling. Lists of “fibroblast-activating proteins” and “matrix remodeling proteins” were generated based on online datasets. All fibroblast-activating proteins tested were more highly expressed in MV- compared to MV+, but TGM2 was the only protein on this list with significantly increased expression (Figure 3b-d).

      A large variety of matrix-remodeling proteins were detected in the MV proteomics, including matrix ligands, proteases, protease inhibitors, and crosslinking enzymes. Interestingly, MV+ had significantly higher levels of the matrix remodeling proteins TIMP3, FN1, and COL8A1 (Figure 3d). MV- had significantly higher levels of the crosslinking enzymes PLOD1 and PLOD3, the matrix ligand COL12A1, and TGM2 (Figure 3d). As TGM2 can be categorized as both a matrix remodeling and fibroblast-activating protein and was significantly greater in the MV- compared to MV+, we believe this addition to the paper reinforces our focus on TGM2 (Figure 3).

      2) Perhaps the cleanest and most important study of MV effects is in Fig. 6j,k, but it shows in vivo differences that are barely significant or not significant, and it uses 'SF' (serum-free) media as a control. Are serum components detected in the mass spectrometry data? If so, wouldn't this suggest that serum-supplemented media is a better control? The serum is usually from another species, which is a further (xenogeneic) concern that motivates care and discussion about dose, especially given the high frequency of injection. Also, is there a survival difference for the mice?

      Thank you for bringing this concern to our attention. We realize that our wording was not clear. MVs are isolated under serum-free conditions and after isolation are resuspended in serum-free media. For this experiment, our mice were injected with either MVs suspended in serum-free media or serum-free media alone. We have revised the text to explain this more thoroughly.

      Additionally, we were unable to assess survival differences as our IACUC protocol requires sacrificing mice upon a certain percentage of weight loss.

    1. Author Response

      Reviewer #1 (Public Review):

      Using fMRI-based univariate and multivariate analyses, Root, Muret, et al. investigated the topography of face representation in the somatosensory cortex of typically developed two-handed individuals and individuals with a congenital and acquired missing hand. They provide clear evidence for an upright face topography in the somatosensory cortex in all three groups. Moreover, they find that one-handers, but not amputees, show shorter distances from lip representations to the hand area, suggesting a remapping of the lips. They also find a shift away of the upper face from the deprived hand area in one-handers, and significantly greater dissimilarity between face part representations in amputees and one-handers. The authors argue that this pattern of remapping is different to that of cortical neighborhood theories and points toward a remapping of face parts which have the ability to compensate for hand function, e.g., using the lips/mouth to manipulate an object.

      These findings provide interesting insights into the topographic organization of face parts and the principles of cortical (re)organization. The authors use several analytical approaches, including distance measures between hand- and face-part-responsive regions and representational similarity analysis (RSA). Particularly commendable is the rigorous statistical analysis, such as the use of Bayesian comparisons, and careful interpretation of absent group differences.

      We thank the reviewer for their positive and constructive feedback.

      Reviewer #2 (Public Review):

      After amputation, the deafferented limb representation in the somatosensory cortex is activated by stimulation of other body parts. A common belief is that the lower face, including the lips, preferentially "invades" deafferented cortex due to its proximity to cortex. In the present study, this hypothesis is tested by mapping the somatosensory cortex using fMRI as amputees, congenital one-handers, and controls moved their forehead, nose, lips or tongue. First, they found that, unlike its counterpart in monkeys, the representation of the face in the somatosensory cortex is right-side up, with the forehead most medial (and abutting the hand) and the lips most lateral. Second, there was little evidence of "reorganization" of the deafferented cortex in amputees, even when tested with movements across the entire face rather than only the lips. Third, congenital one-handers showed significant reorganization of deafferented cortex, characterized principally by the invasion of the lower face, in contrast to predictions from the hypothesis that proximity was the driving factor. Fourth, there was no relationship between phantom limb pain reports and reorganization.

      As a non-expert in fMRI, I cannot evaluate the methodology. That being said, I am not convinced that the current consensus is that the representation of the face in humans is flipped compared to that of monkeys. Indeed, the overwhelming majority of somatosensory homunculi I have seen for humans has the face right side up. My sense is that the fMRI studies that found an inverted (monkey-like) face representation contradict the consensus.

      Thank you for pointing this out. As we tried to emphasise in the introduction, very few neuroimaging studies have actually investigated face somatotopy in humans, with inconsistent results. We agree the default consensus tends to be dominated by the upright depiction of Penfield’s homunculus (recently replicated by Roux et al., 2018). However, due to methodological and practical constraints, alignment across subjects in the case of intracortical recordings is usually difficult to achieve, which makes it difficult to assess the consistency of topographical organisation. Moreover, previous imaging studies did not manage to convincingly support Penfield’s homunculus. For these two key reasons, the spatial orientation of the human facial homunculus is still debated. A further limiting factor is that the vast majority of studies investigating face (re)mapping in humans focused solely on the lip representation, using the cortical proximity hypothesis to interpret their results. Consequently, as we highlight above in our response to the Editor, there is a widespread and false representation in the human literature of the lips neighbouring the hand area.

      To account for the reviewer’s critique and convey some of this context, we changed our title from: Reassessing face topography in primary somatosensory cortex and remapping following hand loss; to: Complex pattern of facial remapping in somatosensory cortex following congenital but not acquired hand loss. This was done to de-emphasise the novelty of face topography relative to our other findings.

      We also rewrote our introduction (lines 79-94) as follows:

      “The research focus on lip cortical remapping in amputees is based on the assumption that the lips neighbour the hand representation. However, this assumption goes against the classical upright orientation of the face in S1 [26–30], as first depicted in Penfield’s Homunculus and in later intracortical recordings and stimulation studies [26–29], with the upper face (i.e., forehead) bordering the hand area. In contrast, neuroimaging studies of face topography in humans have provided contradictory evidence for the past 30 years. While a few neuroimaging studies provided partial evidence in support of the traditional upright face organisation [31], other studies supported the inverted (or ‘upside-down’) somatotopic organisation of the face, similar to that of non-human primates [32,33]. Other studies suggested a segmental organisation [34], or even a lack of somatotopic organisation [35–37], whereas some studies provided inconclusive or incomplete results [38–41]. Together, the available evidence does not converge on a face topography in humans. In line with the upright organisation originally suggested by Penfield, recent work reported that the shift in the lip representation towards the missing hand in amputees was minimal [42,43], and likely to reside within the face area itself. Surprisingly, there is currently no research that considers the representation of other facial parts, in particular the upper face (e.g., the forehead), in relation to plasticity or PLP.”

      We also updated the discussion accordingly (lines 457, 469-477, 490-492).

      Similarly, it is not clear to me how the observations (1) of limited reorganization in amputees, (2) of significant reorganization in congenital one-handers, and (3) of the lack of relationship between PLP and reorganization is novel given the previous work by this group. Perhaps the authors could more clearly articulate the novelty of these results compared to their previous findings.

      Thank you for giving us the opportunity to clarify this important point. The novelty of these results can be summarised as follows:

      (1) Conceptually, it is crucial for us to understand if deprivation-triggered plasticity is constrained by the local neighbourhood, because this can give us clues regarding the mechanisms driving the remapping. We provide strong topographic evidence about the face orientation in controls, amputees and one-handers.

      (2) The vast majority of previous research on brain plasticity following hand loss (both congenital and acquired) in humans has exclusively focused on the lower face, and lips in particular. We provide systematic evidence for stable organisation and remapping of the neighbouring upper face, as well as the lower face. We also study topographic representation of the tongue (and nose) for the first time.

      (3) The vast majority of previous research on brain remapping following hand loss (both congenital and acquired, neuroimaging and electrophysiological) focused on univariate activity measures, such as the spatial spread of units showing a similar feature preference, or the average activity level across individual units. We go beyond remapping by using RSA, which allows us to ask not only whether new information is available in the deprived cortex (as well as the native face area), but also whether this new information is structured consistently across individuals and groups. We show that representational content is enhanced in the deprived cortex of one-handers, whereas it is stable in amputees relative to controls (and to their intact hand region).

      (4) Based on previous studies, the assumption was that reorganisation in congenital one-handers was relatively unspecific, affecting all tested body parts. Here, we provide evidence for a more complex pattern of remapping, with the forehead representation seemingly moving out of the missing hand region (and the nose representation being tentatively similar to controls). That is, we show not just “invasion” but also a shift of the neighbour away from the hand area which has never been documented (or in fact suggested).

      (5) Using Bayesian analyses we provide definitive evidence against a relationship between PLP and forehead remapping, providing first and conclusive evidence against the remapping hypothesis, based on cortical neighbourhood.

      Our inclination is not to add a summary paragraph of these points in our discussion, as it feels too promotional. Instead, we have re-written large sections of the introduction and discussion to better emphasise each of these points separately throughout the text, where the context is most appropriate. Given the public review strategy taken by eLife, the novelty summary provided above will be available for any interested reader, as part of the public review process. However, should the reviewer feel that a novelty summary paragraph is required (or an emphasis on any of the points summarised above), we will be happy to revise the manuscript accordingly.

      Finally, Jon Kaas and colleagues (notably Niraj Jain) have provided evidence in experiments with monkeys that much of the observed reorganization in the somatosensory cortex is inherited from plasticity in the brain stem. Jain did not find an increased propensity for axons to cross the septum between face and hand representations after (simulated) amputation. From this perspective, the relevant proximity would be that of the cuneate and trigeminal nuclei and it would be critical to map out the somatotopic organization of the trigeminal and cuneate nuclei to test hypotheses about the role of proximity in this remapping.

      Thank you for highlighting this very relevant point, which we are well aware of. We fully agree with the reviewer that this is an important goal for future study, but functional imaging of the brainstem in humans is particularly challenging and would require ultra-high-field imaging (7T) and specialised equipment. We have encountered considerable local resistance, owing to hypothetical MRI safety concerns about scanning amputees at this higher field strength, meaning we are unable to carry out this research ourselves. Our former lab member Sanne Kikkert, who now runs her independent research programme in Zurich, has been working towards this goal for the past 4 years. So we can say with confidence that this aim is well beyond the scope of the current study. In response to your comment, we now mention this potential mechanism in the introduction (lines 98-101), we ensured that we only refer to “cortical proximity” throughout our manuscript, and we circle back to this important point in the discussion.

      Lines 539-543: “Moreover, even if the remapping we observed here goes against the theory of cortical proximity, it can still arise from representational proximity at the subcortical level, in particular at the brainstem level44,45. While challenging in humans, mapping both the cuneate and trigeminal nuclei would be critical to provide a more complete picture regarding the role of proximity in remapping.”

      Reviewer #3 (Public Review):

      In their study, the authors set out to challenge the long-held claim that cortical remapping of hand-deprived territories in the somatosensory cortex follows somatotopic proximity (the hand region gets invaded by cortical neighbors), as classically assumed. In contrast to this claim, the authors suggest that remapping may not follow cortical proximity but instead functional rules governing how the effector is used. Their data indeed suggest that the deprived hand area is not invaded by the forehead, which is the cortical neighbor, but instead by the lips, which may compensate for hand loss in manipulating objects. Interestingly, the authors suggest this is mostly the case for one-handers but not for amputees, for whom the reorganization seems more limited in general (but see my comments below on this last point).

      This is a remarkably ambitious study that has been skilfully executed on a strong number of participants in each group. The complementarity of state-of-the-art uni- and multi-variate analyses are in the service of the research question, and the paper is clearly written. The main contribution of this paper, relative to previous studies including those of the same group, resides in the mapping of multiple face parts all at once in the three groups.

      We are grateful to the reviewer for appreciating the immense effort that this study involved.

      In the winner takes all approach, the authors only include 3 face parts but exclude from the analyses the nose and the thumb. I am not fully convinced by the rationale for not including nose in univariate analyses - because it does not trigger reliable activity - while keeping it for representational similarity analyses. I think it would be better to include the nose in all analyses or demonstrate this condition is indeed "noisy" and then remove it from all the analyses. Indeed, if the activity triggered by nose movement is unreliable, it should also affect multivariate.

      Following this comment, we re-ran all univariate analyses to include the nose, and updated throughout the main text and supplemental results and related figures. In short, adding the nose did not change the univariate results, apart from a now significant group x hemisphere interaction for the CoG of the tongue when comparing amputees and controls, matching better the trends for greater surface coverage in the deprived hand ROI of amputees. Full details are provided in our response to Reviewer 1 above.

      The rationale for not including the hand is maybe more convincing as it seems to induce activity in both controls and amputees but not in one-handers. First, it would be great to visualize this effect, at least as supplemental material to support the decision. Then, this brings the interesting possibility that enhanced invasion of hand territory by lips in one-handers might link to the possibility to observe hand-related activity in the presupposed hand region in this population. Maybe the authors may consider linking these.

      Thank you for this comment. As we explain in our response to Reviewer 1 above, we did not intend the thumb condition in one-handers to be analysed, as the task given to one-handers (imagine moving a body part you never had) is inherently different from that given to the other groups (move, or at least attempt to move, your (phantom) hand). As such, we could not pursue the analysis suggested by the reviewer here. To reduce the discrepancy, and following Reviewer 1’s advice, we decided to remove the hand-face dissimilarity analysis, which we included in our original manuscript and which might have sparked some of this interest. Upon reflection, we agreed that this specific analysis does not directly relate to the question of remapping (but rather to shared representation), in addition to making the paper unbalanced. We will now feature this analysis in another paper, where it appears more appropriate in the context of referred sensations in amputees (Amoruso et al, 2022 MedRxiv).

      The use of the geodesic distance between the center of gravity of each movement in the Winner Take All (WTA) maps and a predefined cortical anchor is clever. How the Center Of Gravity (COG) was computed on spatially disparate regions might deserve more explanation, however.

      We are happy to provide more detail on this analysis, which weights the CoG by cluster size (using the workbench command -metric-weighted-stats). Consider the example shown here (Figure 1) for a single control participant, where each CoG is measured either without weighting (yellow vertices) or with cluster weighting (forehead CoG=red, lip CoG=dark blue, tongue CoG=dark red). When a movement produces a single cluster of activity (the lips in the non-dominant hemisphere, shown in blue), the CoG’s location is identical for the weighted (dark blue) and unweighted (yellow) calculations. Other movements, however, such as the tongue (green), produce one large cluster (at the lateral end) with a few smaller, more disparate clusters located more medially. In this case, the larger cluster of maximal activity is weighted more heavily than the smaller clusters in the CoG calculation, so the CoG is slightly skewed towards it (dark red).

      Figure 1. Centre-of-gravity calculation, weighted and unweighted by cluster size, in an example control participant. Here the winner-takes-all output for each facial movement (forehead=red, lips=blue, tongue=green) was used to calculate the centre-of-gravity (CoG) at the individual-level in both the dominant (left-hand side) and non-dominant (right-hand side) hemisphere, weighted by cluster size (forehead CoG=red, lip CoG=dark blue, tongue CoG=dark red), compared to an unweighted calculation (denoted by yellow dots within each movements’ winner-takes-all output).

      This is now explained in the methods (lines 760-765) as follows:

      “To assess possible shifts in facial representations towards the hand area, the centre-of-gravity (CoG) of each face-winner map was calculated in each hemisphere. The CoG was weighted by cluster size meaning that in the event of multiple clusters contributing to the calculation of a single CoG for a face-winner map, the voxels in the larger cluster are overweighted relative to those in the smaller clusters. The geodesic cortical distance between each movement’s CoG and a predefined cortical anchor was computed.”
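      To make the cluster weighting concrete, here is a minimal Python sketch. It assumes (our assumption, not necessarily what -metric-weighted-stats does internally) that every vertex of a face-winner map is weighted by the number of vertices in the cluster it belongs to, and it uses flat 2-D coordinates instead of positions on the cortical surface. The function names and toy coordinates are hypothetical.

```python
from collections import Counter


def unweighted_cog(coords):
    """Plain mean position of all winner vertices."""
    n = len(coords)
    dims = len(coords[0])
    return tuple(sum(p[d] for p in coords) / n for d in range(dims))


def cluster_weighted_cog(coords, cluster_ids):
    """Mean position in which every vertex is weighted by the size of its
    cluster (hypothetical weighting scheme), so a large cluster pulls the
    centre of gravity towards itself more than small, scattered ones."""
    if not coords or len(coords) != len(cluster_ids):
        raise ValueError("need one cluster id per vertex")
    sizes = Counter(cluster_ids)               # vertices per cluster
    weights = [sizes[c] for c in cluster_ids]  # weight of each vertex
    total = sum(weights)
    dims = len(coords[0])
    return tuple(
        sum(w * p[d] for w, p in zip(weights, coords)) / total
        for d in range(dims)
    )


# One large cluster (3 vertices) near x = 1 plus one isolated vertex at x = 10,
# loosely mimicking the tongue map described above.
coords = [(0, 0), (1, 0), (2, 0), (10, 0)]
clusters = ["a", "a", "a", "b"]
print(unweighted_cog(coords))                  # (3.25, 0.0)
print(cluster_weighted_cog(coords, clusters))  # (1.9, 0.0)
```

      With a single cluster all weights are equal, so the weighted and unweighted CoGs coincide, matching the lips example; with disparate clusters the weighted CoG moves towards the largest one, as for the tongue.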

      Moreover, imagine that for some reason the forehead region extends both dorsally and ventrally in a specific population (e.g., amputees): the COG would stay unaffected, but the overlap between hand and forehead would increase. The analyses on the surface area within the hand ROI for lips and forehead nicely complement the WTA analyses and suggest higher overlap for lips and lower overlap for forehead, but none of the maps or graphs presented clearly show those results. Maybe the authors could consider adding a figure clearly highlighting that there is indeed more lip activity IN the hand region.

      We agree with you on this limitation of the CoG and this is why we interpret all cortical distances analyses in tandem with the laterality indices. The laterality indices correspond to the proportion of surface area in the hand region for a given face part in the winner-maps.

      Nevertheless, to further convince the Reviewer, we extracted activity levels (beta values) within the hand region of congenital one-handers and controls, and ran (as for the CoGs) a mixed ANOVA with the factors Hemisphere (deprived x intact) and Group (controls x one-handers).

      As expected from the laterality indices obtained for the Lips, we found a significant group x hemisphere interaction (F(1,41)=4.52, p=0.040, ηp²=0.099), arising from enhanced activity in the deprived hand region in one-handers compared to the non-dominant hand region in controls (t(41)=-2.674, p=0.011) and to the intact hand region in one-handers (t(41)=-3.028, p=0.004).

      Since this kind of analysis was the focus of previous studies (an approach we are trying to move away from) and since it is redundant with the proportion of face-winner surface coverage in the hand region, we decided not to include it in the paper. But we could add it as a Supplementary result if the Reviewer believes this strengthens our interpretation.

      In addition to overlap analyses between hand and other body parts, the authors may also want to consider doing some Jaccard similarity analyses between the maps of the 3 groups to support the idea that amputees are more similar to controls than one-handers are in their topographic activity, which again is not clear from the figures.

      We thank the reviewer for this clever suggestion. We now include the Jaccard similarity analysis, which quantifies the degree of similarity (0=no overlap between maps; 1=fully overlapping) between winner-takes-all maps (which included the nose, as in the revised univariate results) across groups. For each face part and each amputee, the similarity with the 22 controls and with the 21 one-handers was averaged separately. We used a linear mixed model with fixed factors of Group (One-handers x Controls), Movement (Forehead x Nose x Lips x Tongue) and Hemisphere (Intact x Deprived) on Jaccard similarity values (similar to the model used for the RSA analysis). A random effect of participant and a covariate of age were also included in the model.

      Results showed a significant group x hemisphere interaction (F(240.0)=7.70, p=0.006; controlled for age; Fig. 5), indicating that amputees’ maps showed different similarity values to controls’ and one-handers’ depending on the hemisphere. Post-hoc comparisons (corrected alpha=0.025; uncorrected p-values reported) revealed significantly higher similarity to controls’ than to one-handers’ maps in the deprived hemisphere (t(240)=-3.892, p<.001). Amputees’ maps also showed higher similarity to controls’ maps in the deprived relative to the intact hemisphere (t(240)=2.991, p=0.003). Amputees, therefore, displayed greater similarity of facial somatotopy in the deprived hemisphere to controls, again suggesting less evidence for cortical remapping in amputees.

      We added these results at the end of the univariate analyses (lines 335-351) and in the discussion (lines 464-465 and 497-500).
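      The per-map similarity step can be sketched as below. This is a minimal illustration assuming that a winner-takes-all map is stored as a dictionary from vertex index to winning face part, and that two maps are compared face part by face part; the helper names and toy maps are hypothetical, and the subsequent linear mixed model is not reproduced here.

```python
def winner_vertices(wta_map, part):
    """Vertices at which `part` wins in a winner-takes-all map
    (here a dict mapping vertex index -> winning face part)."""
    return {v for v, label in wta_map.items() if label == part}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|: 0 = no overlap, 1 = identical vertex sets."""
    union = a | b
    if not union:
        return 0.0  # convention assumed here when both maps are empty
    return len(a & b) / len(union)


def mean_similarity(target_map, group_maps, part):
    """Average Jaccard similarity of one participant's map for `part`
    to every map in a comparison group (e.g. all controls)."""
    target = winner_vertices(target_map, part)
    scores = [jaccard(target, winner_vertices(g, part)) for g in group_maps]
    return sum(scores) / len(scores)


# Toy maps over four vertices (invented for illustration only).
amputee = {0: "lips", 1: "lips", 2: "tongue", 3: "forehead"}
controls = [
    {0: "lips", 1: "tongue", 2: "tongue", 3: "forehead"},
    {0: "lips", 1: "lips", 2: "lips", 3: "forehead"},
]
print(round(mean_similarity(amputee, controls, "lips"), 3))  # 0.583
```

      Running the same averaging once against the control maps and once against the one-hander maps gives the two similarity values per face part and hemisphere that enter the group comparison.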

      This brings me to another concern, related to the claim that the change in cortical organization they observe is mostly observed in one-handers. Most of this conclusion seems to rely on the fact that some effects are observed in one-handers but not in amputees when compared to controls; however, no direct comparisons are made between amputees and one-handers, so we may be making an erroneous inference about an interaction that is not actually tested (Nieuwenhuis, 11). For instance, the shift of the forehead away from the hand/face border is also (mildly) significant in amputees (as observed more strongly in one-handers), so the conclusion (e.g., from the subtitle of the results section) that it is specific to one-handers might not be fully supported by the data. The same applies to the invasion of the hand territory by the lips, which is significant in amputees in terms of surface area. Altogether, this calls for toning down the idea that plasticity is restricted to congenital deprivation (e.g., the last sentence of the abstract). Even if remapping is numerically stronger in one-handers, if I am not wrong, there are no statistics showing that it is indeed stronger than in amputees, and amputees actually show significant effects, compared to controls, along the same lines as those shown (more strongly) in one-handers.

      Thank you for this very important comment. We fully agree: the RSA across-groups comparison is highly informative but insufficient to support our claims. We did not compare the groups directly, to avoid multiple comparisons (both for statistical reasons and to manage the size of the results section). But the reviewer’s suggestion to perform a Jaccard similarity analysis complements the univariate and multivariate results very nicely and allows for a direct (and statistically lean) comparison between groups, to assess whether amputees are more similar to controls or to congenital one-handers, taking into account all aspects of their maps (both spatial location/CoG and surface coverage). We added the Jaccard analysis to the main text, at the end of the univariate results (lines 335-385). The Jaccard analysis suggests that amputees’ maps in the deprived hemisphere were more similar to the maps of controls than to those of congenital one-handers. This provided significant statistical support for the claim that remapping is indeed stronger in one-handers than in amputees (lines 346-351). We also compared both amputees and one-handers to the control group. In line with our univariate results, this revealed that the only face part for which controls were more similar to one-handers than to amputees was the tongue (lines 379-381), and that the forehead remapping observed at the univariate level in amputees (surface area) is likely to arise from differences in the intact hemisphere (lines 381-383).

      Finally, we also added the post-hoc statistics comparing amputees to congenitals in the RSA analysis (lines 425-427): “While facial information in the deprived hand area was increased in one-handers compared with amputees, this effect did not survive our correction for multiple comparisons (t(70.7)=-2.117, p=0.038).”

      Regarding the univariate results mentioned by the reviewer, we would like to emphasise that we found no significant effect for the lips in amputees, though we agree that their surface area appears to fall between those of controls and one-handers. This laterality index was, however, not different from zero; this test is now added at lines 189-190. Regarding the forehead, we fully agree with the Reviewer, and we adjusted the subtitle accordingly (lines 241-242). For consistency, we also added the t-test against zero for the forehead surface area (non-significant, lines 251-253).

      Also, maybe the authors could explore whether there is actually a link between the number of years without hand and the remapping effects.

      To address this question, we explored our data using a correlation analysis. The only body part that showed some suggestive remapping effects was the tongue, so we tested for a relationship (Pearson’s correlation) between years since amputation and the laterality index of the tongue in amputees (r = 0.007, p=0.980, 95% CI [-0.475, 0.475]). We also explored amputees’ global Jaccard similarity values to controls in the deprived hemisphere (r = -0.010, p=0.970, 95% CI [-0.488, 0.473]), and could not find any relationship. Considering there was no strong remapping effect to explain, we find this result too exploratory to include in our manuscript.
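      For reference, a Pearson correlation with a 95% confidence interval can be computed as below. This is a minimal sketch assuming the common Fisher z-transform for the interval; the response does not state which method produced its intervals, and the toy data are invented.

```python
import math


def pearson_with_ci(x, y, z_crit=1.959963984540054):
    """Pearson's r with an approximate 95% CI from the Fisher z-transform.

    The CI method is an assumption made for this sketch.
    """
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need paired samples with n >= 4")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for constant input")
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1:
        raise ValueError("Fisher z is undefined for |r| = 1")
    z = math.atanh(r)          # Fisher transform of r
    se = 1 / math.sqrt(n - 3)  # approximate standard error of z
    lo = math.tanh(z - z_crit * se)
    hi = math.tanh(z + z_crit * se)
    return r, (lo, hi)


# Toy data (invented), e.g. years since amputation vs. a laterality index.
years = [1, 2, 3, 4, 5]
index = [2, 1, 4, 3, 5]
r, (lo, hi) = pearson_with_ci(years, index)
print(round(r, 3))  # 0.8
```

      Because the interval is symmetric in z-space rather than in r-space, it becomes asymmetric around r as r moves away from zero, and it widens quickly for small samples.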

      One hypothesis generated by the data is that the lips remap into the deprived hand area because they serve compensatory functions. Yet, even in controls, the lips and hands can both be used to manipulate objects, in contrast to the forehead. One may thus wonder whether the preferential presence of the lips in the hand region is already latent in controls, given the functional link between the two.

      We agree with the reviewer’s reasoning, and we think that the distributed representational content we recently found in two-handers (Muret et al, 2022) provides a first hint in this direction. It is worth noting that in that previous publication we did not find differences across face parts in the activity levels obtained in the hand region, except for slightly more negative values for the tongue. But we do think that such latent information is likely to provide a “scaffolding” for remapping. While the design of our face task does not allow us to assess information content for each face part (as done for the lips in Muret et al, 2022), this should be further investigated in follow-up studies.

      We added a sentence in the discussion to highlight this interesting notion: Lines 556-559: “Together with the recent evidence that lip information content is already significant in the hand area of two-handed participants (Muret et al, 2022), compensatory behaviour since developmental stages might further uncover (and even potentiate) this underlying latent activity.”

    1. Author Response

      Reviewer #1 (Public Review):

      It has previously been shown that deletion of the GluA3 subunit in mice leads to alterations in auditory behavior in adult mice that are older than a couple of months of age. The GluA3 subunit is expressed at several synapses along the auditory pathway (cochlea and brainstem), and in ko mice changes in brainstem synapses have been observed. These previously documented changes may account for some of the deficits in hearing in adult ko mice.

      In the current study, the authors investigate an earlier stage of development (at 5 wks) when the auditory brainstem responses (ABRs) are normal, and they ask how transmission persists at inner hair cell (ihc) ribbon synapses in GluA3 ko mice. They discovered that deletion of GluA3 1) significantly changed the relative expression of GluA2 (dramatically downregulated) and GluA4 subunits at SGN afferents, and 2) caused morphological changes in ihc ribbons (modiolar side) and synaptic vesicle size (pillar).

      The changes documented in the 5 wk old GluA3ko mice were not necessarily predicted because in general the mechanisms involved in shuffling GluA receptors at this synapse (or other sensory synapses) are not completely understood; furthermore, much less is known about the role of differentiation of ihc-sgn synapses along a modiolar-pillar axis. With that said, the only shortcoming of the study is a lack of explanation for the observed changes in the synaptic structure; but this is not specific to this study.

      Given the quality of the data and the clarity of presentation of results, this is a very valuable study that will aid and motivate researchers to further explore how auditory circuitry develops, and becomes differentiated, at the level of ihc-sgn synapses.

      We thank the reviewer for the positive and helpful comments. Ongoing studies are seeking to explain the observed changes in synapse structure.

      Reviewer #2 (Public Review):

      The goal of the study by Rutherford and colleagues was to characterize functional, structural, and molecular changes at the highly specialized cochlear inner hair cell (IHC) - spiral ganglion neuron (SGN) ribbon synapse in GluA3 AMPA receptor subunit knockout mice (GluA3KO). Previous work by the authors demonstrated that 2-month-old GluA3KO mice experienced impaired auditory processing and changes in synaptic ultrastructure at the SGN - bushy cell synapse, the next synapse in the auditory pathway.

      In the present study, the authors investigated whether GluA3 is required for ribbon synapse formation and physiology in 5-week-old mice using a series of functional and light- and electron microscopy imaging approaches. While deletion of GluA3 AMPAR subunit did not affect hearing sensitivity at this age, the authors reported that cochlear ribbon synapses exhibited changes in the molecular composition of AMPARs and pre- and postsynaptic ultrastructural alterations. Specifically, the authors demonstrated that GluA3KO ribbon synapses exhibit i) a global reduction in postsynaptic AMPARs, which is also reflected by smaller AMPAR arrays, ii) a reduction in GluA2 and an increase in GluA4 protein expression at individual postsynaptic sites, and iii) changes in the dimensions and morphology of the presynaptic specialization ("ribbon") and in the size of synaptic vesicles. These reported structural changes are linked to the side of innervation with respect to the IHC modiolar-pillar axis.

      The results presented by the authors are conceptually very interesting as the data support the notion that potentially detrimental changes in the molecular composition of a sensory synapse can be compensated to sustain synaptic function to a certain extent during development. The conclusions of the study are mostly well supported by the data, but some experimental details or control experiments are missing or need to be clarified to allow a full assessment.

      1) The authors tested which GluA isoforms are expressed in SGNs of GluA3KO mice and reported that only GluA2 and GluA4, and not GluA1, receptor subunits are present in the cochlea. It is, however, a bit difficult to understand why immunolabelling for GluA1 was only performed on brainstem sections (Fig. 1B right) and not in the cochlea to probe for postsynaptic localization at ribbon synapses as it was done for the other isoforms (Fig. 2 and 6), given that GluA3KO IHCs exhibited a larger number of ribbons that lacked GluA2 and 3 (lone or 'orphaned' ribbons; Fig. 6B). It is also not clear why immunolabelling for GluA2 and 4 was performed to probe for expression of these receptor subunits on SGN cell bodies in the cochlear spiral ganglion. Which neurons are expected to synapse onto these somata?

      There is precedent for expression of GluA subunits in the SGN cell bodies reflecting expression at the synapse, although it is not clear if any of that immunoreactivity reflects cell surface expression in the intact ganglion or if it represents solely intracellular subunits being trafficked to synapses.

      Figure 1b shows that GluA2 is expressed in the somata of WT mice and KO mice. The lower panels show that GluA1 is not expressed in the somata of WT or KO mice. The right panels show that while GluA1 is expressed in the cerebellum of WT and KO mice, it is not expressed in the cochlear nucleus of WT or KO mice. We think this demonstrates the lack of compensation by GluA1 in the GluA3 KO.

      We have now added GluA4 immunoreactivity in the SGNs to Fig. 1, for completeness. In our experience, GluA subunits expressed at synapses are also found in the cell bodies, and GluA subunits not expressed at synapses are not found in the cell bodies. The current data is consistent with this, although we did not label GluA1 in the organ of Corti.

      2) The authors state in the text that GluA3 expression is completely abolished in GluA3KO IHCs, however, there appears to still be a faint punctate immunofluorescence signal visible when an antibody directed against GluA3 was used (Fig. 2C). Providing additional information on the specificity of this (and the other) antibodies used in the study would be helpful.

      We agree, and thank the reviewer for pointing this out. There is indeed a small signal presumably due to cross-reactivity of the anti-GluA3 with GluA2 subunits, because the cytoplasmic epitope recognized by the antibody is in a region of high similarity of GluA2 and GluA3 (Dong et al., 1997). In addition, the specification sheet of the Santa Cruz company states that the GluA3 antibody can detect GluA2. This relatively small cross-reactivity is noted now in the text on p. 9. Also, this appearance was a product of the same brightness and contrast issue noted above in the response to the editor’s summary. Upon readjustment, the signal is less apparent, because in the readjustment we used less brightness and less contrast enhancement to avoid the unwanted saturation in some of the panels.

      3) The authors reported changes in the volume of the presynaptic ribbon and postsynaptic density surface area in GluA3KO animals. The EM data as presented are, however, not sufficiently convincing.

      i) There appears to be a mismatch between the EM data shown in Fig. 3 and 4 and the information in the text with respect to the number of data points in the plots and the reported number of reconstructed synapses. This raises several questions with respect to the analysis. For instance, it is unclear whether certain synapses were reconstructed but excluded from the analysis. If so, what were the exclusion criteria?

      We thank the reviewer for pointing out this discrepancy within the text and the figures. The discrepancies are now fixed. We have added more information on how the synapses were reconstructed in the M&M (p.14-15).

      ii) The authors compare PSD surface areas in reconstructions from 3D serial sections, but for some of the shown reconstructions (i.e. Fig. 3A' and B' and 4B'), it appears as if PSDs were only incompletely reconstructed.

      We included all the ultrathin sections that show afferent dendrites with a visible PSD. We revised all the reconstructions and fixed some misalignments. The appearance of the reconstructed PSD relates to how the Reconstruct software creates the 3-D rendering. We did not use any extra software to smooth the edges of the 3D reconstructions.

      4) The immunolabelling experiments shown in Fig. 2 and 6 are of very high quality and the quantitative analysis of the light microscopy data (Fig. 6-9) is clearly very detailed, but slightly difficult to interpret the way it is presented. Specifically, it is unclear how the number of synapses per IHC (Fig. 6B) and the separation into modiolar and pillar side (Fig. 8) was achieved based on the shown images without the outlines of individual cells being visible.

      We agree. Please see the revised Figs. 2, 6, and 8, and explanation in the figure legend of Fig. 8.

      5) Adding more detailed information about important parameters (mean, N/n, SD/SEM) and the statistical tests used for the individual comparisons presented in the Figures would help strengthen the confidence in the presented data.

      Please see the new spreadsheets accompanying the revised manuscript.

      6) In general, the authors report a series of molecular and structural changes in IHCs and reach the conclusion that GluA3 subunits may have a role in "trans-synaptically" determining or organizing the architecture of both the pre- and post-synapse. However, some of the arguments are very speculative and many of the claims are not supported by experimental data presented in the paper. The authors should also consider comparing their findings to studies that investigated ultrastructural changes of AMPAR subunit knockouts in other synapse types, and discuss alternative interpretations (e.g. homeostatic changes).

      Thank you for this comment. Considering that reviewer 1 asked for more speculation, we have decided to leave the level of speculation similar to the initial submission. However, we went through the text to make sure our claims were backed by our observations.

      Due to space constraints, rather than comparing to additional other synapses, in this context we prefer to compare with auditory brainstem synapses.

      We now address the possibility of homeostatic changes on p. 29.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Winter and colleagues define the sensitivity of cancer cells lacking the mitochondrial AAA+ ATPase ATAD1 to proteasome inhibition. They show that ATAD1 is often co-deleted with PTEN in many different types of cancer. Using two complementary CRISPR screens in two distinct cell models, they identified the mitochondrial E3 ubiquitin ligase MARCH5 as a gene whose deletion is synthetically lethal with ATAD1. Since MARCH5 was previously reported to attenuate apoptotic signaling through mechanisms including promoting the degradation of pro-apoptotic factors such as BIM1, they sought to define the specific role of ATAD1 in regulating pro-apoptotic factors. They present evidence that ATAD1 extracts the pro-apoptotic protein BIMEL from mitochondria to facilitate its inactivation by mechanisms including degradation and inhibitory phosphorylation, a mechanism that appears enhanced during proteasome inhibition. This suggested that ATAD1-deficient cells could be preferentially sensitive to proteasome inhibitors. Consistent with this, expression of ATAD1 in ATAD1-deficient cells decreases sensitivity to proteasome inhibition. Similarly, depletion of ATAD1 in PC3 cells increased sensitivity to proteasome inhibition in xenografts, although, somewhat curiously, a corresponding increase in BIM was not readily observed (NOXA levels did increase). Finally, the authors show that prostate cancer patients with combined PTEN1/ATAD1 deletion show improved survival as compared to tumors where PTEN1 was deleted alone. Ultimately, these results support a model whereby ATAD1 promotes tumor cell survival and highlight that ATAD1 deletion may represent a vulnerability that can be exploited to treat tumors through the use of proteasome inhibitors.

      Overall, this is an interesting and generally well-performed study that defines the mechanistic and functional implications of a genetic 'hitchhiker' in the context of cancer cell survival. The synthetic lethality for ATAD1 and MARCH5 observed using two different genetic approaches (deletion/overexpression) in two different cell models underscores a strong link between these two genes. Further, the data showing an important role for ATAD1 in regulating BIM mitochondrial localization/cytosolic phosphorylation are interesting. The evidence demonstrating relationships between ATAD1 and proteasome sensitivity is also convincing. However, there are some weaknesses. For example, the direct relationship between ATAD1-dependent prosurvival activities and BIM is not clearly defined. This is evident as BIM1 depletion did not influence ATAD1-deficient PC3 cells' sensitivity to bortezomib and BIM was not significantly impacted in the xenograft models. BIM deletion did partially rescue synthetic lethality in Jurkat cells deficient in both MARCH5 and ATAD1, indicating a potential role in those cells. While the authors do address this, these results do create a disconnect within the studies that complicates the overall interpretation, as the specific importance of BIM regulation by ATAD1 in different models is not consistent or always clear. Regardless, this study does reveal new insights into the genetic relationship between ATAD1 deficiency and proteasome inhibition that could have direct therapeutic potential to improve the treatment of patients. Further, considering that the anti-apoptotic roles for ATAD1 appear to extend beyond BIM regulation, this will open new avenues for investigation of the underlying molecular mechanisms whereby ATAD1 contributes to regulating apoptotic signaling in cancer and other models. 
      That said, tempering the writing to better highlight that BIM regulation does not explain the ATAD1-mediated protection in every cancer cell model (it does in some, but not all) would be helpful. While the new insight into the potential mechanism of ATAD1-dependent apoptotic regulation is valuable, more focus on the specific relationship between ATAD1 deficiency and proteasome inhibitor sensitivity would better suit the current work.

      Reviewer #2 (Public Review):

      This manuscript by Winter et al represents an analysis of the function of the ATAD1 gene in cancer. At present, the manuscript makes a number of interesting observations, with strong experimental support. First, the authors show that tumors with PTEN deletions frequently have additional mutations in ATAD1, and that prostate tumors with both mutations are associated with a shorter period of survival. Second, tumors lacking ATAD1 are more sensitive to proteotoxic stress, based in part on an increased tendency to apoptosis. Third, the ATAD1 protein interacts with BIM, and interactions with BIM contribute in part to an increased tendency to apoptosis. Fourth, ATAD1 and MARCH5 have at least moderate synthetic sick/lethal interactions; together with other data, this suggests they control the release of BIM from the OMM, contributing to its degradation. Overall, the data suggest that tumors with ATAD1 deletions may be particularly vulnerable to drugs that induce proteotoxic stress, suggesting new potential therapeutic regimens, which would be a valuable contribution to the field. The level of data presented here is already substantial; however, some additional experiments to support the authors' contentions would strengthen the work. Some claims about the mechanism are overstated given the current body of data and should be qualified.

      First, we thank the reviewers and editors for considering our work and providing insightful critiques. We are also grateful that our prior reviews from another journal were considered as part of a holistic review. Overall, we have rewritten key aspects of the manuscript to emphasize strengths pointed out by the reviewers (the relationship between the proteasome and ATAD1) while de-emphasizing the claims surrounding ATAD1 and BIM. Specifically, we added a new paragraph to the discussion section to help focus the reader on how loss of ATAD1 sensitizes cells to ubiquitin proteasome system (UPS) dysfunction and describe the implications thereof. We also removed a paragraph from the discussion that may have put undue emphasis on BIM. Lastly, we reconfigured our schematic figure (Fig 4F) to describe a model in which ATAD1 and the UPS represent two parallel pathways of dealing with proteins on the OMM, where loss of one pathway increases dependency on the other. We believe that BIM is an important piece of this story, and clearly demonstrate that ATAD1-dependent extraction of BIM partly explains the synthetic lethality of ATAD1 and MARCH5. However, we agree with the reviewers that to focus too much on BIM detracts from the more general thesis of the work, as described above. We added another paragraph to the discussion that describes limitations of the study, to explicitly outline what our manuscript does and does not demonstrate.

    1. Author Response

      Reviewer #1 (Public Review):

      I read with real interest the manuscript entitled "Sex-specific effects of an IgE polymorphism on immunity, susceptibility to infection and reproduction in a wild rodent", written by Wanelik and colleagues. I am impressed with each and every part of this work. This study is very well designed and answers intriguing scientific questions. The study is multilayered and multidimensional and goes far beyond a genomic association, as it deeply addresses, to mention only the most important, ecological, parasitological, immunological, and gene expression aspects. In addition to studying a free-living vole population, it uses this opportunity to gain insights into the genetics and biology of the high-affinity IgE receptor that could not be gained in studies performed in humans or standard laboratory animals. The data are presented in a very elegant way and the article is really nicely written.

      We thank the Reviewer for these positive comments, and are very glad to hear they think our work is so comprehensive.

      Reviewer #2 (Public Review):

      In this manuscript, Wanelik et al. use a wild rodent population to test if a polymorphism in a receptor for immunoglobulin E (IgE) affects immune responses, resistance to infection, and fitness. Finding such effects would imply that polymorphisms in immune genes can be maintained by antagonistic pleiotropy between sexes, which has important implications for our understanding of how genetic variation is maintained. The work presented here extends previous work by the same group where they have shown that expression of GATA3 (a transcription factor inducing Th2 immune responses) affects tolerance to ectoparasites and that polymorphism in Fcer1a affects the expression of GATA3. The present study is based on a fairly large data set and comprehensive analysis of a number of different traits. Indeed, the authors should be commended for investigating all steps in the chain polymorphism→immune response→resistance→fitness. Unfortunately, the presentation of the methodology is a bit confusing. Moreover, most of the key results are only marginally significant.

      We thank the Reviewer for their positive feedback, and are very glad to hear they think our work is so comprehensive. As detailed below, we have tried to clarify our methodology and to temper our claims in the revised manuscript.

      As regards methodology, I was confused by the differential expression (DE) analyses presented in fig 1A. First, it took a while to understand that these were based on a comparison of unstimulated cells (i.e. baseline expression), not ex vivo stimulated cells; this should be made explicit in conjunction with the presentation of the results. Second, it would be good to clarify (and motivate) in the Results that you compare individuals with at least one copy of the GC haplotype against the rest, i.e. a dominant model.

      We apologise for the confusion. We now explicitly state in the Results (lines 313-314) that the DGE analysis was based on unstimulated splenocytes: “Differential gene expression (DGE) analysis performed on unstimulated splenocytes taken from 53 males and 31 females assayed by RNASeq”. We also explicitly state “Unstimulated immune gene expression” in the legend for Figure 1.

      Please note that an additive model was used for all analyses run using the hapassoc package (macroparasites and SOD1). A dominant model was used in the DGE analysis and in the other analyses where it was not possible to use the hapassoc package (gene expression assayed by Q-PCR, microparasites and reproductive success); in these analyses, only individuals whose haplotype could be inferred with certainty could be included (i.e. a smaller dataset). Our use of the dominant model in the DGE analysis is now more explicitly explained on lines 933-935: “Only those individuals for which haplotype could be inferred with certainty could be included (n = 53 males and n = 31 females; none of which were known to have two copies of the GC haplotype hence the choice of a dominant model).” Its use in the other non-hapassoc analyses is now explicitly stated on lines 991-992: “as in the DGE analysis, genotype was coded as the presence or absence of the GC haplotype (i.e. a dominant model)”.
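      A minimal sketch of the two genotype codings, assuming each individual's inferred haplotype pair is available as a pair of labels (the function name, the "other" label and the toy individuals are hypothetical, not taken from the manuscript). Under an additive model the predictor is the number of GC copies (0, 1 or 2); under a dominant model it is the presence or absence of GC. Individuals whose pair is uncertain are dropped, as described for the non-hapassoc analyses; hapassoc itself handles uncertain haplotypes differently, and that is not reproduced here.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: an individual's inferred haplotype pair,
# or None when the pair could not be inferred with certainty.
HaplotypePair = Optional[Tuple[str, str]]

TARGET = "GC"  # the Fcer1a haplotype of interest


def code_genotypes(pairs: Sequence[HaplotypePair]) -> Tuple[List[int], List[int]]:
    """Return (additive, dominant) codes for individuals with a certain pair.

    additive: number of GC copies (0, 1 or 2)
    dominant: 1 if at least one GC copy is present, else 0
    Individuals with an uncertain pair (None) are excluded from both lists.
    """
    additive: List[int] = []
    dominant: List[int] = []
    for pair in pairs:
        if pair is None:
            continue  # haplotype not inferred with certainty: exclude
        copies = sum(1 for hap in pair if hap == TARGET)
        additive.append(copies)
        dominant.append(1 if copies >= 1 else 0)
    return additive, dominant


if __name__ == "__main__":
    # Toy individuals (made up); "other" stands for any non-GC haplotype.
    sample = [("GC", "other"), ("other", "other"), None, ("GC", "GC")]
    print(code_genotypes(sample))  # ([1, 0, 2], [1, 0, 1])
```

      The codings differ only for GC homozygotes; when no included individual carries two GC copies, as reported for the DGE sample, they yield identical values.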

      The first key result is that polymorphisms in Fcer1a have sex-specific effects on the expression of pro- and anti-inflammatory genes in males and females. However, the GSEA analyses (fig 1A) show that the GC haplotype has positive effects on the expression of both pro- and anti-inflammatory gene sets in both sexes, albeit with a stronger effect on pro-inflammatory genes in males and on anti-inflammatory genes in females, but there is no formal evidence for a genotype-by-sex interaction. I am not sure how to test for an interaction with GSEA (or whether it is possible at all), so it would be good to complement the GSEA with other analyses (perhaps based on PCA?) of these data to provide more formal evidence for a genotype-by-sex effect.

      It is not possible to provide formal evidence for an effect of genotype by sex in the DGE analysis/GSEA. Instead, we have tried to temper our claims about sex-specific effects (please see below for further details).

      Further evidence of a sex-specific effect of Fcer1a genotype is provided by analyses of the expression of 18 immune genes in ex vivo stimulated T cells. Here, a sex-specific effect of Fcer1a genotype was found on the expression of one of the 18 measured immune genes, the cytokine IL17a. However, Fcer1a is, as far as I am aware, not expressed by T cells, so the relevance of these results is unclear. Moreover, it is unclear why these 18 genes were analyzed one by one, rather than by some multidimensional approach (e.g. PCA).

      The Reviewer is right that Fcer1a is not generally considered to be expressed by T cells. However, the stimulation could have indirect effects. We have clarified this on lines 801-804: “Although Fcer1a is not expressed by T-cells themselves, polymorphism in this gene could be acting indirectly on T-cells through various pathways, including via cytokine signalling, following expression of Fcer1a by other cells”.

      The 18 immune genes were specifically selected because they represent different immune pathways and are expected to have limited redundancy. This is why individual tests were performed (followed by a correction for multiple testing) rather than using a multidimensional approach like PCA. This is now explicitly explained in the Methods on lines 804-808: “The choice of our panel of genes was informed by…(iii) the aim of limited redundancy, with each gene representing a different immune pathway” and on lines 1031-1032: “We did not use a multidimensional approach (such as principal component analysis) because of limited redundancy in our panel of genes.” and in the Results on line 363-366: “we used an independent dataset for males and females whose spleens were stimulated with two immune agonists and assayed by Q-PCR (for a panel of 18 immune genes with limited redundancy); see Methods for how these genes were selected.”

      The second key result is that Fcer1a genotype has sex-specific effects on resistance to parasites, but this is based on a marginally significant effect as regards one of three tested pathogens.

      We agree that this is a marginally significant result and now state this explicitly in the Results section (line 428).

      The third key result is that Fcer1a genotype has sex-specific effects on reproductive fitness. However, this is based on a marginally significant effect in males only, and a formal test for sex by genotype could not be performed (and since the direction of the effect was similar in females it is doubtful whether there would be an effect of sex by genotype; see fig 1C).

      Thus, while the results presented here are clearly indicative of sex-specific effects of an immune gene polymorphism, I think it is too early to actually claim such effects.

      We understand the Reviewer’s concerns about the overall lack of formal evidence for an effect of genotype by sex. As we are not able to provide this for the DGE analysis, GSEA (see above), or for the reproductive success analysis, we have tempered our claims about sex-specific effects (as suggested by the Reviewer). We have done this by removing the term “sex-specific effect” throughout the manuscript, including in the title. We now focus more heavily on the multiple effects we have shown across different phenotypic traits, and use the term “sex-dependent effects” or describe effects as “differing between sexes” sparingly, and only where necessary. These changes have been made throughout the manuscript, but more so in the introduction where the narrative has been substantially reworked to lay out this change in focus.

      Reviewer #3 (Public Review):

      This is a well-replicated study: the authors sampled over a thousand field voles (Microtus agrestis), over three years at seven different sites, with a combination of cross-sectional and longitudinal sampling. The authors compared individuals carrying the GC haplotype (<10% of the population) of the high-affinity IgE receptor gene (Fcer1a) with all other individuals. They recorded parasite infections (Babesia, Bartonella, ticks, fleas, gastrointestinal helminths), expression levels of inflammatory and immune genes using transcriptomes and quantitative PCR, and genotype and pedigree.

      We thank the Reviewer for their positive feedback, and are very glad to hear they think our work is well replicated.

      A comparison of overall gene expression between GC-carrying and all other voles indicated two sex-dependent differences: the expression in males of Il33, which is associated with antihelminthic responses, and in females of Socs3, which is implicated in regulating immune responses. One substantial issue with the authors' interpretation of these data is the attribution of Il33 to the inflammatory response; this taints the rest of their interpretation (e.g., Fig 1A, see below). Instead, this is a key cytokine of the antihelminthic Th2 response, and its detection suggests there might be a difference in helminth infection between the haplotypes, which is consistent with the role of IgE. Therefore, the authors would need to explore further how the GC haplotype, IgE, and parasite burdens might be driving the expression of IL-33. Specifically, the authors did not control for potential confounding effects of infection, which might be expected to differ based on the rest of their data.

      We acknowledge the difficulty in grouping genes under single GO terms, and the need for more nuance when describing these classifications. No gene set is perfect and immune networks are highly complex, so the same gene can be grouped into multiple gene sets. IL33 is an example of this – it appears in the GO term GO:0050729 (positive regulation of inflammatory response) but, as the Reviewer points out, is also commonly associated with the antihelminthic Th2 response. We have edited the text in the Results (on lines 322-324 and lines 350-352) to communicate this nuance, as well as adding references to support each of these associations: “Il33 is commonly associated with anti-helminthic response [25] and Socs3 with regulation of the immune response more broadly [26]….Both Il33 and Socs3 also share an association with the inflammatory response [26,27]. While Il33 positively regulates this response (appearing in the gene set GO:0050729), Socs3 negatively regulates it (GO:0050728).” References added:

      [25] Liew FY, Pitman NI, McInnes IB. Disease-associated functions of IL-33: The new kid in the IL-1 family. Nat Rev Immunol. 2010;10: 103–110. doi:10.1038/nri2692
      [26] Carow B, Rottenberg ME. SOCS3, a major regulator of infection and inflammation. Front Immunol. 2014;5: 1–13. doi:10.3389/fimmu.2014.00058
      [27] Cayrol C, Girard JP. IL-33: An alarmin cytokine with crucial roles in innate immunity, inflammation and allergy. Curr Opin Immunol. 2014;31: 31–37. doi:10.1016/j.coi.2014.09.004

      We have also run an extra DGE analysis including cestode burden as a covariate (cestodes being the most prominent helminth infection in terms of biomass), to check whether IL33 still emerges as a top-responding gene in males (see Appendix 1-table 4 & 5). We found that it did (in fact the signal was even stronger), indicating that the differences in Il33 expression are not being driven by differences in cestode infection. We now mention this additional analysis in the text: “Given the link between Il33 and the antihelminthic response (and more generally, IgE-mediated responses and the antihelminthic response), we repeated the DGE analysis while controlling for cestode burden, but this had little effect on our results (same top-responding immune genes; see Appendix 1—table 4 & 5), suggesting that these effects were not driven by differences in cestode infection”. This is consistent with our finding that there is no difference in macroparasite burden (including cestode burden) between individuals with and without the GC haplotype (see Appendix 1—table 11) and lines 449-451: “However, we found no effect of the haplotype (interactive or not) on the probability of infection with the other parasites in our population”.

      We have also included the following caveat in our discussion on lines 540-542: “Some of the differences in immune phenotype that we observed may also be driven by difference in parasite infection (although we accounted for cestode burden in a follow-up analysis, we cannot rule this out).”

      Among a narrow panel of immune genes measured in ex vivo settings, the authors reported elevated expression of Il17a, which is associated with inflammatory, antibacterial responses. Of note, the panel of genes they measured did not contain antihelminth effectors beyond the transcription factor GATA3, and therefore could not confirm the expression of IL-33 observed in the transcriptomes. However, the expression of IL-17a appears consistent with the elevated activity of antioxidant SOD1.

      In response to this comment, we now point out more clearly that our panel of genes did not include Il33 or Socs3, but did include other inflammatory genes including Il17a, Ifng, Il1b, Il6 and Tnfa.

      Somewhat unexpectedly given the authors' claim that in males the GC haplotype is prone to a more inflammatory immune phenotype, it had no effect on infection in that sex. However, the identity of the genes and pathways matter and the authors do not provide sufficient detail to evaluate their interpretation (GSEA analysis and Figure 1A).

      Barcode plots, such as the one we include in Figure 1A, are commonly used representations of GSEA results. In order to aid interpretation for those who are unfamiliar with barcode plots, we have included some more information in the legend of Figure 1.

      An intriguing and potentially important finding is that males carrying the GC haplotype appeared to have fewer offspring (little to no effect detected in the females). To confirm whether the effect of the haplotype is direct or mediated by other factors, it would be useful to test how other covariates, like infection, might contribute to this.

      To explore this possibility, we have run extra GLMs for both females and males which include two parasite variables: proportion of samples taken from an individual that tested positive for Babesia and proportion of samples taken from an individual that tested positive for Bartonella. We found no difference in the main results – males with the GC haplotype still have fewer offspring, suggesting that infection is not acting as a confounder.

    1. Author Response

      Reviewer #1 (Public Review):

      1) Fig 6E shows that CAPE1 is released only upon Fol infection. This appears to contradict the notion that FolSvp1 prevents CAPE1 release. However, the Fol strain overexpressing FolSvp1 prevented the release of CAPE1. It is necessary to compare the WT strain with a mutant strain in which the FolSvp1 gene is deleted; one would expect the mutant strain to induce significantly more CAPE1 release. Similarly, the mutant strain complemented with the nls1 construct needs to be tested to see whether nuclear localization is required for preventing CAPE1 release.

      Thank you for the good suggestions! In line with eLife's revision policy in response to COVID-19, we have stated in the Discussion section of the revised manuscript (lines 441-444) that the conclusion that FolSvp1-mediated translocation of SlPR1 into the nucleus impedes CAPE1 release needs to be further strengthened with additional data. We would like to perform the suggested experiments in future studies.

      2) SlPR1 is localized in the apoplast in a manner dependent on the signal peptide (Fig 5-figure supplement 1). Overexpression of SlPR1 with an added NLS but lacking the signal peptide failed to enhance disease resistance to Fol infection (Fig 7G). What about overexpression of SlPR1 lacking the signal peptide without the added NLS? Is retention of SlPR1 in the cytoplasm sufficient to abolish its function? It is not even discussed why SlPR1 has to be in the nucleus to prevent CAPE1 release.

      Thank you for these suggestions! In the revised manuscript (lines 436-444), we have discussed the possibility that binding of FolSvp1 to SlPR1 may inhibit the function of SlPR1 in the cytoplasm, and have stated that additional experiments are required in future studies to test this.

      3) FolSvp1 carrying the PR1 signal peptide interacted with SlPR1 in the apoplast (Fig 6D and Fig 6-figure supplement 2). Why weren't these proteins translocated into the nucleus? This seems to contradict the in vitro uptake data. It seems that either no or only a very small proportion of SlPR1 transiently expressed in tobacco cells is located in the nucleus. Fig 7C shows that infection with the WT strain, but not the nls1 mutant strain, allowed detection of SlPR1 in the nucleus of tomato cells. However, it is not clear how much SlPR1 remains in the apoplast or cytoplasm. Is the FolSvp1 protein secreted by Fol sufficient to translocate a significant portion of SlPR1 into the nucleus? The authors are encouraged to examine apoplastic and cytoplasmic protein fractions for the relative amounts of SlPR1 after Fol infection.

      Thank you very much for this constructive point! The observation that FolSvp1 and SlPR1 interact in both the apoplast and the nucleus of N. benthamiana leaves suggests that binding of FolSvp1 to SlPR1 may inhibit its antifungal activity and/or the cleavage of SlPR1 to produce CAPE1 in the extracellular space or even the cytoplasm. In addition, the BiFC assays performed in N. benthamiana leaves might not completely mimic physiological conditions. Therefore, whether FolSvp1-mediated translocation of SlPR1 into the nucleus, which impedes CAPE1 release, is the only route of PR1 inactivation needs to be tested with additional data in future studies. We have added this information to the revised manuscript (lines 436-444).

      4) Fig 7J and 7K: a better experiment would be to pretreat WT tomato plants with CAPE1 prior to inoculation with the WT and FolSvp1 OE strains. The pretreatment should eliminate the virulence function of FolSvp1 OE if the virulence is solely dependent on the prevention of CAPE1 release.

      Thank you for this suggestion! We have stated in the Discussion section of the revised manuscript (lines 441-444) that the conclusion that FolSvp1-mediated translocation of SlPR1 into the nucleus impedes CAPE1 release needs to be further strengthened with additional data. It will be of considerable interest to perform the suggested experiments in future studies.

      Reviewer #2 (Public Review):

      1) As far as I know, apoplastic PR1 proteins may have antifungal activity. When the authors tested the interaction between FolSvp1 and SlPR1 in Nicotiana benthamiana by BiFC, both apoplastic and nuclear interactions were detected. Therefore, the authors should discuss whether binding of FolSvp1 to SlPR1 that remains in the apoplast can inhibit (i) its anti-Fol activity and (ii) the cleavage of SlPR1 to produce the CAPE1 peptide. In other words, although translocating SlPR1 to the nucleus via FolSvp1 is effective for suppressing CAPE1 production, this may not be the only way.

      Thank you very much for this constructive point! The observation that FolSvp1 and SlPR1 interact in both the apoplast and the nucleus of N. benthamiana leaves suggests that binding of FolSvp1 to SlPR1 may inhibit its antifungal activity and/or the cleavage of SlPR1 to produce CAPE1 in the extracellular space or even the cytoplasm. Therefore, whether FolSvp1-mediated translocation of SlPR1 into the nucleus, which impedes CAPE1 release, is the only route of PR1 inactivation needs to be tested with additional data in future studies. In line with eLife's revision policy in response to COVID-19, we have added this information to the revised manuscript (lines 436-444).

      2) The FolSvp1 produced in N. benthamiana was using the SlPR1 signal peptide and lacked the acetylation modification. It is possible that the acetylation of FolSvp1 can affect the interaction affinity or localization between FolSvp1 and SlPR1. The K167Q mutation of FolSvp1 might not be able to faithfully mimic the K167 acetylation.

      Thank you for this suggestion! It’s true that the BiFC assays performed with N. benthamiana leaves might not completely mimic the physiological conditions. We have discussed this possibility in the revised manuscript (lines 439-444).

    1. Author Response

      Reviewer #1 (Public Review):

      This paper describes longitudinal MRI measurements of "grey matter volume" (GMV) and "white matter volume" (WMV) in the brains of mice that were trained in a well-established one-pawed reaching/grasping paradigm for fine-motor skill learning. GMV/WMV ratio is presumed to reflect the extent to which axons in the region of interest are ensheathed by water-poor myelin membrane ("myelinated"). The conclusion is that WMV increases during learning in several task-related brain regions such as the primary motor cortex and somatosensory cortex, as well as a number of regions that are not so obviously task-related. Parallel decreases in GMV were observed. No change in overall cortical volume was detected so the conclusion is that some intra-cortical axons become myelinated in response to motor learning - supporting the idea of "adaptive myelination" proposed by others. Supporting histochemical evidence is provided (quantitative myelin immunolabelling). The MRI changes observed did not occur in a simple linear or cumulative fashion during learning, but rather increased in a non-linear asymptotic way, or even peaked and decreased again during training ("quadratic"). This is an interesting and useful study that takes us a little closer to understanding what is going on in the brain during learning and memory formation and continues the development of MRI as a useful non-invasive tool for studying the contribution of myelin to these processes.

      Specific points:

      1) "Grey matter" and "white matter" are normally used to describe spatially distinct brain regions that are sparsely myelinated (grey) or heavily myelinated (white), for example, the cerebral cortex (grey) and underlying subcortical axon tracts (white). However, most or all regions are described here as white matter within the classical grey matter - within the motor cortex, for example. Classical white matter regions such as corpus callosum do not get a mention. Presumably, the authors' use of the terms grey and white matter refer to specific MRI signals that are designed to pick up relatively water-rich or water-poor domains that are presumed to reflect the abundance of myelinated versus unmyelinated fibers, not necessarily the classic anatomical grey or white matter. However, this is confusing. Is it possible to change the terminology from grey and white matter to myelin-rich and myelin-poor, water-poor and water-rich, or something similar? At the very least it requires a better explanation.

      We thank this reviewer for bringing up this point and apologize for the confusion. In the revised version of the manuscript, we now present higher-magnification versions of the images that were used to quantify MBP immunoreactivity (densitometry) (see Main Figure 5-Supplementary Figure 3 in the revised version of the manuscript). In addition, new immunohistochemical experiments were performed and a second method was used to investigate myelinated axons within the cortex. Coronal sections were immunolabeled for myelin basic protein (MBP) and high-resolution confocal imaging was performed on a subset of trained mice (n=12 mice, n=108 probes, 9 probes per animal, represented in Main Figure 6-Supplementary Figure 1 in the revised version of the manuscript). We acquired Z-stacks with a minimum of 30 optical sections and performed an analysis of fibers based on a quantitative 3D immunohistochemical method (3D-QICH) to reconstruct and analyze length density, diameter and volumetric fraction of myelinated axons. This method of fiber analysis was first implemented to measure vascularity (Fouard et al., 2006) and was later developed further and validated for the systematic analysis of axons (Hamodeh et al., 2010; Hamodeh et al., 2014; Hamodeh et al., 2017). The method employed for the 3D-reconstruction and analysis of myelinated axons is explained in detail in the Material and Methods section of the revised manuscript. There was a significant increase in the length density of myelinated axons from baseline to experimental day 6, followed by a significant decrease towards baseline levels at experimental day 14 (one-way ANOVA, F2,7 = 8.249, P < .05; Fig. 6B), following a quadratic model rather than a linear one (ΔAIC > 2).

      Fouard, C., Malandain, G., Prohaska, S., & Westerhoff, M. (2006). Blockwise processing applied to brain microvascular network study. IEEE Trans. Med. Imaging, 25(10), 1319-1328.

      Hamodeh, S., Eicke, D., Napper, R. M. A., Harvey, R. J., & Sultan, F. (2010). Population based quantification of dendrites: evidence for the lack of microtubule-associate protein 2a,b in Purkinje cell spiny dendrites. Neuroscience, 170(4), 1004-1014. doi:10.1016/j.neuroscience.2010.08.021

      Hamodeh, S., Sugihara, I., Baizer, J., & Sultan, F. (2014). Systematic analysis of neuronal wiring of the rodent deep cerebellar nuclei reveals differences reflecting adaptations at the neuronal circuit and internuclear level. J Comp Neurol, 522, 2481-2497.

      Hamodeh, S., Bozkurt, A., Mao, H., & Sultan, F. (2017). Uncovering specific changes in network wiring underlying the primate cerebrotype. Brain Struct Funct, 222(7), 3255-3266. doi:10.1007/s00429-017-1402-6

      2) Several previous studies of motor learning in rodents, both MRI- and histology-based, have identified structural alterations and/or changes to oligodendrocytes and myelin in the corpus callosum underlying the motor cortex. In general, those white matter alterations were proportionally greater than those detected within the cortex itself. However, the present study apparently did not find significant MRI signal changes in sub-cortical white matter, which is surprising. Was this because the MRI sequences were not optimized for classical "white matter", or because the white matter was specifically excluded from the analysis (masked out)? If the latter, why was sub-cortical white matter excluded from the analysis? This needs discussion and explanation.

      We thank this reviewer for bringing up this critical point. As mentioned above in point #4 to the Editor, significant increases in WMV were observed at the whole-brain level in many WM areas (also see Main Figure 2-Supplementary Figure 3). For whole-brain analyses, all subcortical white matter regions were included in the analysis of WMV. Table 1 in the revised version of the manuscript indicates the significant changes and the direction of these changes: decreases in GMV (Main Figure 2A) and increases in WMV (Main Figure 2B). Significant changes were found in WMV, but these were not represented in the Figures originally presented. Instead, we chose to depict significant changes at FDR-corrected P < 0.01 for increases in WMV and FDR-corrected P < 0.001 for decreases in GMV, because of the high number of significant voxels at FDR-corrected P < 0.05 for both WMV and GMV. The Figure in point #4 to the Editor (new Main Figure-Supplementary Figure 4) depicts significant increases in WMV according to the asymptotic model at FDR-corrected P < 0.05. Clear changes are observed in subcortical WMV; however, we chose to present results at a stricter threshold (FDR-corrected P < 0.01) to show the more discrete clusters of WMV increases together with the more discrete clusters of GMV decreases at FDR-corrected P < 0.001.
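      To illustrate why fewer voxels survive a stricter false discovery rate threshold, the sketch below implements the Benjamini-Hochberg step-up procedure, one standard FDR method. It is assumed here for illustration only; the VBM software used in the study may implement FDR correction differently, and the p-values are made up rather than taken from the study.

```python
from typing import List, Sequence


def bh_reject(pvalues: Sequence[float], q: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure.

    Sort p-values ascending, find the largest rank k with p_(k) <= (k / m) * q,
    and reject the null for every test whose p-value is among the k smallest.
    Returns one flag per input, in the original input order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank  # step-up: keep the largest qualifying rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values standing in for voxel-wise tests.
    p = [0.00005, 0.0004, 0.002, 0.009, 0.02, 0.04, 0.3, 0.6, 0.8, 0.95]
    for q in (0.05, 0.01, 0.001):
        print(q, sum(bh_reject(p, q)))
```

      With these toy values, 5, 3 and 1 tests are declared significant at q = 0.05, 0.01 and 0.001, respectively: each stricter threshold retains only the most robust signals, which is the trade-off behind displaying smaller, more discrete clusters.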

      3) The quantitative MBP immunolabelling is a crucial piece of supporting evidence for the suggestion that MRI signal changes reflect adaptive myelination. What was the baseline against which immunoreactivity was measured? What did the fluorescence labelling look like at higher magnification - can individual myelin sheaths be distinguished, for example, and could these sheaths be counted, to complement and reinforce densitometry? Higher-mag images should be included in a revision.

      We thank this reviewer for these questions. Baseline measurements of myelin immunoreactivity were quantified in brain sections from food-restricted mice that never underwent behavioral training, represented as experimental day 0 in Main Figure 5C, Main Figure 6A-C, and Main Figure 5-Supplementary Figure 2B. We also evaluated myelin immunoreactivity in non-trained control mice: mice that were food-restricted and placed into the training cage during the 15 experimental days, but whose daily ration of food pellets was provided on the floor of the cage rather than on the shelf of the training cage. These data are represented in Main Figure 5-Supplementary Figure 2A and 2B.

      In the revised version of the manuscript, we have included a higher-magnification image of a representative section (see below and Main Figure 5-Supplementary Figure 3) to depict the area in which MBP immunoreactivity was quantified. Individual myelinated axons can be appreciated in areas of cortex or striatum with few myelinated axons. However, because of the dense plexus of myelinated axons in the cortical areas where significant VBM clusters were observed, it was not possible to identify and count individual myelinated axons within 20-micron-thick histological sections using fluorescence light microscopy. To complement and reinforce our observations from MBP densitometry, we performed additional immunohistochemical labeling in subsequent coronal brain sections and used confocal laser scanning microscopy to distinguish individual myelinated axons. As mentioned in answer #1 to the Editor, we acquired Z-stacks with a minimum of 30 optical sections and performed an analysis of fibers based on a quantitative 3D immunohistochemical method (3D-QICH) to reconstruct and analyze the length density, diameter and volumetric fraction of myelinated axons. The method employed for the 3D reconstruction and analysis of myelinated axons is explained in detail in the Material and Methods section of the revised manuscript. There was a significant increase in the length density of myelinated axons from baseline to experimental day 6, followed by a significant decrease towards baseline levels at experimental day 14 (one-way ANOVA, F(2,7) = 8.249, P < .05; Fig. 6B), following a quadratic rather than a linear model (ΔAIC > 2). These new data are now presented in Main Figure 6 of the revised manuscript and confirm our densitometry observations of adaptive myelination during learning.
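      The quadratic-versus-linear comparison mentioned above can be illustrated with a minimal sketch. It assumes ordinary least-squares polynomial fits with Gaussian residuals and the usual AIC form n·ln(RSS/n) + 2k; the day labels and length-density values are hypothetical placeholders, not data from the study, and this is not necessarily the exact procedure the authors used.

```python
import numpy as np

def gaussian_aic(y, y_hat, n_params):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant.

    n_params counts the polynomial coefficients plus one for the residual variance.
    """
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    return n * np.log(rss / n) + 2 * n_params

def compare_linear_quadratic(days, values):
    """Return (AIC_linear, AIC_quadratic, delta), with delta = AIC_linear - AIC_quadratic."""
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)
    lin = np.polyfit(days, values, deg=1)
    quad = np.polyfit(days, values, deg=2)
    aic_lin = gaussian_aic(values, np.polyval(lin, days), n_params=3)
    aic_quad = gaussian_aic(values, np.polyval(quad, days), n_params=4)
    return aic_lin, aic_quad, aic_lin - aic_quad

# Hypothetical values: rise from baseline (day 0) to day 6, return toward baseline by day 14.
days = [0, 0, 0, 6, 6, 6, 14, 14, 14, 14]
length_density = [1.00, 1.05, 0.97, 1.40, 1.35, 1.45, 1.08, 1.02, 1.10, 1.05]
aic_lin, aic_quad, delta = compare_linear_quadratic(days, length_density)
print(f"AIC linear={aic_lin:.2f}, quadratic={aic_quad:.2f}, delta={delta:.2f}")
```

      A delta above 2 in favour of the quadratic model is the conventional threshold for preferring it despite its extra parameter.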

      Reviewer #2 (Public Review):

      This study uses a well-established reaching task to assess the effect of learning on cortical structures as assessed by MRI in mice. The results show a decrease in grey matter (GM) and an increase in white matter (WM) volumes that appear to peak at experimental day 8, falling slightly thereafter.

      This is an interesting addition to the literature around myelination changes associated with learning/activity (adaptive myelination). However, it requires significant additional analysis. The correlation between imaging and histology is critical, but the only measure used here is MBP immunoreactivity. This is insufficient, as MBP can be expressed by newly-formed oligodendrocyte cell bodies, by their processes, and by the myelin sheath they form; but only the latter is relevant to function. So, a much more detailed analysis of oligodendrocyte morphology and myelin sheath number/size is required. This analysis needs to distinguish different layers of the cortex. This is easy for the superficial layers where myelination is sparse but much more difficult in the more heavily myelinated deeper layers. Here, counting nodes of Ranvier by Caspr immunostaining provides a good proxy. Ideally, both sheath number and sheath length would be analysed, but I accept that most studies point to number rather than changes in length as being the key changes in adaptive myelination. Then, the critical precise correlation of imaging changes with myelin sheath number can be made and the conclusion that the MRI changes represent physiologically significant changes in myelination becomes more solid.

      We thank this reviewer for bringing up their suggestions to improve our manuscript. In the revised manuscript, we have now addressed which cortical layers demonstrate significant changes in GMV and WMV (new Main Figure 4 in the revised manuscript) and we have now included an additional series of experiments to further quantitate myelinated axons in somatosensory cortex for the forelimb.

      We acknowledge that MBP can be expressed in newly-formed oligodendrocyte cell bodies, by their processes, and by the myelin sheath they form. For this reason, we complemented the densitometry now presented in Main Figure 5 of the revised manuscript with a confocal-based analysis of myelin sheath/myelinated axons. The latter is presented in Main Figure 6 of the revised manuscript and further supports adaptive changes in intracortical myelin during learning. Using confocal microscopy, in combination with the quantitative analysis of fibers by using a function in Amira software for fiber skeleton reconstruction, significant changes were observed in length density. In the revised discussion we have stated that changes in the length density of myelinated axons reflect both changes in length and in number, or density, of myelinated axons in somatosensory cortex for the forelimb. Our analysis also quantitated the diameter of myelinated axons, for which we observed a decrease at experimental day 6 followed by an increase in diameter at experimental day 14, albeit these changes did not reach significance. We added in the revised discussion a paragraph hypothesizing that an increase in length density combined with a putative decrease in the diameter of myelinated axons at experimental day 6 could indicate the appearance of new myelinated axons (novel candidate circuits). Afterwards, during the consolidation phase of learning, optimal candidate circuits may be selected and refined, for which putative increases in myelin sheath diameter may occur. However, to further understand changes in myelinated axons with learning, future studies should focus on a longitudinal in vivo evaluation of individual myelinated axons.
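      Why length density conflates axon number and axon length, and why a new thin fiber can lower mean diameter, can be seen in a minimal sketch. It assumes length density is total skeleton length divided by sampled tissue volume and approximates volume fraction by treating each skeleton segment as a cylinder. The `FiberSegment` type, the Z-stack size and the segment values are hypothetical, and this is not the Amira workflow used in the study.

```python
import math
from dataclasses import dataclass

@dataclass
class FiberSegment:
    length_um: float    # skeleton length of the segment (µm)
    diameter_um: float  # mean diameter of the segment (µm)

def length_density(segments, volume_um3):
    """Total fiber length per unit tissue volume (µm / µm^3)."""
    return sum(s.length_um for s in segments) / volume_um3

def volume_fraction(segments, volume_um3):
    """Fraction of the volume occupied by fibers, treating each segment as a cylinder."""
    occupied = sum(math.pi * (s.diameter_um / 2) ** 2 * s.length_um for s in segments)
    return occupied / volume_um3

def mean_diameter(segments):
    """Length-weighted mean fiber diameter (µm)."""
    total = sum(s.length_um for s in segments)
    return sum(s.diameter_um * s.length_um for s in segments) / total

# Hypothetical Z-stack of 100 µm x 100 µm x 20 µm.
volume = 100 * 100 * 20
baseline = [FiberSegment(50.0, 1.0), FiberSegment(80.0, 1.0)]
# Adding one new, thinner fiber raises length density but lowers the mean diameter.
with_new_fiber = baseline + [FiberSegment(70.0, 0.5)]
print(length_density(baseline, volume), length_density(with_new_fiber, volume))
print(mean_diameter(baseline), mean_diameter(with_new_fiber))
```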

      Because of the dense plexus of myelinated axons in the cortical areas where significant VBM clusters were observed, it is challenging to quantify adaptive changes in individual myelinated axons and/or nodes of Ranvier (by Caspr immunoreactivity) in these deeper layers within the 20-µm-thick histological sections of our dataset. Adaptive changes in myelination in these heavily myelinated deeper layers are also challenging to quantify with, for example, longitudinal in vivo two-photon microscopy, since this technique is typically limited to imaging the more superficial depths of cortex (300–400 µm). The focus of this manuscript was to demonstrate that white matter volume in somatosensory cortex significantly correlates with myelin immunoreactivity, supporting the hypothesis that myelin is a component of the non-linear structural changes observed by longitudinal voxel-based morphometry during learning. We are planning a future study to determine a physiological correlate of the changes we present in this manuscript, using fiber photometry and multielectrode recordings during learning.

    1. Author Response:

      We thank the reviewers for their thoughtful comments. We would like to respond to one point made by reviewer 2. We agree with the recommendations of this reviewer for improving the manuscript, including additional studies in non-transformed cells. However, we would also like to clarify one point. Reviewer 2 stated that “results in Figure 4C indicate that total STAT1 is completely localized in the nucleus even prior to interferon stimulation when it should be in the cytoplasm.” Figure 4C uses proximity ligation assays to show that the interaction of STAT1 with DUX4-CTD occurs in the nucleus, at a lower level without interferon and a higher level with interferon, but does not measure the distribution of total STAT1. Supplemental Figure S3A/3B shows a combined cytoplasmic and nuclear distribution of STAT1 without interferon treatment and shows increased nuclear STAT1 with interferon treatment, as would be expected in cells with an intact signaling pathway, although we also agree that the presentation of this finding can be improved with additional images that specifically address this point. Again, we thank both reviewers for their careful reading and helpful comments on our study.

    1. Author Response

      Reviewer #1 (Public Review):

      Iyer et al. address the problem of how cells exposed to a graded but noisy morphogen concentration are able to infer their position reliably, in other words how the positional information of a realistic morphogen gradient is decoded through cell-autonomous ligand processing. The authors introduce a model of a ligand processing network involving multiple “branches” (receptor types) and “tiers” (compartments where ligand-bound receptors can be located). Receptor levels are allowed to vary with distance from the source independently of the morphogen concentration. All rates, except for the ligand binding and unbinding rates, are potentially under feedback control. The authors assume that the cells can infer their position from the output of the signalling network in an optimal way. The resulting parameter space is then explored to identify optimal “network architectures” and parameters, i.e. those that maximise the fidelity of the positional inference. The analysis shows how the presence of both specific and non-specific receptors, graded receptor expression and feedback loops can contribute to improving positional inference. These results are compared with known features of the Wnt signalling system in the Drosophila wing imaginal disc.

      The authors are doing an interesting study of how feedback control of the signalling network reading a morphogen gradient can influence the precision of the read-out. The main strength of this work is the attention to the development of the mathematical framework. While the family of network architectures introduced here is not completely generic, there is enough flexibility to explore various features of realistic signalling systems. It is exciting to find that some network topologies are particularly efficient at reducing the noise in the morphogen gradient. The comparison with the Wnt system in Drosophila is also promising.

      Major comments:

      1) The authors assume that the cell estimates its position through the maximum a posteriori estimate, Eq.(5), which is a well-defined mathematical object; it seems to us, however, that whether the cell is actually capable of performing this measurement is uncertain (it is an optimal measurement in some sense, but there is no guarantee that the cell is optimal in that respect). Notably, this entails evaluating p(θ), which is a probability distribution over the entire tissue, so this estimate cannot be made with purely local measurements. Can the authors comment on this, and on how the conclusions would change if a different position measurement was performed?

      This is indeed an important question. Our viewpoint is the following: if cells were to use a maximum a posteriori (MAP) estimate (Eq. 5) to decode their positions, which features of the channel architecture would lead to small errors in positional inference? Whether the cell employs the MAP estimate, or some other estimate, is an important but difficult question to address. Our choice was motivated by how this estimate has allowed the precise determination of developmental fates in the context of gap gene expression in the Drosophila embryo [1, 2, 3]. We had earlier computed the inference error with a different estimate, i.e.

$$\langle (x^* - x)^2 \rangle_x = \int dx^*\, p(x^*|x)\,(x^* - x)^2,$$

      which computes the mean squared deviation of the inferred positions from the true position for each x, taking into account the entire distribution p(x∗|x). While the qualitative results are the same, the inference errors showed spurious jitters from outliers in sampling the noisy morphogen input distribution. This consistency suggests that our qualitative results may be insensitive to the choice of estimate.
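      A sampling version of the mean-squared-deviation error described above can be sketched as follows: for a true position x, draw noisy morphogen readings, decode each one, and average the squared deviation of the decoded positions from x. The exponential gradient, the multiplicative Gaussian noise, the simple inversion decoder and all parameter values are illustrative assumptions, not the channel model of the manuscript. At finite sample size, rare outlier draws are what make such estimates jittery.

```python
import numpy as np

rng = np.random.default_rng(0)

LAMBDA = 0.2     # assumed decay length of the gradient (tissue length = 1)
NOISE_CV = 0.1   # assumed multiplicative noise on the morphogen reading

def morphogen(x):
    """Assumed noise-free exponential gradient."""
    return np.exp(-x / LAMBDA)

def decode(theta):
    """Invert the noise-free profile and clip to the tissue [0, 1]."""
    return np.clip(-LAMBDA * np.log(theta), 0.0, 1.0)

def rms_inference_error(x, n_samples=10_000):
    """Monte Carlo estimate of sqrt(E[(x* - x)^2]) over p(x*|x)."""
    theta = morphogen(x) * (1.0 + NOISE_CV * rng.standard_normal(n_samples))
    theta = np.clip(theta, 1e-12, None)  # keep readings positive for the log
    x_star = decode(theta)
    return float(np.sqrt(np.mean((x_star - x) ** 2)))

for x in (0.1, 0.5, 0.9):
    print(x, rms_inference_error(x))
```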

      Further, when evaluating the MAP estimate, the term p(θ) in the denominator serves as a normalisation factor ensuring that p(x|θ) is a probability density. This is not strictly necessary for MAP estimation. Since p(θ) does not depend on x, the MAP estimate can be written as

$$x^*(\theta) = \arg\max_x \, p(\theta|x)\, p(x),$$

      without the need to evaluate p(θ). In the case of a uniform prior p(x), it is equivalent to the maximum likelihood estimate (MLE), i.e.

$$x^*(\theta) = \arg\max_x \, p(\theta|x).$$
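      The point that p(θ) can be dropped can be checked numerically. On a grid of candidate positions, the argmax of p(θ|x)p(x) is the same whether or not it is divided by p(θ), and with a uniform prior it coincides with the maximum-likelihood estimate. The Gaussian likelihood around an exponential gradient, the grid and the skewed prior are assumptions for illustration only.

```python
import numpy as np

X_GRID = np.linspace(0.0, 1.0, 501)  # candidate positions across the tissue
DX = X_GRID[1] - X_GRID[0]
LAMBDA, SIGMA = 0.2, 0.05            # assumed decay length and reading noise

def likelihood(theta):
    """Assumed Gaussian p(theta | x) around an exponential gradient, up to a constant."""
    mean = np.exp(-X_GRID / LAMBDA)
    return np.exp(-0.5 * ((theta - mean) / SIGMA) ** 2)

def map_estimate(theta, prior, normalise=False):
    """argmax over x of p(theta|x) p(x), optionally divided by p(theta)."""
    score = likelihood(theta) * prior
    if normalise:
        p_theta = score.sum() * DX   # Riemann-sum approximation of p(theta)
        score = score / p_theta      # posterior p(x|theta); the argmax is unchanged
    return float(X_GRID[np.argmax(score)])

def ml_estimate(theta):
    """argmax over x of p(theta|x)."""
    return float(X_GRID[np.argmax(likelihood(theta))])

theta_obs = np.exp(-0.5 / LAMBDA)    # noise-free reading at x = 0.5
uniform = np.ones_like(X_GRID)
skewed = 1.0 + 2.0 * X_GRID          # an arbitrary non-uniform prior

print(map_estimate(theta_obs, skewed), map_estimate(theta_obs, skewed, normalise=True))
print(map_estimate(theta_obs, uniform), ml_estimate(theta_obs))
```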

      2) One of the features of the signalling networks studied in the manuscript is the ability of the system to form a complex (termed a conjugated state, Q) made of two ligands L, one receptor and one non-signalling receptor. While there are clear examples of a single ligand binding to two signalling receptors (e.g. Bmps), are there also known situations where such a complex with two ligands, one receptor, and one non-signalling receptor can form? In the Wnt example (Fig. 10a), it is not clear what this complex would be. In general, it would be great to have a more extended discussion of how the model hypotheses for the signalling networks could relate to real systems.

      This is a good suggestion. We have now added a discussion on the various possible realisations of the “conjugate state” Q in Section 3.6. We have also explored the various states in the context of different signalling contexts such as Dpp, Hh, Fgf in the Discussion section.

      The conjugated state ‘Q’ represents a combination of the readings from the two branches, i.e. receptor types. This could be realised through processes like ligand exchange or complex formation, both in a shared spatial location such as a compartment. As discussed in the original manuscript (Section 3.6 of the revised manuscript), the ligand Wg in the Wg signalling pathway is internalised through two separate endocytic pathways associated with the two receptor types: the signalling receptor Frizzled (via clathrin-mediated endocytosis, CME) and the non-signalling HSPG receptors (via the CLIC/GEEC pathway; CLIC, clathrin-independent carriers; GEEC, GPI-anchored protein-enriched early endosomal compartments). Both pathways meet in a common early endosomal compartment where the ligands may be exchanged between the two receptors [4]. In previous work (Hemalatha et al. [4]), we showed that there are more Wg-DFz2 interactions in the endosomal compartment (measured through FRET) than on the cell surface. Therefore, the non-signalling receptors directing Wg through the CLIC/GEEC pathway titrate the amount of Wg interaction with the signalling receptor, DFz2.

      As mentioned in the original manuscript (Section 3.3 and subsection 4.2 of the Discussion in the revised manuscript), apart from Wg signalling, non-signalling receptors such as the HSPGs have also been proposed to act as co-receptors for Dpp, Hh and FGF (reviewed in [5, 6]). Although some ligands bind to the core protein of HSPGs, the majority of the ligands bind to the negatively charged HS chains [7, 8]. Here, the HSPG co-receptors aid in capturing diffusible ligands and presenting them to signalling receptors (either on the cell surface or within endosomes).

      3) The authors consider feedback on reaction rates; it would seem natural to also consider feedback on the total number of receptors, notably since there are known examples of receptors transcriptionally down-regulated by their ligands (e.g. Dpp/Tkv). Also, it is not clear in insets such as in Fig. 7b whether the concentration plotted corresponds to the concentration of receptors bound to ligands.

      As mentioned in the original manuscript (Section 2.2 of the revised manuscript), we have indeed considered control on both reaction rates and receptors, although control on the latter is subject to the constraint that receptor profiles are monotonic. Further, while control on reaction rates is considered explicitly via feedbacks, control on receptors is implemented via an approach akin to the open-loop control used in control theory. In reality, cellular control on receptors will involve transcriptional up- or down-regulation of receptors and would thus warrant a feedback control approach; however, the timescales involved in such control differ from the binding-unbinding and signalling timescales.

      Therefore, in the current work, we take the morphogen profile to be given i.e. independent of receptor concentrations, and we ask for the receptor concentrations that would help reduce the inference errors.

      Our predictions of increasing signalling receptor and decreasing non-signalling receptor concentrations in a two-branch channel architecture are consistent with the known transcriptional up-regulation of Dally/Dlp and down-regulation of Fz by Wg signalling [9].

      In a future work, we will extend the control on receptors to include feedbacks explicitly. Furthermore, the explicit feedback control on receptors may need to be considered concomitantly with the effect of receptors on morphogen dynamics (i.e. morphogen sculpting by receptors) along with the possibility of spatial correlations in receptor concentrations through neighbouring cell-cell interactions.

      As mentioned in the original manuscript (Section 2.2 of the revised manuscript), the variables ψ and φ stand for the total (bound + unbound) surface receptor concentrations of the signalling and the non-signalling receptors respectively. Therefore, the insets showing receptor profiles such as in Fig. 6b, 7b, and Appendix H Fig.8b,e correspond to the total surface receptor concentrations.

      4) The authors are clear about the fact that they consider the morphogen gradient to be fixed independently of the reaction network; however, that seems like a very strong assumption; in the Dpp morphogen gradient for instance over expression of the Tkv receptor leads to gradient shortening. Can the authors comment on this?

      This point is related to the previous question. As discussed in the Discussion of the original manuscript (subsection 4.3 of the revised manuscript), we focus on finding the optimal receptor concentration profiles and reaction networks that enable precision and robustness in positional information from a given noisy morphogen profile. The framework and the optimisation scheme within it will prescribe different receptor profiles and reaction networks for different monotonically behaving, noisy morphogen profiles. It is possible that cells achieve the optimal receptor concentrations via feedback control on the production of the receptors.

      Broadly, morphogen dynamics depends on cell surface receptors, which could participate in both the inference and the sculpting of the morphogen profile, and factors independent of them such as extracellular degradation, transport and production, etc. In our present work, we have taken the receptors involved in sculpting and inference as being independent.

      In a more general case, feedback control on receptors will change the receptor concentrations as well as the morphogen profile. We are currently working on realising such a feedback control on receptors within the same broader information theoretic framework proposed in the current work.

      5) Fig. 10f is showing an exciting result on the change in endocytic gradient CV in the WT and in DN mutant of Garz. Can the authors check that the Wg morphogen gradient is not changing in these two conditions? And can they also show the original gradient, and not only its CV?

      The reviewer raises a legitimate concern – could the observed changes in CV upon perturbation of endocytic machinery be attributed to a systematic change in the mean levels of the endocytosed Wg alone? In the original manuscript (Appendix O Fig.17b,c of the revised manuscript), we show the normalised profiles of endocytic Wg in control and myr-Garz-DN cases. Here, in Fig.1 below, we show a comparison between the mean Wg concentrations (measured as fluorescence intensity) in control wing discs and discs wherein CLIC/GEEC endocytic pathway is removed using UAS-myr-Garz-DN. For clarity, we show the discs with largest and smallest fluorescence intensities from the control and myr-Garz-DN discs. It is hard to conclude that the mean concentrations are significantly different in the two cases.
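      The concern above can be made concrete: the position-wise coefficient of variation across discs is CV(x) = σ(x)/μ(x), so a change in CV can come from the spread, from the mean, or from both, which is why mean profiles are worth reporting alongside CV. A minimal sketch, assuming profiles have already been binned onto a common spatial axis; the intensity arrays below are hypothetical.

```python
import numpy as np

def profile_stats(profiles):
    """Mean, sample standard deviation and CV at each position across discs.

    profiles: array of shape (n_discs, n_positions), binned on a shared spatial axis.
    """
    profiles = np.asarray(profiles, dtype=float)
    mean = profiles.mean(axis=0)
    std = profiles.std(axis=0, ddof=1)
    return mean, std, std / mean

# Hypothetical intensities for three discs at two positions.
control = np.array([[10.0, 8.0],
                    [12.0, 6.0],
                    [11.0, 7.0]])
mean_c, std_c, cv_c = profile_stats(control)

# Rescaling every disc by the same factor leaves the CV unchanged...
_, _, cv_scaled = profile_stats(2.0 * control)
# ...whereas lowering the mean with the spread held fixed raises the CV.
mean_s, std_s, cv_s = profile_stats(control - 5.0)
print(cv_c, cv_scaled, cv_s)
```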

      Reviewer #2 (Public Review):

      The work of Iyer et al. uses a computational approach to investigate how cells using multiple tiers of processing and multiple parallel receptor types achieve a more accurate reading of position from a noisy signal. The authors find that combining signalling and non-signalling types of receptors with additional feedback increases the accuracy of positional readout against extrinsic noise conveyed in the morphogen signal. Further, extending the number of layers of signal processing counteracts the intrinsic stochasticity of the signal reading and processing steps. The mathematical formulation of the model is general but comprehensive in the way it handles the difference between branches and tiers for the processing of channels with feedbacks. The results of the model are presented from a simple one-branch, one-tier architecture to a two-branch, two-tier architecture with feedbacks. Interestingly, the authors find that adding more tiers results in only very small improvements in the accuracy of positional readout. The model is tested against a perturbation experiment that impairs one of the signalling branches in the Drosophila wing disc, but the comparison is only qualitative, as further experiment-oriented work is planned in a separate paper.

      Strengths

      There is a clear statement of objectives, the model, and how the model is evaluated. In particular, the objective is to find which number of receptor types and which receptor concentrations, for a given number of tiers and feedback types, result in the most accurate positional readout. The employed optimization procedure is capable of finding signalling architectures that achieve a positional precision of one cell diameter for most of the tissue, with 3-4 cells at the end of the tissue most distant from the morphogen source. This demonstrates that employing additional complexity in signal processing results in a very accurate positional readout, comparable with estimates of positional precision obtained in other developmental systems (Petkova et al., Cell 2019; Zagorski et al., Science 2017).

      The optimal signalling architectures indicate that both signalling (specific) and non-signalling (nonspecific) receptors affect the precision of positional readout, but the contributions of each type of these receptors are qualitatively different. Even slight perturbation of signalling receptors drives the system out of optimum, resulting in a decrease in positional precision. In contrast, the non-signalling receptors could accommodate much larger perturbations. This observation could provide a biophysical explanation for how cross-talk between different morphogen species could be realized in a way that positional precision is kept at the optimum when morphogen signaling undergoes extrinsic and intrinsic perturbations.

      Last, the model formulation allows one to specifically address perturbations of signalling and feedbacks, which could be explored to validate model predictions experimentally in the Drosophila wing disc, but also in other developmental tissues. The authors present a proof of concept by obtaining consistent results between the variation of output profiles in two-tier two-branch architectures with the non-signalling branch removed and the intensity profiles of Wg in wing discs where the CLIC/GEEC endocytic pathway was perturbed.

      Weaknesses

      The list of model parameters is long, with more than 20 entries for two-tier two-branch architectures. This is expected, as the aim of the model is to describe a sophisticated signalling architecture mimicking the biological system. However, this also makes it very challenging, or impossible, to provide guiding principles or an understanding of the system behaviour for the complete space of signalling architectures that optimize positional readout. Although the employed optimization procedure finds solutions that exhibit very high positional accuracy, there is only a very limited notion of how these solutions depend on variation of the different parameters. The authors do not address the question of whether these solutions correspond to broad global optima in the space of all solutions, or were rather fine-tuned by the optimization procedure and are quite rare.

      It is unclear how contributions from intrinsic noise affect the system behaviour compared with contributions from extrinsic noise. In principle, the two-branch one-tier architecture already results in a very accurate positional readout across the tissue. Adding another tier seems to provide only a very weak improvement over a one-tier solution. It is possible that, for the investigated signalling architectures, intrinsic noise affects the system only mildly compared with extrinsic noise. Hence, it is difficult to assess whether the claim of reducing intrinsic noise by adding another tier is supported by the presented data, as intrinsic noise could have only a very weak overall effect on the positional readout.

      The optimal response of the channel to extrinsic and intrinsic noise is very distinct. As correctly noted by the reviewer, an additional tier provides only a marginal improvement in inference error due to extrinsic noise (compare Fig.7 and Fig.8 in the revised manuscript). However, as shown in Fig.9c of the revised manuscript (same as in the original manuscript), adding an extra tier provides a substantial improvement in inference errors due to intrinsic noise.

      References

      [1] Gašper Tkačik, Julien O Dubuis, Mariela D Petkova, and Thomas Gregor. Positional information, positional error, and readout precision in morphogenesis: a mathematical framework. Genetics, 199:39–59, 2015.

      [2] Mariela D Petkova, Gašper Tkačik, William Bialek, Eric F Wieschaus, and Thomas Gregor. Optimal decoding of cellular identities in a genetic network. Cell, 176:844–855, 2019.

      [3] Julien O Dubuis, Gašper Tkačik, Eric F Wieschaus, Thomas Gregor, and William Bialek. Positional information, in bits. Proceedings of the National Academy of Sciences, 110:16301–16308, 2013.

      [4] Anupama Hemalatha, Chaitra Prabhakara, and Satyajit Mayor. Endocytosis of Wingless via a dynamin-independent pathway is necessary for signaling in Drosophila wing discs. Proceedings of the National Academy of Sciences, 113:E6993–E7002, 2016.

      [5] Xinhua Lin. Functions of heparan sulfate proteoglycans in cell signaling during development. Development, 131:6009–6021, 2004.

      [6] Stephane Sarrazin, William C Lamanna, and Jeffrey D Esko. Heparan sulfate proteoglycans. Cold Spring Harbor Perspectives in Biology, 3(7):a004952, 2011.

      [7] Catherine A Kirkpatrick, Sarah M Knox, William D Staatz, Bethany Fox, Daniel M Lercher, and Scott B Selleck. The function of a Drosophila glypican does not depend entirely on heparan sulfate modification. Developmental Biology, 300(2):570–582, 2006.

      [8] Mariana I Capurro, Ping Xu, Wen Shi, Fuchuan Li, Angela Jia, and Jorge Filmus. Glypican-3 inhibits hedgehog signaling during development by competing with patched for hedgehog binding. Developmental Cell, 14(5):700–711, 2008.

      [9] Kenneth M Cadigan, Matthew P Fish, Eric J Rulifson, and Roel Nusse. Wingless repression of Drosophila frizzled 2 expression shapes the Wingless morphogen gradient in the wing. Cell, 93(5):767–777, 1998.

    1. Author Response

      Reviewer #2 (Public Review):

      Strengths:

      This is potentially a very large and robust dataset of spinal stimulation while the animal performs a wrist torque task. However, the authors do not detail the number of trials obtained for each combination of conditions - stimulation location, current intensity, movement direction, number of repetitions, etc.

      We have provided additional tables summarising the collected data (Tables 1 and 2 in Supplementary File 1). Each experiment consisted of 63-1004 successful trials that were evenly distributed across the 8 task targets. We describe this in the text on lines 823-824. However, because we present the averaged evoked muscle responses or the averaged evoked torques, obtained by stimulus-triggered averaging, throughout the manuscript, we believe that it is more important to show the number of stimuli used for averaging. Thus, we have kept the description of the number of stimuli in the typical examples of Figures 2A-C, 5A-C, 7B-D and 8A-C.

      Lines: 823-824 “Each experiment consisted of 63-1004 successful trials (Table 2 in Supplementary File 1).”
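      Because the reported responses are stimulus-triggered averages, the number of stimuli averaged is what sets their noise level. A minimal sketch of stimulus-triggered averaging of a rectified EMG trace, assuming a uniformly sampled signal and known stimulus-onset sample indices; the function name, window lengths and synthetic signal are hypothetical and do not reproduce the authors' pipeline.

```python
import numpy as np

def stimulus_triggered_average(emg, stim_idx, pre, post):
    """Average rectified EMG over windows [t - pre, t + post) around each stimulus.

    Stimuli whose window would run off either end of the recording are skipped.
    Returns (average trace, number of stimuli used).
    """
    rectified = np.abs(np.asarray(emg, dtype=float))
    windows = [rectified[t - pre:t + post] for t in stim_idx
               if t - pre >= 0 and t + post <= len(rectified)]
    if not windows:
        raise ValueError("no complete windows around the given stimuli")
    return np.mean(windows, axis=0), len(windows)

# Synthetic example: background noise plus a hypothetical response 5-10 samples post-stimulus.
rng = np.random.default_rng(1)
emg = rng.standard_normal(10_000) * 0.1
stim_idx = np.arange(100, 10_000, 200)
for t in stim_idx:
    emg[t + 5:t + 10] += 1.0
sta, n_used = stimulus_triggered_average(emg, stim_idx, pre=20, post=50)
print(n_used, sta[25:30].mean(), sta[:20].mean())
```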

      Weaknesses:

      The authors' primary conclusion is that spinal stimulation at moderate current intensities facilitates the effects of descending inputs of the motor command. However, the authors need to expand on:

      i. The effect of these intensities of spinal stimulation on their own; without voluntary movement.

      ii. The robustness of the interactions observed.

      We added the results of stimulus-induced muscle responses (Figures 2A-C, 5A-C and 6A-D) and stimulus-induced torques (Figure 7B-D) during the hold period for the center target (i.e., during awake rest). These data allowed us to quantify the PStEs and the evoked torques without the effect of intended torque production. We could clearly observe the PStEs for Facilitation and the evoked torque. However, it was difficult to observe PStEs for Suppression, because Suppression requires substantial ongoing voluntary muscle activity to be inhibited. The robustness of the interaction was demonstrated by the modulation of PStEs and evoked torques from awake rest to voluntary torque production. We added further discussion on this point as follows:

      Results Lines 126-131 “The PStEs during the entire period of the task (insets on Figure 2A-C) showed either post-stimulus facilitative (Facilitation, insets on Figure 2A and C) or suppressive effects (Suppression, inset on Figure 2B). Spinal stimulation occasionally produced small-magnitude Facilitation during the hold period for the center target, where voluntary wrist torque production was not intended (center panels on Figure 2A). However, different magnitudes and/or types of PStEs were observed among the directions of voluntary torques (Figure 2A-C).”

      Lines 143-146 “Especially in PStEs of Facilitation, the magnitude of PStEs in the peripheral target close to the PD of background EMG (Figure 2A, 270° and 315°) was generally larger compared with that in the center target and smaller in the peripheral target opposite to the PD (Figure 2A, 90° and 135°).”

      Legend of Figure 2A-C Lines 170-171 “Muscle responses to spinal stimulation during the hold period for the eight peripheral targets (peripheral panels) and the center target (center of peripheral panels).”

      Results Lines 356-357 “Left insets and gray dots in right panels (Figure 5A-C) show the PStEs and background EMGs during the hold period for the center target.”

      Legend of Figure 5A-C Lines 368-372 “The leftmost insets show PStEs during the hold period for the center target. The rightmost panels for each muscular condition show two-sided Pearson’s correlation coefficients between the magnitudes of background EMGs and PStEs. Gray dots in right panels indicate the results during the hold period for the center target, which were not included in the correlation analyses.”

      Results Lines 394-402 “PStEs during the hold period for the center target increased as current intensity increased, showing a simple input-output property of stimulus-induced muscle responses (“Center target”, insets on Figure 6A-D). In general, including the hold period for the center target, the magnitudes of PStEs at low stimulus currents increased linearly with the magnitudes of background EMGs (Figures 5A-C and 6A). However, the magnitudes of PStEs of Facilitation at medium currents were often larger during the hold period for the center target (Figure 6B and C insets) than during voluntary torque production, even though the magnitude of background EMG was identical between them (Figure 6B and C, rightmost panels).”

      Legend of Figure 6A-D Lines 419-423 “The leftmost insets show PStEs during the hold period for the center target, intended to relax the wrist. The rightmost panels indicate two-sided Pearson’s correlation coefficients between the magnitudes of background EMGs and PStEs. Gray dots in right panels indicate the results during the hold period for the center target, which were not included in the correlation analyses.”

      Results Lines 452-460 “In another case, spinal stimulation at 300 μA mainly induced Facilitation effects on muscles with higher background EMG (outer peripheral panels in Figure 7C and Figure 7-figure supplement 1B), and the directions of the Evoked Torque were similar to the directions of voluntary torque independent of the direction of the Evoked Torque at the center target (center and inner peripheral panels in Figure 7C). Stimulation at 1700 μA exhibited large magnitudes of Facilitation in all muscles for all targets (outer peripheral panels in Figure 7D and Figure 7-figure supplement 1), and the Evoked Torques displayed ulnar-flexion directions regardless of the presence/absence or the direction of voluntary torque (center and inner peripheral panels in Figure 7D).”

      Legend of Figure 7B-D Lines 487-489 “StTAs of rectified EMGs (outer peripheral panels and center-bottom panel) and StTAs of wrist torque trajectories (inner peripheral panels and center-top panel).”

      Discussion Lines 669-680 “Compared with the hold period for the center target, the stimulus-induced muscle responses and torques at low to medium currents were generally more pronounced during the hold period for the peripheral targets (Figure 2A-C, Figure 7B and C, and Figure 7-figure supplement 1), indicating that the descending commands augmented activation in the spinal motoneurons and interneurons driven by spinal stimulation. Interestingly, at medium currents, the stimulus-induced facilitatory responses were sometimes smaller when the responses were recorded in the muscles antagonistic to the wrist torque direction, regardless of the background EMG activity (Figure 2A and Figure 7-figure supplement 1B), suggesting that spinal reciprocal inhibitory function was engaged by the descending commands (Meunier and Pierrot-Deseilligny, 1998). Together, our findings indicate that voluntary commands amplify the functions of spinal circuits, including excitatory and inhibitory synaptic connections to motoneurons activated by spinal stimulation.”

      Specific comments:

      1) Interpretation of the main result - The authors state that they investigated the "effect of descending inputs on the stim-evoked EMG and torque output". But, their experimental design which compares post-stim EMG to pre-stim EMG provides a somewhat different result, i.e., the effect of spinal stimulation on voluntarily-evoked EMG and torque output. In other words, the voluntary output is held constant (independent variable) and the spinal stimulation parameters are varied (dependent variable).

      To get what the authors state, the design would have to be modified wherein the comparison would have to be between post-stim muscle activity recorded in the wrist neutral vs one of the holding states; or a comparison of post-stim muscle activity when the arm is passively torqued vs when voluntarily torqued.

      In our study, we compared pre-stim EMG and post-stim EMG to determine the presence or absence and the polarity (facilitation/inhibition) of PStEs. Our main aim was to investigate the effect of descending commands (voluntary output) on the stimulus-evoked responses, and we concluded that descending commands influence the spinal interneuron activity elicited by spinal stimulation. The motor task required control of the direction and magnitude of wrist torque, which allowed us to manipulate the magnitude of descending commands, expressed as the background EMG activity in each muscle. The finding that PStEs were modulated by variations in background EMG therefore indicates that descending commands influence PStEs.
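      The pre-stim versus post-stim comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the manuscript's procedure: the window boundaries, the 2-SD threshold `k`, and the percent-of-background magnitude are hypothetical choices, and `classify_pste` is an illustrative name.

```python
from statistics import mean, stdev

def classify_pste(sta, n_pre, post_window, k=2.0):
    """Label a post-stimulus effect (PStE) from a rectified-EMG StTA.

    sta:         stimulus-triggered average samples; the first n_pre
                 samples precede the stimulus (pre-stim baseline).
    post_window: (start, stop) sample indices of the post-stim window.
    k:           hypothetical threshold in baseline standard deviations.
    Returns (label, magnitude_percent, background_emg).
    """
    baseline = sta[:n_pre]
    background = mean(baseline)          # pre-stim (background) EMG level
    if background == 0:
        raise ValueError("background EMG is zero; magnitude undefined")
    sd = stdev(baseline)
    start, stop = post_window
    post = mean(sta[start:stop])         # mean post-stim EMG level
    if post > background + k * sd:
        label = "Facilitation"
    elif post < background - k * sd:
        label = "Suppression"
    else:
        label = "No effect"
    # Magnitude expressed relative to the background EMG (hypothetical convention).
    magnitude = 100.0 * (post - background) / background
    return label, magnitude, background
```

      Normalizing by background EMG is one way to make magnitudes comparable across torque directions, though it also couples the two quantities; the manuscript may use a different normalization.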

      In the revised version of the manuscript, we present additional data on PStEs and evoked torque while the wrist remained in the neutral position (i.e., during awake rest) to address this comment.

      2) Most of the studies that have demonstrated the benefits of spinal stimulation, esp. in humans, have used sub-threshold stimulation. The manuscript does not provide direct information regarding the threshold of stimulation. Only Table 2 provides such information, but the data collection paradigm is so different from the actual task that it makes it difficult to make a relevant connection.

      • Why was the stimulation protocol under sedation different from during the wrist torque task? It would be really useful to describe the kind of involuntary movements evoked at different current intensities at the different spinal levels in awake, behaving animals. For instance, the higher amplitudes appear to just lock the arm into a full ulnar deviation. Such current intensities would be unlikely to be effective in enhancing movement in spinal cord injury. Thus, all the results for these amplitudes are somewhat irrelevant to therapeutic intervention. Similarly, does the moderate amplitude generate movement or muscle contraction?

      The size of the stimulus-evoked muscle responses depended on many variables, such as stimulus intensity, torque direction (i.e., voluntary pre-activation of the muscle in combination with the activity of other muscles), and the recorded muscle. The stimulus threshold for each facilitatory and inhibitory effect also varied with these variables, so we did not aim to measure the stimulus threshold independently. However, for the experiment in Figure 4 it was essential to map the spinal somatotopic representation relative to the site of each stimulating electrode. We therefore delivered spinal stimulation through each electrode channel under anesthesia to capture muscle representation without concomitant voluntary descending drive in the intact monkey.

      We agree with the reviewer that it is important to obtain stimulus-evoked torques at each stimulus intensity while the wrist torque is neutral in awake monkeys. We therefore present data on stimulus-evoked muscle responses and torques at low, moderate, and high stimulus intensities while the monkeys maintained the cursor on the center target (Figures 2 and 7; see the responses to previous comments).

      3) Please explain the term Spinal PD.

      Does the PD of the background EMG remain the same irrespective of the current intensity and site of stimulation? There is a decrease in background EMG amplitude in Fig. 2A and B with increasing stim amplitude. Can the authors please discuss this observation and how it would affect the efficacy of the spinal stimulation in facilitating descending inputs?

      Spinal PD is the preferred direction (PD) of facilitative (Facilitation) or suppressive (Suppression) stimulus-evoked muscle responses, calculated separately from the data obtained during the hold period for the peripheral targets. We added this explanation in the text (lines 146-149) and the legend of Figure 2D (lines 183-185).
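      One common way to obtain a preferred direction from responses at the 8 peripheral targets is a vector sum of magnitudes weighted by target direction. The sketch below assumes that method and targets spaced 45° apart; this response does not state how the Spinal PD or its significance was computed, so the significance test is omitted and `preferred_direction` is a hypothetical helper.

```python
import math

def preferred_direction(magnitudes, angles_deg=None):
    """Vector-sum preferred direction (PD) over the peripheral targets.

    magnitudes: response magnitudes (e.g., Facilitation, Suppression, or
                background EMG) at each peripheral target.
    angles_deg: target directions; defaults to 0, 45, ..., 315 degrees.
    Returns (pd_deg in [0, 360), resultant_length).
    """
    if angles_deg is None:
        angles_deg = [45.0 * i for i in range(len(magnitudes))]
    if len(angles_deg) != len(magnitudes):
        raise ValueError("one angle per magnitude is required")
    # Sum each target's unit vector scaled by its response magnitude.
    x = sum(m * math.cos(math.radians(a)) for m, a in zip(magnitudes, angles_deg))
    y = sum(m * math.sin(math.radians(a)) for m, a in zip(magnitudes, angles_deg))
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)

# Equal responses at the 270° and 315° targets give a PD halfway between them.
print(preferred_direction([0, 0, 0, 0, 0, 0, 1, 1]))
```

      The resultant length indicates how strongly tuned the responses are; uniform responses across all targets give a resultant near zero, so their PD is undefined.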

      The amplitude of the background EMG changed with increasing current intensity, as the reviewer pointed out. It is therefore possible that the large ulnar-flexor torques evoked by high stimulus currents had direction-biased effects on the required voluntary effort (i.e., for ulnar-flexor targets, less voluntary drive to ulnar-flexor muscles might be required because the stimulus-evoked torque assisted the task, whereas for radial-extensor targets, more voluntary drive to radial-extensor muscles might be required to counter the opposing stimulus-evoked torque). Nevertheless, we confirmed that the PD of the background EMG was consistent irrespective of current intensity and stimulus site, as presented in Figures 3A, 3B, 4B, and 4C (green polar plots). In addition, we showed that the Spinal PD at high current was even opposite to the PD of the background EMG, indicating that the magnitude of background EMG cannot account for the differences in results between low-to-medium and high stimulus currents.

      Results Lines 146-149 “Significant PDs were observed in the 603 muscular conditions in 16 muscles for Facilitation (Spinal PD of Facilitation), 333 muscular conditions in 16 muscles for Suppression (Spinal PD of Suppression), and 1006 muscular conditions in 16 muscles for background EMG.”

      Legend of Figure 2D Lines 183-185 “Spinal PD (top panels) and Background EMG PD (middle panels) show the PDs calculated by the magnitudes of Facilitation or Suppression of PStEs and by the magnitudes of background EMG activity, respectively, during the hold period for the peripheral targets.”

      4) Line 546 - The authors speculate that higher current intensities resulted in direct activation of motoneurons. While this is certainly possible, It seems somewhat do the authors see proof of this in their data? Latency measurements?

      We newly analyzed the onset latency of PStEs (Figure 8) and added the related descriptions in the Results, Discussion, and Materials and Methods of the revised manuscript. Please refer to the response to the 2nd comment from Reviewer 1. The shortening of latency at high currents supports our statement that higher current intensities result in direct activation of ventral root axons.
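      A threshold-crossing rule is one simple way to estimate PStE onset latency from a stimulus-triggered average. The sketch below assumes onset is the first of `min_run` consecutive post-stimulus samples beyond the baseline mean ± `k` SD; these parameters and the function name `onset_latency_ms` are hypothetical and are not taken from the manuscript's Materials and Methods.

```python
from statistics import mean, stdev

def onset_latency_ms(sta, n_pre, fs_hz, sign=+1, k=2.0, min_run=3):
    """Onset latency (ms) of a PStE in a stimulus-triggered average.

    sta:     StTA samples; the first n_pre samples are the pre-stim baseline
             and sample n_pre is taken as stimulus time.
    fs_hz:   sampling rate in Hz.
    sign:    +1 to detect Facilitation, -1 to detect Suppression.
    k:       hypothetical threshold in baseline standard deviations.
    min_run: consecutive supra-threshold samples required (hypothetical).
    Returns the latency in ms, or None if no onset is detected.
    """
    baseline = sta[:n_pre]
    mu, sd = mean(baseline), stdev(baseline)
    threshold = mu + sign * k * sd
    run = 0
    for i in range(n_pre, len(sta)):
        crossed = sta[i] > threshold if sign > 0 else sta[i] < threshold
        if crossed:
            run += 1
            if run == min_run:
                first = i - min_run + 1   # first sample of the qualifying run
                return 1000.0 * (first - n_pre) / fs_hz
        else:
            run = 0
    return None
```

      Under this rule, a response driven by direct activation of ventral roots would be expected to cross threshold earlier than one relayed through spinal interneurons, which is the comparison the latency analysis relies on.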

      5) Line 589 - "However, in the rostrally-innervated muscles, the PDs for facilitation effects from caudal sites were opposite to those for background EMGs (Figure 4G, bottom-left panel), suggesting the direct activation of motor nerves." Can the authors clarify how they infer direct activation of motoneurons from the discrepancy between spinal PD and background EMG PD?

      We revised the Discussion as follows:

      Lines 702-710 “However, an exception was observed in some cases of rostrally-innervated muscles that showed facilitation effects. The Spinal PDs for facilitation in the rostrally-innervated muscles from caudal sites were opposite to those for background EMGs (Figure 4G, bottom-left panel). The magnitude of these responses was quite small (Figure 4E, left panel), but their features resembled the responses at higher currents (Figure 3F, lower panel). These results suggest that some motoneurons of rostrally-innervated muscles may not receive excitatory ascending inputs from afferents of the caudal part of the spinal site. Although there is a considerable distance between them, current delivered to the caudal site might spread to ventral roots of rostrally-innervated muscles.”

      • I wonder why the authors did not look at the effect of spinal stimulation-evoked EMG and torque during the movement of the cursor? This could be used to determine the parameters that improve the performance of the task, by either increasing the speed or decreasing the effort required to perform the task.

      As the aim of this study was to reveal the fundamental effects of descending commands on stimulus-evoked responses, we systematically and quantitatively explored evoked motor outputs, but did not directly investigate how spinal stimulation could improve task performance as a therapeutic intervention.

      For the analyses shown in Figure 7, we examined evoked torques rather than cursor movement, and concluded that the magnitudes and directions of evoked torque change depending on the current intensity and the direction of voluntary torque production.

      • I wonder if the current dataset allows the generation of a map that shows the lower and upper limits of current intensity that result in facilitation of descending inputs for each muscle, at each stimulation location. Additionally, is this map stable across days/sessions?

      In the present study, we showed that descending commands amplified the functions of intraspinal neural elements regardless of stimulus site (Figures 4G and H). In addition, we found that currents of 150-1350 μA boosted torque production in a direction corresponding to the direction of voluntary torque production (Figure 7C and F).

      Because it took many days to collect these data under various stimulus conditions (stimulus current and site), we could not compare motor outputs to spinal stimulation under the same stimulus condition across days or sessions. Future studies will be needed to investigate the stability of motor outputs. We have added this issue to the Discussion as follows:

      Lines 750-752 “However, the effectiveness of subdural stimulation in controlling dexterous hand movements and the long-term stability of motor output need to be determined in future studies.”

      Reviewer #3 (Public Review):

      1) To characterize the effects of stimulation, stimulation was first delivered during an anesthetized experiment to map the evoked responses from each electrode. A major result of the paper is that the level of background activity affects the response to stimulation. It would be interesting to see these baseline responses to stimulation in awake monkeys while they were sitting quietly and not attempting a task to see if these align well with the anesthetized responses.

      As Reviewer 2 made similar comments, we present additional data on the evoked muscle responses and evoked torques during the hold period for the center target, where wrist torque production was not intended, in awake monkeys (Figure 2A-C, Figure 5A-C, Figure 6A-D and Figure 7B-D). These data support our conclusion that descending commands amplify the function of intraspinal elements. Please refer to the response to the 2nd comment from Reviewer 2 for the revisions to the text.

      On the other hand, the currents and frequencies of subdural spinal stimulation used in the anesthetized monkeys differed from those used in the awake monkeys. Thus, we could not compare the evoked motor outputs between anesthetized and awake conditions in the present study.

      2) To understand the coordinated effects of stimulation across muscles, the authors present wrist torque data in Figure 7. These data are certainly important from a functional perspective and provide some information about coordination, but additional detail about coordination across muscles would be helpful throughout the paper. Currently, most of the results are presented on a per-muscle basis but don't describe whether there were (un)coordinated responses across muscles. For example, was there co-contraction of agonists or antagonists during stimulation? Increased activity of multiple antagonists could potentially lead to increased joint stiffness or fatigue without resulting in an increase in joint torque at the wrist.

      As you suggested, the inter-muscular relationship is another important source of information for understanding the coordination of forearm muscles. Our data show that the monkeys engaged each muscle as an agonist in accordance with anatomical constraints. Antagonistic voluntary contraction was rare, or mostly non-dominant, even during high-intensity electrical stimulation, suggesting that the stimulus-evoked responses of each muscle were independent of the voluntary activation (i.e., background EMG) of antagonistic muscles. We added these results in Figure 7-figure supplement 1 and the related descriptions in the text as follows:

      Results Lines 460-467 “During the 8-directional torque task, the monkeys properly engaged each muscle as an agonist (Figure 7-figure supplement 1). We found that antagonistic voluntary contractions were quite rare or mostly non-dominant even during high-intensity electrical stimulation. The magnitude of PStEs tended to be stronger in agonists and weaker in antagonists at low and medium currents (Figure 7-figure supplement 1A and B). On the other hand, stimulation at high currents tended to induce large facilitation effects for all targets irrespective of agonists and antagonists (Figure 7-figure supplement 1C).”

      Legend of Figure 7-figure supplement 1 Lines 1179-1189 “Figure 7-figure supplement 1. Subdural spinal stimulation simultaneously evoked facilitative and suppressive effects in multiple muscles and activated synergistic muscle groups. (A-C) StTAs of rectified EMGs in five wrist muscles during the hold period for the center and the 4 peripheral targets. Each polar plot was normalized by the maximum value of each muscle. Each example in (A-C) corresponds to the cases of Figure 7B-D, respectively. At low and medium currents of stimulations, large magnitudes of PStEs were observed in the muscles with high background EMG. For instance, stimulation given at the flexion directed target in (B) strongly facilitated wrist flexor muscles (e.g., FCR, PL and FCU), while stimuli at the extension directed target strongly facilitated wrist extensor muscles (e.g., ECR and ECU). On the other hand, at high current of stimulation, the magnitudes of PStEs hardly changed regardless of the magnitudes of background EMGs and the directions of voluntary torque.”

      Discussion Lines 644-650 “The inter-muscular relationship characterized by the PDs of background EMGs in the wrist muscles (Figure 7-figure supplement 1) demonstrates that the monkeys consistently engaged each muscle as an agonist, and that antagonistic voluntary contractions were rare irrespective of stimulus currents (see polar plots of background EMGs in Figure 7-figure supplement 1A-C). This result indicates that the presumed differential activation of spinal excitatory and inhibitory interneurons at different current intensities cannot be attributed to a change in wrist torque production strategy.”

      3) Authors infer from the consistent ulnar wrist torque during high amplitude stimulation that these responses are likely due to direct activation of the ventral motor pathway rather than activation through the dorsal sensory pathway and spinal circuitry. Is there any evidence in the EMG data (e.g. decreased latency, more consistent pulse-to-pulse amplitude of evoked EMG responses) to further support this finding?

      We added the results of the onset latency of PStEs as Figure 8, and the related descriptions in the Results, Discussion, and Materials and Methods. The decreased latency at high stimulus currents supports our argument that stimulus-evoked muscle responses at high currents resulted from direct activation of ventral motor pathways. Please refer to the response to Reviewer 1 for the revisions to the text.

  3. Oct 2022
    1. Author Response

      Reviewer #1 (Public Review):

      Overview

      In this work, the authors set out to study the effects of topographic connectivity in a hierarchical model of neural networks. They hypothesize that the topographic connectivity, often observed in cortical networks, is essential for signal propagation and allows faithful transmission of signals. To study the effects of topographic connectivity on the dynamics, the authors consider a network composed of several layers. Each layer is a recurrent neural network with excitatory and inhibitory sub-populations. The excitatory neurons in each layer innervate a sub-population of the following layer. The receiving excitatory sub-population targets a specific group in the next layer and so on. This procedure leads to separate channels that carry the inputs through the network. The authors study how the degree of specificity in each targeted projection, called ‘modularity’, affects signal propagation through the network.
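      The layered, channel-specific feedforward structure summarized above can be made concrete with a hypothetical parameterization of modularity `m`. The sketch assumes equal-sized channels, sets the between-channel connection probability to `(1 - m)` times the within-channel probability, and rescales both so that the expected connection density stays fixed. This is an illustration only; it is not claimed to be the definition used in the manuscript, and a critical value such as m = 0.83 need not carry over to it.

```python
import random

def modular_probabilities(density, n_channels, m):
    """Within- and between-channel connection probabilities (hypothetical).

    p_out = (1 - m) * p_in, with p_in chosen so that the expected overall
    connection density equals `density`. m = 0 gives unstructured (random)
    projections; m = 1 gives fully channel-specific projections.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    c = n_channels
    # Solve p_in / c + p_out * (1 - 1/c) = density for p_in.
    p_in = density * c / (1.0 + (1.0 - m) * (c - 1))
    if p_in > 1.0 + 1e-12:
        raise ValueError("density too high for this modularity")
    p_in = min(p_in, 1.0)
    return p_in, (1.0 - m) * p_in

def feedforward_mask(n_per_channel, n_channels, density, m, seed=0):
    """Boolean mask[post][pre] of excitatory feedforward links between two layers."""
    p_in, p_out = modular_probabilities(density, n_channels, m)
    rng = random.Random(seed)
    n = n_per_channel * n_channels
    channel = [i // n_per_channel for i in range(n)]  # channel label of each neuron
    return [[rng.random() < (p_in if channel[post] == channel[pre] else p_out)
             for pre in range(n)]
            for post in range(n)]
```

      Holding the expected density fixed while varying `m` isolates the effect of projection specificity from the effect of total feedforward excitation.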

      The authors find that the network reduces noise above a critical level of network modularity: the deep layers show a clear separation of an active channel and inactive channels, despite the noisy input signal. They study how different dynamical and structural properties affect the signal propagation through the network layers and suggest that the dynamics can implement a winner-takes-all computation.

      We thank the reviewer for the concise summary of our work.

      Strengths and novelty

      Topographic projections, in which sub-populations of neurons target specific cells in efferent populations, are common in the central nervous system. The dynamical and computational benefits of this organization are not fully understood. With their simple model, the authors were able to quantify the amount of topographic structure and selectivity in the network and study its impact on the network’s steady-state. In particular, a bifurcation point suggests a qualitative difference between networks with and without sufficient topographic modularity. The theoretical analysis in the paper is rigorous, and the mean-field study shows good agreement with computer simulations of the model.

      We thank the reviewer for acknowledging the rigor of our work both in terms of theory and simulations.

      The authors describe simulation results of networks with different dynamical properties, including rate-based networks, integrate-and-fire neurons, and more realistic conductance-based spiking neurons. All simulations exhibit similar qualitative behavior, supporting the conclusion that the behavior due to structural modularity will carry to more complex and biologically relevant neural dynamics.

      Overall, the authors convincingly show that the topographic structure of the network can lead to noise reduction, given that the input to the network is provided as distinct channels.

      Weaknesses

      The authors support their hypothesis and show a relation between topographic connection and noise reduction in their model. However, I find the study limited and struggle to see the impact it will have on the field. The paper is purely theoretical; it does not provide any physiological evidence that supports the conclusion. On the other hand, and this is the key issue, I do not find real theoretical insights in this work. In the following, I elaborate on why I hold this opinion.

      We understand the reviewer’s point and therefore significantly extended our theoretical results and their conclusions in the revised manuscript (see below). We are confident that the revised manuscript provides the theoretical insights that the reviewer was asking for.

      The hypothesis is that topographic projections in cortical areas allow faithful signal propagation. However, as the authors point out, reliable transmission can be achieved in other ways, such as by direct routing of information (lines 17-19). Furthermore, denoising can be accomplished by a simple feedforward network (e.g., ref 38) without E/I balance and with plasticity rules that do not require topographic connectivity. Thus, I find the computational model not well motivated.

      The reviewer mentions an important point that has not been sufficiently addressed in the previous version, namely the distinguishing feature of our model. Direct routing is indeed a simple way to transmit signals, but without the possibility of denoising them. The reviewer is also right that the denoising solution in the work by Kadmon and Sompolinsky (ref 38) does not require any topographic connectivity. However, their model does not constrain feedforward connections between layers in any way. In particular, neurons can excite and inhibit other neurons (i.e., ignoring Dale’s law) in downstream layers so that feedforward input covers a much wider range, thereby extending the activity range of the target neurons and generating fixed points more easily. In the biologically more plausible setting that we study (excitatory and inhibitory populations, excitatory background input and excitatory feedforward connectivity), we find that recurrent inhibition is crucial to compensate the excitation from previous layers and the external input. Only if the recurrent inhibition is sufficiently strong does the topographic organization of feedforward connections enable denoising. This is addressed in a new section ”Critical modularity for denoising” of the revised manuscript, where we also study the case of no recurrent connectivity and excitatory recurrent connectivity (for further details, see answers below). We further extended our discussion on other forms of signal transmission and denoising (see lines 489-498).

      The task studied here is a simple classification of static inputs: the efferent readout needs to identify the active channel. Again, this could be achieved by a single layer of simple binary neurons [Babadi and Sompolinsky 2014]. The recurrent connectivity and E/I balance suggest that dynamics should play an essential part in the model. However, the task is not well suited for understanding the role of dynamics.

      We appreciate the reviewer’s comments and completely agree. The simple classification task we explored can certainly be performed by simpler network architectures, such as the one studied in Babadi and Sompolinsky. However, as discussed above, this only works if the feedforward connectivity is unconstrained. In the case of Babadi and Sompolinsky, there is an expansion of inputs into a higher dimensional space through random connectivity drawn from a centered Gaussian distribution and appropriately chosen readout weights. This scenario is not compatible with the well-established biological constraints mentioned above that our model takes into account. In the new section ”Critical modularity for denoising” of the revised manuscript we show that recurrent inhibition is necessary to enable signal transmission and denoising under these constraints. The inhibition thereby not only generates competition between input channels but it also allows the modules to track their input very rapidly (as originally demonstrated by van Vreeswijk and Sompolinsky in 1996). To demonstrate this point and emphasize the relevance of dynamics, we added a new signal reconstruction task in the new section ”Reconstruction and denoising of dynamic inputs”, where we show that our model can faithfully track and denoise spatially encoded time-varying inputs.

      The authors perform a mean-field study to explain how modularity affects signal propagation. At the heart of their argument is that the E/I network exhibits bistability. However, bistability can be achieved by an excitatory population with a threshold [Renart et al., 2013]. The role of the inhibitory population does not seem crucial for the task and questions the motivations for this analysis.

      We thank the reviewer for raising this important point, which we address in the section ”Critical modularity for denoising” of the revised manuscript. The reviewer is correct that bistability can be obtained in a purely excitatory network, and the modular topographic connectivity in our work essentially renders the stimulated pathway excitatory. The important feature of our model, however, is that the non-stimulated pathways remain effectively inhibitory, which distinguishes stimulated from non-stimulated populations and enables denoising. This is only achieved by recurrent inhibition, which causes competition between pathways. Our analyses show that, for networks without recurrent connections or with purely excitatory recurrent connections, the network lacks mechanisms to compensate for the excitatory feedforward and external background input. In these cases, all populations show high (and synchronous) activity and no classification or denoising can be achieved. Therefore, the revised manuscript unambiguously demonstrates the critical role of recurrent inhibition.

      Active and inactive channels are decided by the two stable states of the network: the high and the low activity regimes. However, noise fluctuations and their propagation through the network may have a prominent role in the overall dynamics. I find that noise fluctuation analysis is conspicuously missing in this work.

      Fig. 7b of the previous version showed the stability of theoretically predicted fixed points using numerical fluctuation analysis around the fixed points. We apologize for not having made this sufficiently clear, and have therefore updated the caption of Fig. 7 to emphasize this point and extended the subsection ”Fixed point analysis” of the Methods detailing our approach. Furthermore, we fully agree with the reviewer that fluctuation analyses are important to understand the dynamics of our system. Therefore, we performed a theoretical fluctuation analysis in the new Figure 8 and the extended Appendix B of the revised version. This extended theory shows that competition induced by recurrent inhibition stabilizes the low activity state of non-stimulated sub-populations such that fluctuations cannot build up and propagate across layers, in line with the previously presented numerical simulation results.

      The main finding is a critical level of modularity, m = 0.83, above which the network shows denoising properties of silencing inactive channels and increasing the mean activity of active ones. However, the critical modularity is numerically demonstrated and is not derived theoretically. For a theoretical insight into this transition between denoising and mixing properties of the network, I would have liked to see a more rigorous discussion on the critical value. What does the critical point depend on? The authors show that the single-neuron dynamics do not affect the critical value, but what about other structural elements such as the relative efficacies of the E/I and the feedforward connectivity matrices? Do the authors suggest that m = 0.83 is a universal number? I expect a more detailed analysis and discussion of this core issue in a theoretical paper.

      We fully agree with the reviewer and are grateful that this point was brought up. The initial submission did not provide a sufficiently deep discussion of which features determine the critical modularity, and it is certainly important to do so. We also apologize that our presentation was misleading and suggested a universal number for the critical modularity. Unfortunately, there is no closed-form expression for the critical modularity for the non-linear activation functions shown in the previous version. We therefore added a new analysis with a fully tractable piecewise linear activation function that allows us to derive a closed-form solution for the critical modularity. The new section ”Critical modularity for denoising” and Appendix B show the results of this analysis and discuss the various parameters that affect the value of the critical modularity. In short, the reviewer was completely right that the critical modularity depends on a number of connectivity parameters as well as single-neuron properties. In particular, our theoretical results show that recurrent inhibition is crucial for denoising.

      To conclude my main criticism, I believe that a theoretical paper should offer a more in-depth analysis and discussion of the core ideas presented and not rely mainly on simulations. For example, to provide theoretical insight, the authors should address central questions such as the origin of the critical modularity, the role of the balanced recurrent connectivity, and how the network can facilitate computations other than winner-takes-all among channels. Alternatively, if the authors aim to describe a neural dynamics model without deep theoretical insights, I would expect to see physiological evidence supporting the suggested dynamics.

      We are very grateful for the reviewer’s criticism and believe the manuscript has substantially improved as a consequence. We are confident that our revised manuscript, by addressing these issues and extending the theoretical insights, now provides a much more thorough and comprehensive understanding.

      Conclusions

      The model studied by the authors is novel and provides a valuable way of exploring the effects of modularity and topographic connectivity on signal propagation through hierarchical recurrent neural networks. However, the study lacks theoretical insights into cortical circuit functions in its current version. I believe that for this work to impact the field, it needs to show further analysis and not rely on a numerical study of the model with limited theoretical derivations.

      Reviewer #2 (Public Review):

      This manuscript puts forward a new idea: that topography in neural networks helps to remove noise from inputs. The neural network consists of multiple stages. At each stage, the network is structured to be balanced in terms of the strength of inhibitory and excitatory signals. Because of topography, the networks become "unbalanced" and receive more recurrent excitatory signals locally in those regions that receive strong initial inputs. This leads to error correction. The main weakness of the manuscript is that the approach will only work for inputs that are constant in time. It is important to acknowledge this limitation both in the title and throughout the manuscript.

      We thank the reviewer for the concise summary of our work and for acknowledging its novelty. Given the importance of the issue raised by the reviewer regarding the nature of the input signals, in the revised manuscript we added a new section, "Reconstruction and denoising of dynamic inputs", in which we investigate more complex, time-varying inputs and demonstrate that the model, owing to the balance between excitation and inhibition, is able to quickly follow, process and denoise the external inputs. There are of course limits to the signal frequencies that can be successfully denoised, which we discuss in the Supplementary Materials (see Figure 10 - supplement 1) and elaborate on in the Discussion, but these are roughly within the ranges found in human psychophysics studies.

    1. Author Response

      Reviewer #1 (Public Review):

      In the article "Neuroendocrinology of the lung revealed by single cell RNA sequencing", Kuo et al. described various aspects of pulmonary neuroendocrine cells (PNECs), including the scRNA-seq profile of one human lung carcinoid sample. Overall, although this manuscript does not have any specific storyline, it is informative and would be an asset for researchers exploring various new roles of PNECs.

      Thank you for appreciating the significance of the data presented. Our storyline focuses on the newly uncovered molecular diversity of PNECs, the extraordinary repertoire of peptidergic signals they express, and the cell types these signals can directly target in (and outside) the lung, in mice and humans, and in health and disease (human carcinoid tumor).

      Major comments:

      The major concern about the work is that most results are preliminary and descriptive: conclusions and sub-conclusions are derived from scRNA-seq analysis alone, without in-depth functional analysis or validation by other methods or in other systems. Many open-ended results have been predicted by the authors from their scRNA-seq data analysis without functional validation. To provide a constructive roadmap, it would be better to survey the literature and frame these results as potential or probable hypotheses, citing the available literature. This should be done in each section of the Results. The paper lacks a main theme or a specific biological question to address. In addition, the description of the human lung carcinoid by scRNA-seq is somewhat disconnected from the main line of the study. Also, these results are derived from a single patient and lack statistical power.

      We agree that much of the data and analysis presented in the paper is descriptive and hypothesis-generating for PNECs; however, we do not consider it preliminary. We focused on validating two key conclusions from the scRNA-seq analysis: PNECs are extraordinarily diverse molecularly (as validated by multiplex in situ hybridization and immunostaining), and they express many different combinations of peptidergic signals (and appear to package them in separate vesicles). From the lung expression profiles of the cognate receptors, we also predicted the direct lung targets of the dozens of new PNEC peptidergic signals we uncovered. We then validated the predicted target of one of these newly identified signals, the classic hormone angiotensin, in PSN4 (a recently identified subtype of pulmonary sensory neuron), by confirming expression of the cognate receptor gene in PSN4 neurons that innervate PNECs and by showing that the hormone can directly activate PSN4 neurons. The characterized human carcinoid provided evidence that during tumorigenesis, the amplified PNECs retain a memory (albeit imperfect) of the molecular subtype of PNEC from which they originated. As suggested by the Reviewer, we have provided more background in Results by adding citations from the literature to clarify the rationale for each analysis and what was known prior to it. We feel that our paper provides a broad foundation for exploring the diversity and signaling functions of PNECs. Each molecular type of PNEC, each new PNEC peptidergic signal we uncovered, and each potential target cell in (and outside) the lung warrants follow-up, as do the sensory and other properties of PNECs we inferred from their expression profiles. Such studies will require the effort of many individuals in many labs studying both normal and disease physiology in mouse and human, exploiting the data, hypotheses, approaches, and framework we provide.

      Reviewer #2 (Public Review):

      Pulmonary neuroendocrine cells (PNECs) are known to monitor oxygen levels in the airway and can serve as stem cells that repair the lung epithelium after injury. Due to their rarity, however, their functions are still poorly understood. To identify potential sensory functions of PNECs, the authors have used single-cell RNA-sequencing (scRNA-seq) to profile hundreds of mouse and human PNECs. They report that PNECs express over 40 distinct peptidergic genes, and over 150 distinct combinations of these genes can be detected. Receptors for these neuropeptides and peptide hormones are expressed in a wide range of lung cell types, suggesting that PNECs may have mechanical, thermal, acid, and oxygen sensory roles, among others. However, since some of these cognate receptors are not expressed in the lung, PNECs may also have systemic endocrine functions. Although these data are largely descriptive, the results represent a significant resource for understanding the potential roles of PNECs in normal biology as well as in pulmonary diseases and cancer and are likely to be relevant for understanding neuroendocrine cells in other tissue contexts.

      However, there are several aspects of the data analysis that are unclear and require clarification, most notably the definition of a neuroendocrine cell (points #1 and #2 below).

      1) Figure S1 shows the sorting strategy used for isolation of putative PNECs from Ascl1CreER/+; Rosa26ZsGreen/+ mice, and distinguishes neuroendocrine cells defined as ZsGreen+ EpCAM+ and "neural" cells defined as ZsGreen+ EpCAM-; the figure legend also refers to the ZsGreen+ EpCAM- cells as "control" cells. However, the table shown in panel D indicates that the NE population combines 112 ZsGreen+ EpCAM+ cells together with 64 ZsGreen+ EpCAM- cells to generate the 176 cells used for subsequent analyses. Why are these ZsGreen+ EpCAM- cells initially labeled as neural or control, but are then defined as neuroendocrine? If these do not express an epithelial marker, can they be rigorously considered as neuroendocrine?

      As explained above in the response to Essential Revision point 1, we define pulmonary neuroendocrine cells (PNECs) throughout the paper by their transcriptomic clustering and signatures, which include the dozens of newly identified PNEC markers as well as the few extant marker genes available before this study (listed in Table S2). The confusion here arises from the two previously known markers (Ascl1 lineage marker ZsGreen, EpCAM) that we used for flow sorting to enrich for these rare cells for transcriptomic profiling (Fig. S1). Although most of the cells with PNEC transcriptomic profiles came from the ZsGreen^hi EpCAM^hi sorted population (as expected), some came from the ZsGreen^hi EpCAM^lo sorted population. The latter resulted from the high EpCAM gating threshold we used during flow sorting, which excluded some PNECs with intermediate levels of surface EpCAM. Indeed, nearly all PNECs (> 95%) expressed EpCAM by scRNA-seq, and there was no difference in EpCAM transcript levels or transcriptomic clustering between PNECs from the ZsGreen^hi EpCAM^hi and ZsGreen^hi EpCAM^lo sorted populations, as we now show in the new panels (C', C'') added to Fig. S1C. This point is now clarified in the legend to Fig. S1C, and it nicely demonstrates that transcriptomic profiling is a more robust method of identifying PNECs than flow sorting based on two classical markers.

      2) Similarly, in the human scRNA-seq analysis, how were PNECs defined? The methods description states that these cells were identified by their expression of CALCA and ASCL1, but does not indicate whether they also expressed epithelial markers.

      Human PNECs were identified in the single-cell transcriptomic analysis by the same strategy described above for mouse PNECs: by their transcriptomic clustering and signatures, which include the dozens of newly identified PNEC markers as well as the few extant marker genes available before this study (listed in Table S2). In addition to expressing classic and new markers, the human PNEC cluster defined by scRNA-seq indeed showed the expected expression of epithelial markers (e.g., EPCAM; see dotplot below), like other epithelial cells.

      3) The presentation of sensitivity and specificity in Figure 1 is confusing and potentially misleading. According to Figure 1B, Pcsk1 and Nov are two of the top-ranked differentially expressed genes in PNECs with respect to both sensitivity and specificity. However, the specificity of these two genes appears to be lower than that of Scg5, Chgb, and several other genes, as suggested in Figure 1C and Figure S1E. In contrast, Chgb appears to have higher specificity and sensitivity than Pcsk1 in Figures 1C and E but is not shown in the list of markers in Figure 1B.

      As explained above in the response to Essential Revision point 2, because different marker features are important for different applications, we have provided several graphical formats (Figs. 1B,C, Fig. S1E) and a table (Table S1) to aid in selecting the optimal markers for each application. Fig. 1B shows the most sensitive and specific PNEC markers, identified by the ratio of the natural logs of the average expression of the marker in PNECs vs. non-PNEC epithelial cells (Table S1), and we have added a two-dimensional plot of this sensitivity and specificity for a large set of PNEC markers (new panel E of Fig. S1). The violin plots in Fig. 1C allow visual comparison of expression of selected markers across PNECs and 40 other lung cell types, including non-epithelial cells (from our extensive mouse lung atlas in Travaglini, Nabhan et al, Nature 2020). Pcsk1 and Nov score high in the analysis of Fig. 1B because they are highly sensitive and specific markers within the pulmonary epithelium, and they are also valuable markers because they are highly expressed in PNECs. However, they appear slightly less specific in the violin plots of Fig. 1C (Pcsk1) and Fig. S1F (Nov) because they are also expressed, though at much lower levels, in individual lung cell types outside the epithelium: Pcsk1 is also expressed at low levels in some Alox5+ lymphocytes, and Nov is expressed at low levels in some smooth muscle cells. Chgb is a new PNEC marker that did not make the cutoff for the list in Fig. 1B because it is expressed in a slightly higher percentage of non-PNEC epithelial cells than the markers shown, which ranked slightly above it by this metric (see Table S1).

      4) The expression of serotonin biosynthetic genes in mouse versus human PNECs deserves some comment. The authors fail to detect the expression of Tph1 and Tph2 in any of the mouse PNECs analyzed, but TPH1 is expressed in 76% of the human PNECs (Table S8). Is it possible that Tph1 and Tph2 are not detected in the mouse scRNA-seq data due to gene drop-out? If serotonin signaling by mouse PNECs is due to protein reuptake, as implied on p. 5, is there a discrepancy between serotonin expression as detected by smFISH versus immunostaining?

      It is always possible that the failure to detect expression of Tph1 and Tph2 in the mouse scRNA-seq dataset is due to technical dropout. However, when we analyzed this in our other mouse PNEC scRNA-seq dataset, obtained using a microfluidic platform and also deeply sequenced (Ouadah et al, Cell 2019), we found values similar to those in the previously analyzed dataset: no Tph2 expression was detected, and only 3% (3 of 92) of PNECs had detected Tph1 expression, whereas 24% (22 of 92) had detected expression of the serotonin re-uptake transporter Slc6a4. Because our mouse and human scRNA-seq datasets were prepared similarly and sequenced to a similar depth (10^5 to 10^6 reads/cell), the difference observed in Tph1/TPH1 expression between mouse (0-3% of PNECs) and human (76% of PNECs) is more likely a true biological difference. We also analyzed serotonin levels in mouse PNECs by immunohistochemistry (not shown) and detected serotonin in nearly all (~90%) embryonic PNECs but only ~10% of adult PNECs. Systematic follow-up studies will be necessary to resolve the mechanism of serotonin biogenesis and uptake in PNECs, and the potential stage- and species-specific differences in these processes suggested by these initial data.

      5) The smFISH and immunostaining analyses are often presented without any indication of the number of independent replicate samples analyzed (e.g., Figure 2B, Figure 3F, G).

      The numbers of samples analyzed have been added (the values for Fig. 2B are given in the legend to Fig. 2C, which quantifies Fig. 2B).

      6) It would be helpful to provide a statistical analysis of the similarities and differences shown in the graphs in Figures 1E and G.

      We added a statistical analysis (Fisher's exact test, two-sided) of Fig. 1E comparing expression of each examined gene in the two scRNA-seq datasets (Table S4). We added a similar statistical analysis of Fig. 1G comparing the expression values of each examined gene by scRNA-seq vs smFISH (see Fig. 1G legend).
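      The two-sided Fisher's exact test mentioned above can be illustrated with a minimal Python sketch. It assumes, as a hypothetical tabulation not specified in this response, that each gene is summarized as a 2×2 table of cells with and without detected expression in each of the two scRNA-seq datasets; the function name fisher_exact_two_sided and the example counts are illustrative only and are not data from the study. The p-value sums the hypergeometric probabilities of every table with the same margins that is no more probable than the observed table, the two-sided convention also followed by scipy.stats.fisher_exact.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows are the two datasets, columns are cells
    with / without detected expression of one gene.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x: int) -> int:
        # Numerator of P(top-left cell = x) under fixed margins;
        # every table shares the denominator comb(n, col1).
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Integer weights keep ties (equally probable tables) exact.
    tail = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return tail / comb(n, col1)


# Hypothetical counts for illustration only (not data from the study):
# dataset 1: 150 of 176 cells express the gene; dataset 2: 70 of 92.
p = fisher_exact_two_sided(150, 176 - 150, 70, 92 - 70)
print(f"two-sided p = {p:.3g}")
```

      Comparing integer weights rather than floating-point probabilities avoids tolerance issues when two tables are equally probable. This sketch applies no correction for testing many genes at once.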

    1. Author Response

      Reviewer #2 (Public Review):

      SIGNIFICANCE: Movement is based on the coordinated activation and deactivation of muscle groups, which depend on the timing and strength of firing of the motoneurons connected to them. Motoneuron recruitment ultimately depends on the activity of local interneurons. Unlike in other CNS regions, the interneurons in the spinal cord controlling motor output display very high diversity in genetics, anatomy, localization, and electrophysiological properties. Making sense of the interneuronal circuits that modulate motor output to the different muscles of the body has proven to be quite complex. One technique proposed over 10 years ago is the use of retrograde transsynaptic-monosynaptic tracing with modified rabies virus injected into single muscles to define premotor connections to the individual motor pools controlling single muscles. Using this technique, the original authors suggested that interneurons controlling flexors and extensors occupied different locations in the spinal cord. This idea was an extension of pioneering work from the Jessell lab at Columbia University demonstrating that positional identity determined the input connectivity of motoneurons, at least from Ia afferents. This principle, if extended to premotor spinal interneurons, would simplify the mechanisms by which extensor and flexor interneuron networks could be connected and controlled. In this paper, the authors combine data from four independent groups to show that this principle might not be correct. In other words, interneurons connected to individual motor pools are highly intermingled in the spinal cord. This raises the bar for understanding both the intrinsic organizational principles of interneuron microcircuits in the spinal cord (if any) and how they develop their specific connectivity.

      STRENGTHS AND WEAKNESSES: The authors propose that the conflicting conclusions arise because of technical differences. The technique is based on complementation of the rabies virus glycoprotein (G) in specific targeted motoneurons infected with a glycoprotein-deficient rabies virus (RVdG). The ways G and RVdG are delivered to the specific motoneurons controlling one muscle differ. Originally, this was accomplished by co-injecting RVdG and an AAV-G vector into the same muscle simultaneously. However, as previously published by a different group and now confirmed by the authors, this approach also infects muscle sensory afferents, which are capable of transsynaptically labeling populations of interneurons in the spinal cord anterogradely. This results in the labeling of mixed interneuron populations through their output to specific motor pools and/or their input from primary afferents of the same muscle. To avoid this problem, the authors used transgenic approaches to induce expression of G in all motoneurons (but not sensory neurons) and obtained muscle specificity by injecting RVdG into single muscles. Unfortunately, there is no single gene that selects only motoneurons for transgenic expression, and tools for intersectional approaches were not available. Therefore, G is unavoidably expressed in some interneurons in addition to motoneurons. These interneurons could be an additional source of transsynaptic jumps if they receive RVdG from the motoneurons, raising the possibility that some labeling is the result of disynaptic, not monosynaptic, connections. The authors try to control for this possibility by comparing two different Cre lines to direct G expression in motoneurons, each targeting different additional interneurons. The results in both lines are similar, raising confidence in the main conclusions. Moreover, the authors indicate that some motoneurons outside the intended pools are also labeled because of motoneuron-to-motoneuron connections. In other words, the starter neurons for tracing monosynaptic connections are not as homogeneous or as specific to a single motor pool as desired. This is acknowledged as a current limitation and is addressed in the Discussion by proposing possible alternative approaches. Despite this weakness, the main conclusion of the study remains strong.

      A second technical issue raised by the authors is possible leakage during injection into the muscle. To reduce this possibility, the authors reduced the injected volume from 5 to 1 microliter compared with previous studies and checked the injection site post hoc for possible leakage (these are neonatal pups with muscle volumes of 2-3 microliters or less). In addition, they make a nice comparison by injecting different titers of RVdG to demonstrate that variations of one or two orders of magnitude in the number of infected motoneurons do not alter the main conclusion on the topographic positioning of the interneurons connected to different motor pools. One weakness is that the exact number of motoneurons that start the tracing is impossible to evaluate, and this prevents accurate comparisons across experiments. This is because cell death induced by the rabies virus is to be expected, and only a variable subset of surviving neurons can be identified. Currently, this is an unavoidable characteristic of the technique. Nevertheless, there is a nice correlation between titer, the number of surviving motoneurons, and the number and location of labeled interneurons. The large number of replicates and their consistency further raise confidence in the authors' claim of high specificity and replicability during injection, despite variable numbers of recovered motoneurons. The authors conclude that it is very important to check the number and localization of starter motoneurons to confirm specificity after the injections. This reviewer totally agrees and is surprised that this was not done in the experiment in which they try to replicate previous experiments by co-injecting AAV-G and RVdG into the muscle.

      We agree with the reviewer that ideally the starter cells should have been identified in all the experiments. However, data were collected independently, at very different times in each of the labs involved, with different initial aims, and there was no prior agreement on the details of injection and post-processing. The realization that we had similar experiments, performed with different techniques, led us to pool our observations in order to give a picture of the distribution of premotor interneurons, the leitmotif of this paper, and a great effort has been devoted to ensuring that all the cell counting procedures were uniform across labs. The lack of initial coordination is the reason why the starter cells have not been quantified in some datasets. Furthermore, in the previous version we had wrongly indicated that the motor neurons analysed at Glasgow University were identified by ChAT expression. We have corrected this in the current version, since in those experiments motor neurons were identified only by their location within a motor nucleus and their size (diameter greater than 30 µm). On the other hand, since we started comparing results, we have agreed on a uniform way of analysing and representing the data using the same normalization criteria. Therefore, while we cannot compare quantities such as the ratio of secondary to primary infected cells for all the experiments (but we do so for the subset in which this is possible, see new Figure 4-Figure supplement 3 and comment number 3 below), the positional analysis has been done following exactly the same criteria.

      One final problem with interpretation is that, for as yet unknown reasons, the technique is dependent on the age of the animal and cannot be implemented in mature animals. Therefore, the connectivity revealed here is that present during the first few days of life in the mouse. This is a period of significant synaptogenesis and synaptic selection and de-selection. The authors are encouraged to discuss this limitation further when interpreting interneuron connectivity in adults from studies in neonates.

      A very important point, see detailed answer to comment number 10 below.

      In summary, the authors have introduced new technical variations to trace premotor interneurons and challenge a major idea in the field, namely that interneurons connected to flexors and extensors occupy different positions in the spinal cord. The technique still has some weaknesses: 1) the possibility of disynaptic jumps, 2) accurate identification of starter neurons, and 3) restriction to neonates. However, the authors strengthen their conclusions by considering alternatives and introducing a large number of controls (two Cre lines, different titers, a large number of animals analyzed, large numbers and consistency of replicates, independent counting in different labs, etc.). This is an important and very useful study that suggests topographic localization is not a major organizing principle for interneuron connections with motor pools. It remains to be investigated what organizational mechanisms couple interneurons to functionally distinct motor pools.

      The weaknesses summarized in the paragraph above are addressed in detail below in the answers to the specific comments.

      Reviewer #3 (Public Review):

      The manuscript by Ronzano et al presents a rigorous neuroanatomical study that convincingly demonstrates that there is no difference in the medio-lateral organization of flexor and extensor premotor interneurons. The study uses monosynaptically restricted transsynaptic tracing from ankle flexor and extensor muscles with several (4) strategies for delivery of the G protein complement and the delta G rabies virus, and additional (2) variations that consider titer and Cre line. The authors went to great lengths here in an attempt to replicate prior studies with which they had initially conflicting findings. Further, the experiments are performed in laboratories in four different locations. The variations on the rabies and complement delivery, regardless of the lab performing the experiment and analysis, all converge on the same conclusion. Aside from the primary conclusion, the paper can be used as a manual for anyone considering transsynaptic tracing, as it details the benefits and caveats of each strategy with examples.

      The initial conflicting results put the onus on the authors to demonstrate where the divergence occurred. The authors took a highly comprehensive approach, which is a clear strength of the paper. All of the data is fully and transparently presented. Standardizations and differences between experiments run or analyzed in each lab are well laid out. Figure 1 and Table 2 provide a great summary of the techniques and their limitations. These are also well thought out and discussed within each section of results.

      The only thing missing is a likely explanation for the differences seen. Although the authors made several attempts to provide such explanation, the question remains - how did two groups who published independent studies using different strategies demonstrate flexor and extensor separation in the dorsal horn, when this study, using several strategies in multiple labs, show that the premotor neurons are in complete overlap? Additional small differences in methodologies could be identified which are not discussed and may provide potential explanations, but only for discrepancies in results of single techniques, not for all of the strategies used. The lack of reason for the discrepancy with prior studies despite the extensive efforts is unsatisfying, but, most importantly, the experiments were rigorously performed and the data support the conclusions presented.

      We thank the reviewer for the positive comments, and we share the opinion that the discrepancy is unsatisfying. While we propose possible explanations, despite extensive efforts we could not provide a definite answer, but we hope that making our work and all the data publicly available will trigger further efforts from the rest of the community.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very interesting paper. In this manuscript, Hendi et al. examined how two independent mechanisms, Wnt signalling and gap junctions, control two critical aspects of neuronal tiling. They have quite elegantly used two neighboring GABAergic motor neurons to show that while one specific C. elegans Wnt homolog, EGL-20, regulates axonal tiling, gap junctions mediated by the innexin UNC-9 at a very specific position on these axons regulate chemical synapse tiling. They also performed multiple experiments to show that UNC-9 gap junctions control chemical synapse tiling independently of their channel activity.

      Overall, the paper is interesting and would be of general interest for many neuroscience researchers, specifically to those who are studying neuronal tiling and the role of gap junctions. However, there are some concerns with this study.

      Major concerns:

      1) The authors looked only at the tiling of axons and presynaptic clusters in DD5/DD6 axons. However, these neurites are transformed in L1 from dendrite to axon, and the nature of the synaptic termini subsequently changes from postsynaptic to presynaptic. To say that egl-20/UNC-9 specifically control axonal tiling and GABAergic presynaptic tiling, the authors must check dendritic tiling and the tiling of postsynaptic termini. Specifically, a) do UNC-9 channels also affect postsynaptic patterning in L1? b) When do unc-9 puncta form? Are they present at the L1 stage, or do they appear only at the L2 stage, after the fate switch from dendrite to axon? c) Does egl-20 also control dendritic tiling in L1?

      We thank the reviewer for their insightful comments. As described in our original manuscript, we could not check the dendritic tiling between DD5 and DD6 at the L4 stage due to the inconsistent labeling of the DD6 dendrite with our fluorescent marker. As an alternative method, we measured the length of the (ventral) posterior dendrite of DD5 and showed that it is significantly longer in the egl-20(n585) mutant than in wild type at the L4 stage. We also measured the length of postsynaptic domains in the DD5 posterior dendrite and showed that it was also longer in the egl-20(n585) mutant than in wild type. Furthermore, we show that UNC-9 localization at the tip of the DD6 dendrite is unaffected in the egl-20(n585) mutant, despite the extension of postsynaptic domains. From these observations, we suggested that postsynaptic spines are distributed throughout the DD5 dendrite in the egl-20(n585) mutant and that this distribution is not regulated by unc-9.

      In the revised manuscript, we included images of wild type and egl-20(n585) animals in which ACR-12::GFP is co-labeled with mCherry::CAAX. In these strains, the expression of mCherry::CAAX and ACR-12::GFP is not detectable in DD6 in most animals. Using these strains, we confirmed that the DD5 postsynaptic sites are present throughout the dendrite of DD5 in both wild type and egl-20(n585) mutant backgrounds (Figure 1- figure supplement 1).

      a) Unfortunately we were not able to quantify postsynaptic patterning at L1 due to the low expression of ACR-12::GFP and mCherry::CAAX at L1 stage.

      b) UNC-9::7×GFP puncta are present at the tiling border of DD neurons on both the ventral and dorsal sides throughout development. In the original manuscript, we only showed UNC-9 localization on the dorsal side. We believe our limited description of UNC-9 in the dendrites has caused confusion regarding the phenotypes of the DD5 posterior dendrite and postsynaptic sites. In the revised manuscript, we have updated the images of UNC-9::7×GFP to show that the puncta are present in both axons and dendrites (Figures 2F-H).

      In the revised manuscript we also show that UNC-9 puncta are present at DD tiling border in L1 animals. We have included images of UNC-9::7×GFP at L1 at the axonal and dendritic tiling borders of DD5 and DD6 in both wild type and egl-20(n585) animals in Figure 2- figure supplement 5.

      c) As described above, we could not quantify dendritic tiling at L1 due to the low expression of our fluorescent markers at the L1 stage.

      2) The authors have shown that the previously known regulators of gap junction formation, NLR-1 and ZOO-1, do not regulate UNC-9 gap junction puncta on DD5/DD6 axons. Since they are a cell adhesion molecule and a tight junction component, respectively, presynaptic tiling should be checked in these mutants as well. Also, it is not clear whether these proteins are expressed in DD5/DD6 neurons at all. Since NLR-1 has previously been shown to regulate UNC-9 puncta in the nerve ring, expression of these genes in DD5/DD6 neurons should be checked before drawing these conclusions.

      In the revised manuscript, we have included the presynaptic tiling quantification in zoo-1(tm4133); egl-20(n585) and nlr-1(miz202) egl-20(n585) mutants which showed no significant presynaptic tiling defects (Figure 2- figure supplement 1). We also cited a paper (Taylor et al., 2021) that described the expression of zoo-1 and nlr-1 in the DD neurons.

      3) Authors assumed that the relevant gap junction to be an UNC-9 homotypic homomeric channel, but DD neurons also express several other innexins (inx-1, inx-2, inx-10, inx-14 and unc-7). This raises the possibility that unc-9 channel could be heteromeric in nature. Effect of some other expressed innexins on synaptic tiling apart from unc-7 should also be tested.

      We thank the reviewer for their comment. As per their advice, we tested four additional innexins (inx-1, inx-2, inx-10, and inx-14) that have been reported to be expressed in DD neurons and examined their potential role in presynaptic tiling in the egl-20(n585) mutant background. None of them showed a significant presynaptic tiling defect. In the revised manuscript, we have included these data in Figure 2E.

      4) The effect of the unc-9(Del18); unc-1 double mutant should be tested.

      We knocked out unc-1 using CRISPR/Cas9 genome editing in the egl-20(n585); unc-9(syb3236 [unc-9(ΔN18)]) mutant background and observed no significant presynaptic tiling defect compared with egl-20(n585); unc-9(syb3236 [unc-9(ΔN18)]). This further strengthens our model that the gap junction channel activity of UNC-9 is dispensable for its function in presynaptic tiling. We have included these data in Figure 5D.

      5) The authors have acknowledged the need to study the role of UNC-9 gap junction channels in maintaining presynaptic patterning. This reviewer appreciates that idea and suggests the authors check whether late expression of UNC-9 is sufficient to rescue the presynaptic patterning defect observed in egl-20; unc-9 double mutant animals.

      We thank the reviewer for their comment. We conducted a late rescue experiment using a heat shock promoter to express unc-9 at the L2 stage, after presynaptic tiling is complete. We did not observe significant rescue of the presynaptic tiling defect in two independent transgenic lines of Phsp::unc-9. While this does not rule out a function of unc-9 in the maintenance of presynaptic tiling, the result is consistent with the idea that unc-9 is required for the establishment of presynaptic tiling. We have included these data in Figure 2- figure supplement 4.

      Reviewer #3 (Public Review):

      This interesting paper from Hendi et al. describes a novel mechanism governing synaptic tiling that depends on expression of a gap junction protein at the border between adjacent presynaptic domains of neighboring neurons. The authors define the role of the innexin UNC-9 in establishing the spatial arrangement of synapses in adjacent C. elegans GABA motor neurons. They show that axonal tiling is controlled by Wnt signaling. However, synaptic tiling is preserved when axonal tiling is disrupted in egl-20/Wnt mutants. Synaptic and axonal tiling are both disrupted in egl-20; unc-9 double mutants, suggesting these two processes are controlled through distinct molecular mechanisms. The authors find that UNC-9 is localized to the border between axons of adjacent GABA neurons and provide evidence that the function of UNC-9 in tiling does not require its channel function. The experiments are made possible by the development of a new system for labeling adjacent GABA motor neurons that will also be of general use to the field. The studies rule out requirements for either gap junction activity or several other genes previously implicated in gap junction function/localization, but fall short of clearly defining the mechanism. Instead, the study provides additional support for channel-independent structural roles of gap junctions in the nervous system.

      The study's conclusions are generally well-supported by the data but more clarification is required in some areas:

      1) Overlaps between DD5 and DD6 dendrites are not evaluated directly. The authors show the extent of labeling in the DD5 dendrite. This should be clarified.

      We thank the reviewer for their comment. As described above, we could not directly quantify the dendritic tiling defect between DD5 and DD6 neurons due to the inconsistent expression of mCherry in the DD6 dendrite. Instead, we measured the length of the DD5 posterior dendrite in wild type and the egl-20(n585) mutant, and found a significant increase in DD5 posterior dendrite length in the egl-20(n585) mutants. In the revised manuscript, we have edited the text to explain the defect of the DD5 posterior dendrite more clearly.

      2) The authors suggest UNC-9 establishes axonal tiling as early as L2 stage, immediately following DD remodeling. However, no data is shown for UNC-9 localization at this developmental stage. It would also be interesting to know whether UNC-9 performs a similar role prior to remodeling, or if UNC-9 itself undergoes redistribution during the remodeling process.

      We thank the reviewer for their comment. As described above, we acknowledge that our initial description of UNC-9 localization in the DD neurons was insufficient. UNC-9 is present at both the axonal and dendritic tiling borders between DD5 and DD6 neurons throughout larval development.

      In the revised manuscript, we included UNC-9 localization at the axonal and dendritic tiling borders between DD5 and DD6 in both wild type and egl-20(n585) animals at the L1 stage (Figure 2- figure supplement 5). However, we could not determine whether the egl-20(n585); unc-9(e101) mutant exhibits a presynaptic patterning defect in the ventral axons prior to remodeling at the L1 stage, due to the low expression of our axonal and presynaptic markers at that stage.

      3) Based on the representative image, UNC-9 abundance appears reduced in unc-104. The authors should quantify.

      We thank the reviewer for their comment. In the revised manuscript, we quantified the signal intensity of UNC-9::7×GFP at the DD5-DD6 axonal tiling border in wild type, egl-20(n585), unc-104(e1265), zoo-1(tm4133) and nlr-1(gk366849). We found that the fluorescent intensity of UNC-9::7×GFP was indeed slightly lower in egl-20(n585) and unc-104(e1265) mutants compared with wild type animals. This result implies that egl-20 and unc-104 have a minor role in UNC-9 localization. Nevertheless, the UNC-9 puncta are always present in all genotypes we examined. The quantification is included in Figure 2- figure supplement 6, and we suggest that the weak presynaptic tiling defect in the egl-20 single mutant could be explained by this reduction of UNC-9 localization (lines 284-285).

      4) The authors show the distribution of muscle NLG-1 mirrors that of RAB-3. While this suggests the altered distribution of RAB-3 reports on synaptic rearrangement, this conclusion would be strengthened by analysis of an active zone marker.

      We agree with the reviewer that examining the co-localization of RAB-3 with an active zone protein would strengthen our conclusion. We therefore expressed BFP::RAB-3 under the DD-specific promoter flp-13 in a transgenic marker strain (wyIs292) that expresses the active zone protein UNC-10::tdTomato under the GABAergic promoter unc-25, and NLG-1::YFP under the body wall muscle promoter unc-129dm (Maro et al., 2015). Using this strain, we show that RAB-3 co-localized with UNC-10 and was apposed to postsynaptic NLG-1 in both wild type and the egl-20(n585); unc-9(e101) mutant. The representative images are included in Figure 2- figure supplement 2.

    1. Author Response

      Reviewer #1 (Public Review):

      The stated goal of this research was to look for interactions between metabolism, (manipulated by glucose starvation) and the circadian clock. This is a hot topic currently, as bi-directional links between metabolism and rhythmicity are found in several organisms and this connection has important implications for human health. The authors work with the model organism Neurospora crassa, a filamentous fungus that has many advantages for this type of research.

      The authors' first approach was to assay the effects of glucose starvation on the levels of the RNA and protein products of the key clock genes frq, wc-1, and wc-2. The WC-1 and WC-2 proteins form a complex, WCC, that activates frq transcription. The surprising finding was that WC-1 and WC-2 protein levels and WCC transcriptional activity were drastically reduced but frq RNA and protein levels remained the same. Under conditions where rhythmicity is expressed, the rhythms of frq RNA, FRQ protein, and expression of clock-driven "output" genes were also unaffected by starvation. The standard model for the molecular clock is a transcription/translation feedback loop dependent on the levels and activity of these clock gene products, so this disconnect between the starvation-induced changes in the stoichiometry of the loop components and the lack of effects of starvation on rhythmicity calls into question our understanding of the molecular mechanism of the clock. This is yet another example of the inadequacy of the TTFL model to explain rhythmicity. For me, the most significant sentence in the paper was this: "...an unknown mechanism must recalibrate the central clockwork to keep frq transcript levels and oscillation glucose-compensated despite the decline in WCC levels."

      The authors' second approach was to try to identify mechanisms for the response to starvation by focussing on frq and its regulators, using mutations in the frq gene and strains with alterations in the activity of kinases and phosphatases known to modify FRQ protein. The finding that all of these manipulations have some effect on the starvation-induced changes in WC protein level is taken by the authors to indicate a role for FRQ itself in the response to starvation. This conclusion is subject to the caveat that manipulations of the activity of multifunctional kinases and phosphatases will certainly have pleiotropic effects on many cellular processes beyond FRQ protein activity.

      Because of the sometimes speculative nature of our conclusions, and following the suggestion of the editor, we restructured the Discussion and now discuss the mechanism addressed by the Reviewer in the subsection "Ideas and Speculation". We added a sentence to this section about the possible pleiotropic effects of the tested signaling pathways: "Starvation triggers characteristic changes in the activity of signaling routes that affect basic components of the circadian clock. Although the multifunctional pathways might act via pleiotropic mechanisms as well, based on their earlier characterized role in the control of the Neurospora clock, their action can be inserted into a model describing the glucose-dependent reorganization of the oscillator."

      The third section of the paper is a major transcriptomic study of the effects of starvation on global gene expression. Two strains are compared under two conditions: wc wild-type and the wc-1 knockout strain, under fed and starved conditions. The hypothesis is that WCC has a role in the starvation response. The results of starvation on the wild-type are unsurprising and predictable: the expression of many genes involved in metabolic processes is affected. There are no new insights that come from these results and no new testable hypotheses are generated by the data.

      We agree with the reviewer that it is not surprising that glucose depletion strongly affects genes involved in metabolic processes and monosaccharide transport. The data obtained in wt served mainly as a control for our experimental conditions. The new aspect of our analysis is its focus on the differences between wt and wc-1 in the transcriptomic response to altered glucose availability.

      The authors refer to the wc-1 mutant strain as "clockless" and discuss its effects on the transcriptome only in terms of WC-1's function in the clock mechanism. However, WCC is known to be a major transcriptional regulator, controlling a number of genes beyond the TTFL. As acknowledged earlier in the paper, WC-1 is also the major light receptor in Neurospora. The transcriptomics experiments were carried out in a light/dark cycle, with cultures harvested at the end of the light period, when "an adapted state for light-dependent genes can be expected" according to the authors. However, wc-1 mutants are essentially blind, and so those samples are equivalent to being harvested in the dark. The multifunctional nature of WCC complicates the interpretation of the transcriptomics data. The differences in the transcriptome between wild-type and wc-1 may not be due to loss of clock function, but rather the loss of a major multifunctional transcription factor, or the difference between light and "dark".

      The reviewer is right: when we discussed the difference between wt and wc-1 in the transcriptional response to glucose, we did not emphasize the possible contribution of the photoreceptor function of the WCC. We added the following sentence to the revised Discussion: "Further investigations could differentiate between the clock and photoreceptor functions of the WCC in the glucose-dependent control of the transcriptome." Furthermore, we now state more specifically that in wc-1 the lack of the WCC (and not the lack of a functional clock) results in the altered transcriptomic response to starvation when compared to wt (P15 L14-17).

      In the final set of experiments, the authors tested the hypothesis that the changes in the transcriptome between wild type and wc-1 might make wc-1 less competent to recover growth after starvation. They also test the recovery of frq9, a "clockless" mutant. The very surprising result is that the growth rates of these two mutants are slower than the wild type after transfer from starvation media to high glucose. This is surprising because there will be several generations of nuclear division and doublings of mass within a few hours and the transcriptome should have recovered fully fairly rapidly. A mechanism for this apparent "after-effect" is suggested with evidence concerning differences in expression of a glucose transporter, but it is not clear why this expression should not change rapidly with re-feeding on high glucose. As with previous experiments, the cultures were grown in light/dark cycles, which results in different conditions for the mutants, both of which have very low or absent WC-1 and are therefore blind to light. The potential effects of light have been disregarded.

      The reviewer is right that several generations of nuclear divisions occur within a few hours and lead to a number of doublings of the biomass. However, when the first phase of regeneration is delayed in one or more strains compared to the control, a substantial difference in biomass can be expected by the time the cultures reach stationary phase.

      Regarding the expression of the glucose transporter: to emphasize how glt-1 levels respond differently to glucose in the different strains, in the previous version of the manuscript we normalized the expression levels to the beginning of recovery (the time point of glucose addition). As a result, expression differences between the strains were not shown. To give a more comprehensive picture, the revised manuscript depicts expression levels without normalization (Fig 5F). The mutants did not adapt efficiently to changes in glucose levels, i.e. expression of the transporter was relatively high in both wc-1 and frq10 during starvation and did not increase further upon glucose addition. On the other hand, 24 hours after glucose resupply, glt-1 levels were similar in all strains, which might contribute to the similar growth rates observed under steady-state conditions in the standard medium.

      Regarding the photoreceptor-independent function of the WCC during growth recovery: in the revised manuscript we present additional data suggesting that the photoreceptor-independent function of the WCC is important for efficient recovery from starvation. Fig. 5C and Fig. 5D now show that upon resupply of glucose, wt grows faster than the clock-deficient strains Δwc-1 and frq10 both in LD cycles and in constant darkness, indicating that the role of the WCC in growth regeneration is at least partially independent of its photoreceptor function. Regarding the function of the WCC in frq10: frq10 cannot be considered blind. Although both Δwc-1 and frq10 lack a functional clock and WC levels are reduced in frq10, these strains show significant differences in WCC activity. While Δwc-1 is considered blind, in frq10 the lack of negative feedback results in high WCC activity in both DD and LL, and the expression levels of all examined light-sensitive or light-dependent genes were found to be comparable in wt and in frq-less mutants (Schafmeier et al., 2005; Hunt et al., 2007; own unpublished data).

      The title of the paper refers to a "flexible circadian clock" but this concept of flexibility is not developed in the paper. I would substitute "the White Collar Complex" for this phrase: "Adaptation to starvation requires a functional White Collar Complex in Neurospora crassa" would be more accurate. Some experiments are also conducted using an frq null "clockless" strain, but because WC expression is very low in frq null mutants, any effects of frq null could also be attributed to WC depletion.

      As detailed above, a low level of the WCC in the frq-less mutant does not mean low transcriptional activity, and accordingly the two clock mutants, wc-1 and frq10, show important functional differences. We used the word "flexible" to indicate that the molecular clock is able to operate under critical nutrient conditions and with a significantly changed stoichiometry of its key components. Results of our new experiments performed in DD (mentioned above) indicate that growth regeneration is largely independent of the photoreceptor function of the WCC. Nevertheless, we accepted the reviewer's criticism and changed the title to "Adaptation to glucose starvation is associated with molecular reorganization of the circadian clock in Neurospora crassa".

      The major conclusion I took away from this paper is the multifunctional nature of the WCC as a transcription factor complex. It has been known for a long time that WCC controls the expression of many genes beyond the frq gene at the core of the circadian transcription/translation feedback loop. WC-1 is also the major blue light photoreceptor in Neurospora, controlling the expression of light-regulated genes, and this fact is barely touched on in the paper. These new data now extend the role of WCC in the regulation of metabolic networks as well.

      Reviewer #2 (Public Review):

      The authors have performed an interesting study addressing a topical question: how circadian oscillators remain accurate in changing environmental conditions, and how these oscillators contribute to responses to environmental changes. The authors have performed their studies in Neurospora crassa. They have made the very interesting finding that starvation causes a profound decrease in WHITE COLLAR-1 (WC-1) abundance, yet the circadian system continues to run despite this decrease in the abundance of a core oscillator component. The study of chronic glucose starvation in a Δwc-1 mutant is interesting and provides the opportunity to investigate the role of the WHITE COLLAR COMPLEX (WCC) and the clock system in adaptation to starvation.

      Strengths:

      The authors have used a range of techniques to measure clock behaviour, including qPCR, phosphorylation, protein abundance, and subcellular localisation studies.

      An frq9 mutant was used to test the effects of FRQ on WC-1 abundance, since WC-1 decreased during starvation. This is elegant, though the logic of this experiment is not quite clear: FRQ abundance did not change during starvation, so why did the authors think this experiment was needed?

      We regret that the examination of frq9 was not clearly justified in the previous version of the manuscript. It is true that FRQ levels did not change during starvation; only phosphorylation of the protein was affected, i.e. FRQ became more phosphorylated under low glucose conditions, as displayed by an electrophoretic mobility shift on the Western blot (Garceau et al., 1997, Cell 89:469–476). We tested the starvation response in the FRQ-less strain because WCC levels changed significantly in wt upon glucose depletion and expression of WC proteins is known to be controlled by FRQ. In the revised manuscript we introduce and explain the experiments performed with frq9 more thoroughly (P7 L22-P8 L14; P16 L21 – P17 L6).

      An interesting experiment was performed to test whether CK1a-dependent phosphorylation and inactivation of the WCC are involved in the starvation response. An FRQΔFCD1-2 mutant is used in which FRQ cannot interact with CK1a and therefore CK1a cannot phosphorylate and inactivate WC. This experiment suggested that CK1a is not involved in the response to starvation, again leading to the conclusion that FRQ is not involved in the starvation regulation of WC.

      The referee is right: the effect of FRQ-bound CK-1a on the adaptation of the molecular clock to starvation seems to be minor, and this is also our conclusion in the manuscript. The major message of this experiment was that FRQ became phosphorylated in response to starvation without stably interacting with CK1a, probably via another mechanism. We agree that the behavior of WCC levels upon starvation was similar to that in the FRQ-less mutant.

      PKA is shown to be involved in the starvation-induced reduction of WC because the starvation-induced reduction in abundances of WC-1 was absent in the mcb strain in which the regulatory subunit of PKA is defective and hence, PKA is constitutively active.

      The authors have found an interesting potential link between glucose levels and WCC phosphorylation, they demonstrated that starvation reduces PP2A activity and that in a regulatory mutant of PP2A, which has reduced PP2A activity, there is little effect of starvation on WCC levels, suggesting the hypothesis that glucose-dependent PP2A dephosphorylation stabilises WCC.

      Analysis of starvation-regulated transcriptome in Δwc-1 and wild type found strong evidence that the transcriptomic response to starvation is in part dependent on WCC. Much of the misregulated transcriptome appears to be associated with metabolism.

      In a series of growth studies in wild-type frq and wc-1 mutants the authors provide strong evidence that FRQ and WC are involved in growth and survival following starvation, and recovery from starvation.

      Weaknesses:

      The authors describe Neurospora crassa as a model for circadian biology and apparently make the assumption that the findings are indicative of the behaviour of clock systems in other kingdoms. This is not the case. Neurospora crassa is a wonderful model for studying fungal clocks and is a great tool for studying basic circadian dynamics, but the interesting findings here are of a detailed molecular nature and therefore are applicable for fungal clocks, but not other kingdoms.

      We agree that we still do not know whether the described mechanism is specific to fungal clocks. However, besides the basic feedback loop, overlapping mechanisms (controlled by e.g. casein kinases, glycogen synthase kinase, PKA, PP2A) are involved in the regulation of circadian timekeeping in different eukaryotic systems (reviewed in Reischl and Kramer, 2011, FEBS Lett; Brenna and Albrecht, 2020, Front Physiol). Our results suggest that some of these common factors (PKA, GSK, PP2A) are involved in the reorganization of the Neurospora clock in response to changes in glucose availability. Therefore, it is possible that analogous changes occur in the timekeeping mechanisms of other eukaryotic systems when they face serious environmental challenges.

      We added a short section to the Discussion that gives an overview of known interactions between glucose availability and circadian timekeeping at different levels of the phylogenetic hierarchy (P15 L18 – P16 L7).

      The authors assume that the reader is intimately familiar with the intricacies of Neurospora crassa circadian studies and the significance of differences between LL and DD investigations. More background on the logic of the experiments would be helpful for readers from other fields.

      Thank you for the comment. In the revised manuscript we introduce the molecular clock of Neurospora more thoroughly and have supplemented the description of the experimental conditions with detailed explanations.

      The data in Figure 2 are essential for the interpretation of the findings, demonstrating the presence of free-running rhythms. However, the data are entirely qualitative, making it hard to fully assess the authors' interpretations, a more quantitative assessment of the data would improve clarity.

      We quantified the Western blot signals and show the results in Fig 1E in the new version of the manuscript (according to the reviewer's suggestion Fig 2 of the old version is now part of Fig 1). Our data indicate that oscillation of FRQ levels is similar under both nutrient conditions.

      The conclusion that FRQ contributes to the regulation of WC-1 abundance in response to starvation does not seem to be supported by the data, because frq RNA does not change upon starvation. Furthermore, the authors' conclusion that the starvation-induced decrease in WC-1 and WC-2 protein levels is due to FRQ, based on the lack of reduction in an frq9 mutant, is open to misinterpretation: WC levels are low in this mutant, so starvation might not lower already low levels of WC further. Indeed, WC-1 is lower in the frq9 mutant under any condition than in the WT under starvation, and WC-2 does decrease in abundance in the frq9 mutant under starvation. The data strongly suggest to this reader that FRQ does not participate in the regulation of WC abundance in response to starvation.

      After rereading the criticized section, we admit that the text was not well structured, and we made several modifications. We intended to emphasize that upon drastic changes in glucose availability, frq RNA levels remained compensated in wt, but this compensation was affected when functional FRQ was not present. We agree with the reviewer that the low expression of the WCC in frq9 makes it difficult to compare the glucose dependence of WCC expression in frq9 and wt. We modified the conclusion to include this information and now focus mainly on the strain-dependent difference in the changes of frq RNA expression (P7 L22-P8 L14).

      The discussion accurately summarises the results and provides an interpretation, but a comparison to circadian systems in other kingdoms is lacking. How do the data compare with the effects of glucose and other sugars on the mammalian, plant, and insect clocks?

      We added a short section to the Discussion that gives an overview of known interactions between glucose availability and circadian timekeeping in different organisms (P15 L18-P16 L7).

      How changes in WCC might result in changes in transcription is not explained. This might be very obvious to the authors, but to the reader it is not. Are the transcriptional outputs direct targets of WCC? Has WCC ChIP-seq been performed by the authors or others, and are the regulated transcripts directly bound by WCC? What are the enriched promoter sequences in the regulated genes, and is it possible to identify the network by which these changes in transcription occur?

      We now show the list of genes (Figure 4 – Figure supplement 2) that changed in a strain-specific manner in response to glucose starvation and that, based on ChIP-seq results, were previously described as direct targets of the WCC (Smith et al., 2010; Hurley et al., 2014). Given literature data showing that the WCC affects the expression of several other transcription factors and controls basic cellular functions that might affect the expression of further genes, it was not surprising that only 90 of the 1377 genes were reported to be direct targets of the WCC.

      Whilst the authors claim it is the circadian clock that is involved in the starvation response, in my view a more precise interpretation of the data is that WCC is involved in the response. Since WCC is a photoreceptor with dual function in the clock, is it yet possible to conclude that the effects discovered are due to the clock role of WCC? Or do the data support the role of light signalling in regulating the starvation response through WCC?

      We thank you for the comment. In the revised manuscript we state more specifically that in wc-1 the lack of the WCC (and not the lack of a functional clock) results in the altered transcriptomic response to starvation compared to wt. In addition, we present a new experiment (Fig. 5D) showing that upon resupply of glucose, wt also grows faster than the clock-deficient strains wc-1 and frq10 in constant darkness. This indicates that the role of the WCC in growth regeneration is largely independent of its photoreceptor function.

      The authors do not appear to reconcile the finding that starvation hugely decreases WCC levels with the finding that the transcriptional and growth responses to starvation require WCC.

      We agree with the reviewer that the problem of how low levels of WCC could sufficiently support the transcription of frq and different output genes under starvation conditions was not discussed properly. Our results suggest a model in which the maintained level of nuclear WCC and the weakened inhibition by both FRQ (the hyperphosphorylated form is less active in the negative feedback) and PKA (its activity lowered upon glucose depletion) together might ensure that transcriptional activity of the WCC is preserved upon glucose withdrawal in both DD and LL despite the decrease of the overall level of the complex. In the revised version these aspects are discussed more thoroughly (P16-18).

      This study contributes to the increased focus of the circadian community on the regulation of outputs by circadian oscillators. The manuscript will be of interest to many in the field. There needs to be less assumption of knowledge about the N. crassa circadian system, and better discussion in the broader context of clocks in other kingdoms.

      We added a new section to the Discussion with data concerning interrelationships between glucose availability and the circadian clock in other organisms.

    1. Author Response

      Reviewer #1 (Public Review):

      Drosophila ovarian follicle cells have been utilized as a model system to study organogenesis and tumorigenesis of epithelia. Studies have found that lack of proper cell polarity causes invasive delamination of cells and formation of multilayered epithelia, reminiscent of Epithelial-Mesenchymal Transition (EMT). Using this system, the authors analyzed the single-cell transcriptome of follicle cells and show that distinct cell populations emerge shortly after induction of polarity loss. The authors identified dynamic activation of the Keap1-Nrf2 pathway. Finally, subpopulation classification and analysis of regulon activity identified the Keap1-Nrf2 pathway as responsible for the epithelial multilayering caused by polarity loss.

      Strengths:

      The authors characterized the single-cell transcriptome of follicle cell subpopulations after induction of polarity loss. Using a temperature-inducible driver, they can induce polarity loss within a short period of time, which enables detection of epithelial populations in various transition stages. The detected cell heterogeneity could arise intrinsically or from environmental cues within the tissue in vivo. Therefore, the system likely recapitulates tumorigenesis in vivo well.

      Weaknesses:

      1) The authors should show cells corresponding to the identified key cell clusters within the tissue by immunostaining, GFP-trap, or RNA FISH.

      We thank the reviewer for their comment. However, for this particular case, we would like to underscore the observation that the clusters derived from our integrated analysis do not exhibit mutually exclusive gene expression. This is unlike other studies where different clusters exhibit unique markers. The different clusters in this study represent distinguishable cell states and not distinct cell types. Even though the Lgl-KD follicle cells transcriptomically deviate from their corresponding cells of origin to form their own clusters, they continue to express several markers that show gene-expression overlap with normal follicle cells. This overlap exacerbates the problem of identifying distinct cells using differentially-enriched markers.

      However, we have shown antibody staining against Drpr to identify cluster 8 follicle cells that associate with Dcp1+ dying germline cells. We have used the GstD-lacZ reporter (cluster 7 marker, specifically cluster 7_3) to show pathway activity within the multilayer. Besides GstD-lacZ, we also show F-Actin enrichment in cluster 7 (specifically 7_3) cells, which is significantly enriched in invasive cells. Additionally, we have now added images depicting the cell/stage-specific expression pattern of the JNK pathway components pJNK and puc, as well as that of Thor (4E-BP), which is expressed at high levels in cluster 8 and at medium levels in cluster 7, and Xbp1-GFP (UPR stress sensor), which marks late stages of Lgl-KD cells.

      2) Images are low magnification and difficult to see individual cells.

      We have replaced several such images in the revised manuscript. Specifically, the revised manuscript has entirely new (or improved versions of) image panels in figure 5. In figure 1A, the focus is the entire ovariole and therefore, we have only highlighted the enrichment of Hnt and pH3 antibody staining separately for a subsetted region of interest (ROI). The ROI panels are included within the larger image itself. For figure 6, we have converted the LUTs of panels showing distinct channels for RFP and Shg/Arm antibody stainings to grayscale.

      3) The manuscript is weighted toward the technical aspects; more of the biology behind this study should be discussed.

      We have added new paragraphs to discuss the evidence supporting the loss of polarity, specifically that of Lgl, in human cancers. Additionally, we have also discussed how our results regarding Keap1 relate to what is already known about it and the implications of our results in the context of cancer progression and metastasis.

      Reviewer #2 (Public Review):

      Chatterjee et al. perform extensive image and single-cell RNA sequencing (scRNA-seq) analysis of Drosophila ovaries with and without knockdown of a gene, Lethal giant larvae (Lgl), which is known to establish apical-basal polarity as well as control proliferation of epithelial tissues. The goal of the study is to characterize the effect of apicobasal-polarity loss in epithelial cells via Lgl knockdown on Drosophila ovaries at the phenotypic, cellular, single-cell gene expression and regulatory levels. By focusing on single-cell gene expression clusters that are unique to Lgl-KD compared to those from flies without the knockdown, they were able to identify a highly transient cluster (cluster 7) which consists of tumorigenic cells. Differential markers within a sub-cluster (cluster 7_3) of this cluster, followed by validation using a GstD-lacZ enhancer-trap reporter assay, led to their conclusion that cluster 7 represents the cells of the multilayering phenotype (i.e., the major Lgl-KD phenotype observed from image analysis), where activation of Keap1-Nrf2 signaling was observed. The KEAP1-NRF2 pathway is associated with protecting cells from oxidative stress. KEAP1 forms part of an E3 ubiquitin ligase, which controls NRF2, a transcription factor, by targeting it for ubiquitin-mediated proteasomal degradation. Surprisingly, inducing loss of function of both Keap1 and, separately, NRF2 (cnc in Drosophila) in Lgl-KD cells resulted in the same phenotype/rescue: loss of the multilayering phenotype. Overexpression of Keap1 in Lgl-KD induced increased multilayer volume compared to Lgl-KD alone, further supporting the role of Keap1 in cellular invasion and possibly in early stages of tumorigenesis when epithelial cells start losing their polarity.

      The strengths of this paper are:

      The mutually reinforcing advanced imaging, scRNA-seq and genetic manipulation (knockdown and overexpression) experiments/analyses, which largely support the major conclusions of the manuscript summarized above as well as the more minor observations that the authors make.

      The systems biology flow of the study from broad to a specific gene/pathway implicated in the phenotype. The authors start with a clear phenotypic characterization of Lgl-KD and genome-wide scRNA-seq analysis. This leads to regulatory factor enrichment and further identification of a cluster (cluster 7) and then of a sub-cluster (cluster 7_3). This is followed by the identification of the KEAP1-NRF2 pathway and the demonstration that KEAP1 knockdown and overexpression in Lgl-KD rescue and aggravate the cell multilayering phenotype, respectively.

      The multilayering phenotype, genes and regulatory factors associated with loss of polarity are known to play an important role in the epithelial to mesenchymal transition (EMT). For example, this includes the enrichment of AP-1 family members, which are known to regulate EMT, in the regulon analyses as well as identification of KEAP1-NRF2.

      The weaknesses of the paper are:

      The framing/motivation of the study could be improved especially for those who study EMT/metastasis in humans. Given that loss of polarity is one of many events associated with tumorigenesis and metastatic progression, the claims made that studying Lgl-KD in Drosophila ovaries directly leads to insights into tumor cell invasiveness, early stages of tumorigenesis and EMT may leave some readers doubtful if they are not familiar with Lgl. Reviewing major findings that show that Lgl is a tumor suppressor as is its human homologue Hugl-1 as well as making a stronger case that studying Lgl-KD in Drosophila is relevant for tumorigenesis and EMT would be helpful.

      We thank the reviewer for these suggestions. Accordingly, we have added new paragraphs to the Discussion section that address how Lgl-KD-mediated polarity loss links to mammalian tumorigenesis, as well as the implications of our results.

      Given that Keap1 antagonizes NRF2, the apparently contradictory result that inducing loss of function of both Keap1 and, separately, NRF2 (cnc in Drosophila) in Lgl-KD cells resulted in the same phenotype/rescue (loss of the multilayering phenotype) is not fully addressed. Keap1 overexpression revealed that it aggravates multilayering. NRF2 overexpression experiments were not performed. In addition, it was shown that overexpression and knockdown of Keap1 did not affect NRF2 gene expression (Figure 5C); however, Keap1 regulates Nrf2 directly at the protein level via ubiquitin-mediated proteasomal degradation. Nrf2 protein levels in flies with and without Lgl-KD under the various manipulations of Keap1 (control, KD and OE) were not measured.

      As the Keap1-Nrf2 pathway is widely studied in the context of oxidative-stress response signaling, Keap1 is widely accepted as a negative regulator of Nrf2-driven transcription. However, Nrf2 has been found to positively drive the expression of Keap1 (Sykiotis and Bohmann, 2008), and we found that manipulating Keap1 did not change Nrf2 expression (Fig. 5C). In response to this comment, however, we performed additional experiments driving the ectopic expression of Nrf2 (CncC-OE) in Lgl-KD cells, which increased the invasiveness of Lgl-KD cells, similar to Keap1-OE. Since the UAS-CncC line has been shown to upregulate Keap1 expression (Sykiotis and Bohmann, 2008), we concluded that this increase in invasiveness is indirectly due to the increase in Keap1 expression itself.

      Given that the antagonistic relationship between Keap1 and Nrf2 is relevant only to the oxidative-stress response pathway, the genetic epistasis experiments in this study indicate that this relationship is not relevant to the observed phenotype, as KD or OE of either component results in comparable phenotypes. Previous studies showing that Keap1 plays a role in cytoskeletal regulation (which is in agreement with our observation) also add weight to the argument that the observed phenotype is likely an indirect consequence of Keap1-Nrf2 signaling activation.

      Many of the conclusions in early Results paragraphs are purely technical and not biological. For example, "These observations highlight the limitations of marker validation to identify specific cells of the differential Lgl-KD phenotype" and "SCENIC was able to detect the common as well as distinct transcriptomic states of the cells in unique Lgl-KD clusters, while also highlighting the heterogeneity among them". Some of these technical conclusions could be part of brief discussions in the Methods section.

      For those not familiar with various detailed scRNA-seq analysis approaches (e.g., RNA velocity analysis), a brief description in Methods of how they should be interpreted biologically would be helpful. This might help resolve what appear to be contradictory/confusing results. First, the upper branch of cluster 7 (which is a focus of the study) shown in Fig. 3B is at a "late" stage based on Velocity Pseudotime analysis (left panel) and at a "root" or early stage based on Terminal end-points of differential analysis (right panel). The bottom branch of cluster 7 is "late"/"stable end point" in both analyses, which is consistent. Second, given these differences between the upper and lower branches of cluster 7, how is cluster 7 biologically the same cluster? Third, the bottom branch of cluster 7 bleeds into cluster 8, and while Ets21C is uniquely expressed in the bottom branch of 7, important markers of the study, including Jra, kay (AP-1 family members), grnd, cnc (NRF2), Keap1, and the genes shown in Fig. 6F, are all robustly expressed in clusters 7 (bottom branch) and 8. The biologically relevant distinction between the bottom branch of cluster 7 and cluster 8 is not clear. Is cluster 8 important/relevant to the observed phenotypes as well?

      We have now added the following paragraph elaborating the logical choices made within the analytical pipeline in our Methods section:

      In this study, we have highlighted RNA velocity-derived interpretations that strictly agree with the other analytical perspectives pursued in this study. We applied scVelo to obtain information on the underlying lineage for (1) all unique Lgl-KD clusters, and (2) cluster-7 cells. The cells of the unique Lgl-KD clusters represent a mixed population of mitotic, post-mitotic, border-follicle cells and dying germline-cell associating cells that depict inconsistent transcriptional lineages. In this group of cells, the true developmental end-point of the observed Lgl-KD lineage is cluster 8 (germline-cell death occurs at the end of Lgl-KD follicular development), which likely consists of a mixed population of cells from the lateral epithelia as well as the multilayered epithelia, all responding to germline-cell death. Indeed, certain sections of cluster 7 appear more similar to cluster 8 and others seem comparable to that of cluster 13. These observations underscore our conclusions that the unique Lgl-KD clusters exhibit distinguishable gene expression, representing different cell states. For cluster 7, the state of transcriptomic heterogeneity is what defines its unique state of gene expression and we have assessed this heterogeneity by specifically sub-setting those cells.

      For a comprehensive interpretation of the results of the RNA-velocity based analysis, more information can be found in the scVelo tutorial (https://scvelo.readthedocs.io/).

    1. Author Response

      Reviewer #1 (Public Review):

      Gu et al. examine how activity in the substantia nigra pars reticulata (SNr) contributes to proactive inhibition - the suppression of upcoming actions - by recording SNr activity in rats performing a task requiring them to be prepared to cancel a planned movement. This task was developed in a previous study by the same authors in which they examined how globus pallidus pars externa (GPe) activity depends on proactive inhibition (Gu et al., 2020), which motivated the present focus on SNr. The task is rich and the complementary analyses of how the neural activity relates to the behavior, at the level of individual neurons and populations, are appropriate and illuminating. Overall, this study is well done and has the potential to be a nice contribution to our understanding of how the SNr, and therefore the basal ganglia, mediate behavioral inhibition. Addressing a few questions, however, would improve the paper.

      We appreciate both the positive comments and constructive criticism.

      • It is not obvious why the presence or absence of proactive inhibition should be determined on a session-by-session basis. It seems quite possible that proactive inhibition is not an all-or-none phenomenon, and also that it might be exhibited to a greater or lesser extent across a session (e.g., due to changes in motivational drive). It would therefore strengthen the paper to better explain the rationale for comparing neural activity across entire sessions "with" and "without" proactive inhibition. Within-session variation in proactive inhibition could be quite advantageous, allowing for within-neuron comparisons. It is even possible that the differences in neural activity that the authors report here using session-by-session analysis are an underestimate of the true effect of proactive inhibition.

      It is true that some of our analyses compare whole sessions with- and without- overall behavioral evidence for proactive inhibition. But our primary results come from within-session comparisons of Maybe-Stop to No-Stop trials. For this purpose, the session-wide assessment of proactive inhibition is primarily a screen for which sessions to use for within-session analysis.

      It would be desirable if we could use behavior to determine the degree of proactive inhibition on each individual trial, and then compare this to neural measures. Unfortunately, this is not generally feasible in our experiments. Our key evidence for proactive inhibition is the prolongation of reaction times (RTs). However, RTs are famously highly variable over trials. This variability likely reflects a variety of factors, not simply proactive inhibition. For example, in our previous paper (Gu et al. 2020) we showed that dividing trials into slower and faster RTs did not reproduce the same neural differences as comparing Maybe-Stop to No-Stop trials.

      An alternative approach to investigating proactive inhibition is to focus on the increased restraint that typically follows over-hasty responses. We found that when rats fail to Stop, on the next trial the degree of SNr variability increases (Fig. 6). We have now expanded this analysis to include additional types of errors. We find that another form of over-hasty action, premature responses before the Go cue, are also followed on the next trial by increased SNr variability (Fig. 6- supp1). By contrast, other error types (wrong choices; failure to respond quickly enough) do not provoke greater variability. These additional within-session analyses provide convergent evidence for increased variability as an adaptive response to failures evoked by excessive haste.

      • It is difficult to rule out alternative explanations for the observed differences in SNr activity. While the authors acknowledge this point in the 3rd paragraph of the discussion, they only discuss one potential alternative - reward expectation. Another difference between maybe-stop and no-stop trials is the likelihood that a particular target should be selected, which has also been shown to modulate SNr activity (Basso & Wurtz, 2002). As is often the case with complex behavioral tasks, there may be many other differences between trial types that may contribute to differences in neural activity. It would be helpful for the authors to more fully explain how their results relate to contextual modulation of SNr activity, and why the dependence of SNr activity on proactive inhibition may be a novel finding.

      We have expanded the Discussion to include additional alternative explanations.

      • A natural question arising from this study, as with most studies of neural recordings during behavior, is the causal nature of the neural activity. It would be non-trivial and beyond the scope of the current study to perform the sort of perturbations that could determine whether population variability causally relates to preparation to suppress actions. But it would be useful to discuss future experiments that might be able to test causality.

      We added in Discussion the possibility of using optogenetic manipulations of specific inputs to SNr, to help determine their distinct contributions to SNr firing patterns and proactive behavior.

      Reviewer #2 (Public Review):

      The authors have recorded the activity of neurons in the rat substantia nigra pars reticulata (SNr) while animals performed a version of a stop-signal task. The goal of this study is to investigate and describe the contribution of SNr to proactive inhibitory control. By examining single-cell responses as well as population activity, the authors show that increasing the probability of stop-signal trials induces several changes in SNr responses. First, specific populations of SNr neurons increase their activity during proactive, direction-specific inhibition. At the population level, neurons are biased away from the side of the movement that may have to be inhibited. Second, during proactive inhibition, neuron activity is more variable, at both the single-cell and population levels. Finally, the authors show that animals' outcome history influences both firing rates and the variability of neuron responses in the current trial. In particular, neural variability is increased following a failure to inhibit a movement.

      Strengths

      The manuscript provides an interesting and timely insight into the role of the basal ganglia output nucleus in movement initiation control. The paper is often clearly and concisely written (although see one issue related to this below). One of the main strengths of the work is to allow an interesting comparison with recent work by the same team, aimed at investigating the responses of another basal ganglia nucleus (GPe) in the same task, using similar analyses (this comparison is not extensively exploited in the discussion section though). Another potential strength is the use of different analysis scales. The authors investigated single-unit responses as well as population "trajectories" in the neural state space. This is an interesting option that could have been better motivated, given that the two approaches assume quite different brain operations.

      Thank you for the interest and careful comments.

      Weaknesses

      The analyses and results sometimes lack clarity and details. For instance, and unless I missed the information, it is not clearly stated whether "maybe-stop" trial analyses only include Go trials or if (failed) Stop trials are also considered. Moreover, quite complicated figures are often described very briefly in the main text. Methods are also often too succinctly described, and sometimes refer to a previous publication (Gu et al., 2020) that readers did not necessarily read.

      We have made a range of changes to make the analyses and their rationale more clear. This includes specifying that Maybe-Stop trials include both Go and Stop trials (and why). We have also added more details in both main text and Methods.

      There are some points that the authors might need to discuss more. In particular, a global picture of the role of the different basal ganglia nuclei during movement control would have been appreciated. Also, the authors monitored the activity of the rat basal ganglia output. We would have appreciated more information regarding the impact of this output activity on SNr target areas, as compared to their previous work that focused on GPe, for instance. Another example concerns the observation that SNr activity is elevated during active inhibition regardless of the firing rate pattern before movement (increase or decrease). As noted by the authors themselves, this is inconsistent with the classical role assigned to the basal ganglia output nucleus (i.e., a decrease in activity promotes movement). Although this observation is of potential interest to readers working on the basal ganglia, it is not discussed.

      The revised Discussion includes a section on how altered basal ganglia output may affect targets to alter behavior.

    1. Author Response

      Reviewer #3 (Public Review):

      In the submitted manuscript, Eliazer et al. conclude that Dll4 and Mib present on myofibers maintain a continuum of SC fates, providing SCs capable of regenerating muscle and repopulating the SC niche. The data provide new insights into the maintenance of SCs, demonstrating that niche-derived factors are responsible for regulating SC behavior. Loss of either Dll4 or Mib from the myofiber reduces SC numbers and impairs muscle regeneration. Overall, the data provide compelling evidence that niche-derived Dll4 and Mib regulate SC fate; however, whether the interaction maintains a continuum of SC fates, as concluded by the authors, is insufficiently supported by the data provided.

      We thank the reviewer for their comments.

      One significant issue with the manuscript is the "discovery" of an SC continuum related to the relative levels of Pax7 expression. A similar continuum was established nearly a decade ago by Zammit et al., 2004 and Olguin et al., 2004 and thus is not new. The authors need to reference this work and discuss the previously published data with regard to the observations in the current manuscript. The data establishing a continuum of SCs and its relationship to Pax7 protein levels can largely be eliminated and replaced by references to these two manuscripts. For example, these manuscripts establish that elevated Pax7 levels drive quiescence and that low Pax7 levels correlate with differentiation. The data from these manuscripts establish that SCs with modest Pax7 protein levels can acquire quiescence, accompanied by increases in Pax7 protein.

      The omission of these two seminal papers was a massive oversight on our behalf. They have now been included. In the original manuscript we acknowledged that SCs exist on a continuum, a gradual transition from one state to another, based on scRNA-seq studies and the present data (Dll4, Pax7 and Ddx6 expression). The references for the sequencing data were included. But with all due respect to the reviewer, the Zammit and Olguin papers binned Pax7 into discrete classes once satellite cells had activated. This is not a demonstration of a continuum. Moreover, we do not make any statements about Pax7 levels in activated conditions. Therefore, the reviewer is drawing comparisons between two different contexts. The statements we have made as they pertain to a continuum under homeostatic conditions are consistent with publications to date.

      The data relating the level of Pax7 expression to Dll4 and Mib are intriguing, but the authors do not establish a direct relationship demonstrating that Dll4 or Mib regulate Pax7 levels. An alternative explanation is that Dll4 and Mib inhibit differentiation and thus promote SC quiescence indirectly. This is a critical distinction, as the authors could be correct and Dll4 via Mib regulate SC fate.

      We don’t make the claim that Dll4/Mib1 regulates Pax7 directly. We would side with the majority of publications showing that Notch signaling directly regulates Pax7. We have now added further experiments to examine whether Dll4 regulates Notch signaling. We crossed a transgenic mouse line harboring a Notch reporter with MF-Dll4 mice to analyze Notch signaling in SCs. The first experiment we performed with this reporter was to correlate the levels of Pax7 and Notch signaling on a cell-by-cell basis. In control mice, we found a linear positive relationship between levels of Pax7 and the Notch reporter. Next, we compared Notch reporter levels in control versus Dll4-null. We observed that Notch reporter levels decreased to below detectable levels in Dll4 null muscle. Therefore, Dll4 acts non-autonomously to regulate Notch signaling in SCs during homeostasis (refer to Reviewer 1 comment 1, and Essential revisions #3).

      The reviewer raises an important point: Does Notch regulate quiescence directly or a differentiation/commitment program when SCs are in a quiescent state. We never claimed that Dll4/Mib1 regulates quiescence. The only way to conclude anything about quiescence would be to examine expression of proliferative markers in vivo. Rather, throughout the manuscript we referred to Dll4 regulating the state of the quiescent SC pool, as measured by changes in Pax7 and Ddx6 expression. In the discussion section we had discussed that Notch signaling may regulate differentiation/commitment of cells in a quiescent state.

      It is unclear that the loss of Dll4 or Mib1 reduce diversity of SCs. If these repress differentiation then their loss would be expected to enhance differentiation and reduce SC numbers, which is what the data demonstrate.

      Diversity can be restated as the variability across a population. We demonstrate that the variance of Pax7 and Ddx6 expression decreases after Dll4 deletion. It is important to note that we are analyzing the SCs that are not lost through differentiation. The fact that some of the SCs are lost through differentiation is not inconsistent with a shift in the continuum. We expect SCs to be lost through differentiation as they shift along the continuum towards a Dll4/Notch/Pax7-low state.

      We observe a reduced number of Dll4/Pax7-high cells, which is consistent with a shift in the continuum. The counterpoint would be that Dll4/Notch/Pax7-high cells commit to differentiation. There is no evidence for that conclusion in this work or in any other work published to date. We discuss this issue in the Results section.

      We have also performed an experiment in which mice were treated with a lower dose of TMX to reduce, rather than delete, Dll4. We find that the total number of SCs does not change, whereas the relative number of Dll4/Pax7-high cells is reduced and the relative numbers of mid and low cells are increased (Figure 4). This is consistent with a shift in a continuum of states.

      Finally, the injury data provided are for 4 d post injury and thus may represent a delay in regeneration as opposed to a failure to regenerate. At 30 d post injury, regeneration is typically considered complete. How do wild-type, Dll4-null and Mib-null muscle compare at 30 d post injury?

      We analyzed muscle regeneration of MF-Dll4fl/fl tissue 40 days after injury. The mean CSA of the muscle fibers is significantly smaller than that of control fibers, suggesting a defect in tissue regeneration. This is now included in Figure 5-figure supplement 2. Due to time constraints, we have not performed the same experiment with Mib1 mutants.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The data presented here are, on the whole, descriptive. Whilst the descriptive elements are strong and important, more analysis and quantification are required to support the conclusions made in the paper. For example, in contrast to their analysis of the rail-MIP, their assertion that the ciliary vane orientation is linked to the CPC orientation is not backed up by quantification. In addition, this paper does not discuss the proteins within the MIP densities and the central pair complex in detail, to the extent they can be discussed using the recent structures from Chlamydomonas.

      We thank the reviewer for pointing out these areas for improvement, which we have addressed. We are grateful for their helpful suggestions, which we have incorporated to the best of our ability to improve the quality of the manuscript.

    1. Author Response

      Reviewer #2 (Public Review):

      Wang et al. elegantly exploit single-cell RNA-seq datasets to question the putative involvement of lncRNAs in human germ cell development. In the first part of the study, the authors use computational approaches to identify and characterize, from existing data, lncRNAs expressed in the germline. Of note, the scRNA-seq data used were generated from polyA+ RNAs, and thus non-polyadenylated lncRNAs could not be retrieved. Most of the lncRNAs identified in the germ cells and in the somatic cells of the gonads were previously unannotated. While this increases the catalog of lncRNA genes in the human genome, further characterization is needed to determine which fraction of these newly identified lncRNAs represent bona fide transcripts or transcriptional noise.

      Differential expression analysis between developmental stages, sexes, or cell types led to several observations: (i) whatever the stage of development, the number of expressed lncRNAs is higher in fetal germ cells than in gonadal somatic cells; (ii) there is a continuous increase in the number of expressed lncRNAs during the development of the germline; of note, a similar, although more subtle, trend is observed for protein-coding genes; (iii) the developmental stage at which the highest number of lncRNAs is expressed differs between male and female germ cells. While convincing, the significance of these observations is difficult to assess. However, the authors remain prudent in their conclusions and do not over-interpret their findings.

      We appreciate Reviewer #2's precise summary of our analysis and for highlighting the significance of these datasets for other researchers and future studies.

      Interestingly, integrating lncRNA expression to classify cell types led to the identification of a novel population of cells in the female germline that had not been revealed by classification based on protein-coding genes only. The biological relevance of this population, which clusters with mitotic populations, remains to be demonstrated. Finally, by examining lncRNA biotype, the authors demonstrated an enrichment, in the germ cells, of the antisense head-to-head organization (in relation to the nearby protein-coding gene) compared to other biotypes. Whether this differs from the general distribution of lncRNAs should be discussed.

      We analyzed the lncRNAs in the NONCODEv5 database (human genome), and the results showed that the XH type accounted for 21.73% of the intragenic lncRNA-mRNA pairs in the NONCODEv5 database (human genome), which is lower than the 26.58% in fGC and 26.23% in mGC (Response Figure 1).

      Response Figure 1. Genomic distribution and biotypes of the lncRNAs in NONCODEv5 database and lncRNAs expressed in human gonad.

      In the second part of the manuscript, Wang et al. focus on one pair of divergent lncRNA-protein-coding genes (LNC1845-LHX8). To justify the choice of this particular pair, it would be informative to indicate its correlation score in Figure 3C. The existence of this transcript was validated using female fetal ovaries, and its function was addressed in late primordial germ cell-like cells (PGCLCs) derived from human embryonic stem cells (hESCs). The authors used an admirable set of orthogonal approaches that led them to conclude that LNC1845 regulates the nearby gene LHX8 in cis. They further identified the underlying mechanism, which involves modification of the chromatin landscape through direct interaction of LNC1845 with a histone modifier. Among the different strategies used (KO, stop transcription, overexpression), shRNA-mediated knock-down is the only one that specifically addresses the function of the transcript itself, as opposed to active transcription. The result of this experiment led the authors to conclude that the LNC1845 RNA is functional, a conclusion reinforced by the demonstration of a physical interaction between the LNC1845 RNA and WDR5, a component of MLL methyltransferase complexes. The result of the KD experiment is, however, puzzling, as RNAi has been shown not to be the method of choice for targeting nuclear lncRNAs (Lennox et al. NAR 2016).

      We thank Reviewer #2 for the suggestion to add the correlation score of the LNC1845-LHX8 pair; the Pearson correlation of this pair is 0.3268. We have added this value to Figure 4C, where the expression correlation of LNC1845 and LHX8 is first mentioned. Regarding RNAi, many comparable studies have used shRNA knockdown to target nuclear lncRNAs (Guttman et al. Nature 2011; Luo et al. Cell Stem Cell 2016; Subhash et al. Nucleic Acids Res. 2018; Li et al. Genome Res 2021), and the knockdown efficiency we obtained was acceptable for this purpose. The knockdown results are consistent with those of the deletion mutation and stop-transcription approaches; all three show that LNC1845 transcriptional expression is required for proper LHX8 expression in late PGCLCs.
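      As an illustration of the kind of score reported for the LNC1845-LHX8 pair, a Pearson coefficient between two genes is computed across cells from their paired expression values. The minimal sketch below assumes per-cell expression vectors are available; the function name and the values are hypothetical and are not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Covariance numerator and the two root sums of squares
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant vector has no defined correlation")
    return cov / (sx * sy)

# Hypothetical per-cell expression values (not data from the study)
lnc = [0.0, 1.2, 0.5, 2.0, 0.0, 1.8]
lhx8 = [0.1, 0.9, 0.0, 2.5, 0.3, 1.1]
print(round(pearson(lnc, lhx8), 3))
```

      In single-cell data, zero-inflation can weaken such coefficients, which is one reason moderate values are common for gene pairs of this kind.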

      Overall, the functional investigation is convincing and strengthened by the inclusion of multiple clones for each approach, and by the convergence in the outcome of each individual approach. The depth of characterization is also remarkable. The analyses of the underlying mechanisms are somewhat less solid, as there is less evidence demonstrating the involvement of the LNC1845 RNA and its interaction with WDR5.

      We have added more experimental evidence to strengthen the model, especially regarding the interaction between LNC1845 and WDR5. In addition to the RIP-qPCR results showing enrichment of LNC1845 upon WDR5 pulldown (Figure S8D), we performed a chromatin isolation by RNA purification (ChIRP) assay using antisense oligos tiling the entire LNC1845 transcript. ChIRP results confirmed that WDR5 protein was enriched when anti-LNC1845 oligo probes were used to isolate the complex, but not in controls without probes or without overexpression of the LNC1845 transcript (Response Figure 2). Taken together, the findings of both approaches support the model that LNC1845 directly interacts with WDR5 to modulate H3K4me3 modification for LHX8 transcriptional activation (related to Supplementary Figures 8D and 8E).

      Response Figure 2. LNC1845 binding to WDR5 was verified by ChIRP-western blot.

      Altogether, this study provides a convincing demonstration of the role of a lncRNA on the regulation of a nearby gene in the context of the germline. However, to have a better understanding of the functionality of lncRNA genes in general, it would be interesting to know whether other pairs of lncRNA-PC genes have been functionally investigated in this context, where no function for the lncRNA gene could be demonstrated. Negative results are highly informative and if so, these could be included in the manuscript.

      We appreciate Reviewer #2's suggestion to add results for other lncRNA-PC gene pairs. In fact, we have analyzed and presented the results for two other pairs in Figure 7D. The lncRNAs LNC3346 and LNC15266 were also transcriptionally regulated by FOXP3, and they may regulate their neighboring genes TMCO1 and MPP5, as shown in Figure 7D. Our analysis showed that other lncRNA-PC gene pairs may be subject to transcriptional regulation similar to that of LNC1845-LHX8 during germ cell development.

    1. Author Response

      Reviewer #1 (Public Review):

      Einarsson et al. have produced CAGE data from EBV-immortalised lymphoblastoid cells from more than a hundred individuals from two genetically diverse African populations (YRI and LWK), and used them to study how sequence variation affects promoter activity, both at the level of expression variability and at the level of transcription start site usage within promoters across individuals.

      The dataset is very exciting, and the analyses were performed carefully and described well. The results show that promoters in the genome vary a lot with respect to their expression variability across individuals and that their level of variability is closely associated with their biological function and their sequence and architectural features. These results are often confirmatory - it is well established that promoters have different architectures associated with different sequence elements, different types of gene regulation and even differences across individual cells. In general, the multifarious observations boil down to one key distinction:

      • Regulated genes have promoters that look and act differently from those of housekeeping genes.

      We are pleased that the reviewer is as excited as we are about the unique dataset, the rigorous analyses performed, and the biological results. While we agree that housekeeping and regulated genes show apparent differences in terms of promoter variability, our analyses were not informed or guided by the expression of promoters/genes across cell types or tissues, but rather by their variability within the same cell type across individuals. It is indeed interesting that the same underlying mechanisms that cause stable expression across cell types also attenuate variability across individuals. Similarly, promoters that display cell-type restricted (regulated) expression also tend to be more variable within the same cell type across individuals. While one may argue that these relationships are unsurprising, they have, to the best of our knowledge, not been demonstrated before. Of note, while most low variable promoters regulate housekeeping genes and highly variable ones regulate regulatory genes, this is not always the case.

      While this is unsurprising, the authors then proceed to analyse other underlying differences between low variability (mostly housekeeping) and high variability (overwhelmingly regulated) promoters. Several observations have alternative and sometimes more elegant explanations if some of the previously worked out properties of housekeeping vs regulated promoters are taken into consideration:

      • The authors are keen to interpret the architectural features of ubiquitously expressed (housekeeping) promoters as selected for robustness against mutations in ensuring stable and steady expression levels.

      However, there are some known facts about both housekeeping and regulated promoters that make alternative interpretations plausible.

      • When discussing broad promoters, the authors disregard the well-known fact that the most commonly used transcription start positions are those with a YR dinucleotide at positions (-1,+1). Any mutation within the span of a broad promoter cluster that removes an existing YR or introduces a new one has the capacity to change both the TSS distribution pattern and the overall expression level of that promoter, but only slightly. In this way, broad promoters can be viewed as an adaptation not for robustness but for the ability to absorb many mutations with small effect sizes, allowing any positive selection to proceed smoothly across a changing fitness landscape.

      We thank the reviewer for these remarks. We fully agree with the scenario described by the reviewer, that disruptions of TSSs may have different consequences depending on whether they occur in a broad promoter with multiple YR sequences or in a sharp promoter. However, we argue that the observation that promoters containing such flexible TSSs are not much affected by genetic perturbations reveals robustness. By definition, robustness is the ability to produce a persistent phenotype (in our case the molecular phenotype of promoter expression) even under perturbed conditions (e.g. under the influence of natural genetic variation affecting TSS usage). The very fact that TSS disruptions have only small effect sizes in certain promoters but not in others tells us that the unaffected or only mildly affected promoters have architectural properties that minimize the effect sizes of these disruptions and thereby confer robustness in overall promoter expression. Hence, we do not see our explanations and those of the reviewer as contradicting each other.
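      The YR argument in this exchange can be made concrete with a short sketch that lists candidate initiation sites, i.e. pyrimidine-purine (YR) dinucleotides at positions (-1,+1), in a sequence. The sequence and function name below are hypothetical, and real TSS selection depends on more than the presence of a YR dinucleotide; the sketch only shows that a single substitution in a YR-rich stretch removes one candidate site out of several.

```python
# IUPAC codes: Y = pyrimidine (C, T), R = purine (A, G)
PYRIMIDINES = frozenset("CT")
PURINES = frozenset("AG")

def yr_sites(seq):
    """Return 0-based indices i such that seq[i] is Y and seq[i + 1] is R.

    Index i corresponds to the -1 position and i + 1 to the candidate
    +1 (TSS) position of a YR initiator dinucleotide.
    """
    s = seq.upper()
    return [i for i in range(len(s) - 1)
            if s[i] in PYRIMIDINES and s[i + 1] in PURINES]

# Hypothetical broad-promoter stretch and a single C->G substitution at index 2
reference = "TGCACGTACG"
variant = reference[:2] + "G" + reference[3:]
print(yr_sites(reference))  # [0, 2, 4, 6, 8]
print(yr_sites(variant))    # [0, 4, 6, 8]
```

      In a sharp promoter with a single dominant YR, the same substitution would remove the only candidate site, which is the contrast both the reviewer and the authors draw.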

      • Indeed, the main property of low variability promoters is that there isn't a single nucleotide change (either substitution or indel) that can substantially change their activity. (In that they are clearly different from e.g. TATA-dependent promoters, where one change can abolish TBP binding or deprive the promoter of a YR dinucleotide at a suitable distance from the TATA box.) This is achieved by their dependence on broad and weak sequence signatures such as GC composition and nucleosome positioning signal. However, most such genes are not known to have a strict requirement for dosage control. On the contrary, dosage seems to be much more critical for the functional classes that in the authors' analysis show variable expression.

      • Whether it is a removal of YR dinucleotide, introduction of a new one, or the change of nucleosome positioning, it seems that the transcription level from housekeeping, low variability promoters is unaffected, or at least affected mildly enough that it is not within the statistical power of the CAGE data across different individuals to detect the difference. Rather than robustness, it can be interpreted as competition - the architecture recruits preinitiation complex at a fairly constant rate, and it is the different YR positions that "compete" for serving as transcription initiation position, with the CAGE signal reflecting the relative effectiveness of each position in that competition. If one of the YR dinucleotides is removed, often the other, neighbouring ones will be used instead. The same might happen for potential multiple nucleosome positioning signals - if one becomes less efficient at stopping a nucleosome, another will be used more often.

      • The fact that decomposed parts of housekeeping promoters add up to approximately the same expression level across individuals, even when they are uncorrelated, suggests that they might actually be anticorrelated; indeed, the UFSP2 plot in Figure 4E looks as if the two decomposed promoters are anticorrelated. That would argue against the independence of the decomposed promoters; it may instead point to "competition", where a decrease in the use of one simply shifts most initiation events to the other.

      We thank the reviewer for these thoughts. The reviewer has made an excellent observation regarding the correlation between decomposed promoters within low variable promoters. While decomposed promoter pairs of highly variable promoters frequently have correlated expression levels, low variable multi-modal promoters often contain decomposed promoters that have low or even negative expression correlation across individuals. We agree that negative correlation points to the possibility that these decomposed promoters compete for the transcriptional machinery. Indeed, nucleosome positioning analysis (described below) suggests the existence of diverse chromatin accessibility configurations within low variable multi-modal promoters with low or negatively correlated decomposed promoters, which may reflect competition between their decomposed promoters. We have revised the manuscript to better reflect this aspect, discussed the potential for YR shifts encoded within the promoter sequence, and toned down the claimed independence of decomposed promoters. However, regardless of whether decomposed promoters are independent (low correlation) or compete for the transcriptional machinery (negative correlation), we do not agree that this contradicts our conclusion of robustness. A competition between decomposed promoters within a low variable multi-modal promoter would favor the strongest decomposed promoter; if that promoter is affected by a genetic perturbation (for instance through disruption of YRs or proximal TF binding), the competition will shift dominant usage to another decomposed promoter, as suggested by the frQTL analysis, leading to minimal change in total promoter expression, i.e. a robust molecular phenotype.
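      The additivity point in this exchange can be stated with the standard formula for the variance of a sum. Writing $E_1$ and $E_2$ for the expression of two decomposed promoters across individuals, with standard deviations $\sigma_1$ and $\sigma_2$ and correlation $\rho$ (notation introduced here for illustration only), the variance of total promoter expression $T = E_1 + E_2$ is:

$$
\operatorname{Var}(T) = \sigma_1^2 + \sigma_2^2 + 2\rho\,\sigma_1\sigma_2
$$

      Independent decomposed promoters ($\rho = 0$) give $\operatorname{Var}(T) = \sigma_1^2 + \sigma_2^2$, fully correlated ones ($\rho = 1$) give $(\sigma_1 + \sigma_2)^2$, and perfectly anticorrelated ones with equal spread ($\rho = -1$, $\sigma_1 = \sigma_2$) give $\operatorname{Var}(T) = 0$. Negative correlation, as expected under competition, therefore pushes total variability below the independent case, which is why competition between decomposed promoters and low variability of the whole promoter are compatible.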

      • In general, not everything is a result of direct evolutionary selection, and only features under direct selection should bear clear signatures of purifying selection. On the contrary, promoters, especially housekeeping promoters, have vastly different nucleotide and dinucleotide compositions across Metazoa, at both large and relatively short distances. This means they can undergo concerted evolution as a group, and hence should be "robust" to mutations in a way that allows them to change much more, and more rapidly, than some other promoter architectures, especially TATA-dependent architectures, whose key elements and the spacing between them have not substantially changed for more than a billion years, and possibly longer.

      We fully agree with the reviewer and have revised the manuscript to remove the evolutionary aspect of robustness. We believe our results are better interpreted in terms of inherent mechanisms in low variable multi-modal promoters that provide regulatory robustness. Indeed, the vastly different sequence composition of housekeeping promoters between species makes these properties even more interesting. We do not believe that robustness to perturbations needs to be encoded by a specific sequence signature. Rather, we observe that multimodal promoters with low variability require broad initiation regions and flexibility in TSS usage. This fits well with observations in flies (Schor et al, 2017, DOI: 10.1038/ng.3791) of shifts in promoter shape upon genetic perturbation, which we believe reflect shifts in decomposed promoter usage.

      • While housekeeping promoters are broad but mostly not among the broadest, regulated promoters can be either broad or narrow. This is also known - while narrow promoters are overrepresented for tissue-specific and non-CGI promoters, promoters of Polycomb-bound developmental genes are often broad and have large CpG islands; the latter may account for some of the broadest CAGE clusters observed in the data. It would be an interesting finding if both TATA-dependent and developmental promoters were found to be variable across individuals in a non-trivial way (the trivial way being the variability due to larger dynamic range of their expression - e.g. the expression of SIX3 in many cell types is basically zero, while the dynamic range of RPL26L1 is very limited) - this should be checked by analysing them separately.

      We agree that an analysis of the variability of developmental, Polycomb-bound promoters would be very interesting and thank the reviewer for ideas for a follow-up study. We do not feel that LCLs are the best model system for analyzing developmental promoters and therefore argue that this is out of scope in this study.

      • While broad promoters can be decomposed into subclusters with differential expression across individuals, the authors do not seem to allow for the decomposition of intertwined TSS positions within the cluster, but rather postulate hard boundaries between subclusters. This is different from e.g. overlapping maternal and zygotic promoter use (Haberle et al Nature 2014), where the distribution of the used TSS positions is different but the clusters can overlap.

      This is correct: we do not allow for overlapping decomposed promoters. We agree that the work by Haberle et al (2014, DOI: 10.1038/nature12974) on switches between maternal and zygotic TSSs is an excellent demonstration of how intertwined promoters can occur and be of biological relevance. Our analysis is based on the observation that low variable promoters are often multimodal and cannot be well explained by promoter width alone. This led us to decompose multimodal promoters into their sub-peak constituents. We believe that the frQTL analysis and the new decomposed promoter QTL (dprQTL) analysis clearly demonstrate the value of our approach. While it would indeed be interesting to see the results of an alternative decomposition approach, we feel this is out of scope for this study, but acknowledge that additional determinants of promoter variability may be discovered using alternative strategies.
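      To make the hard-boundary decomposition discussed here concrete, the sketch below splits a one-dimensional TSS count profile into non-overlapping segments, one per local maximum, cutting at the lowest position between consecutive peaks. This is a hypothetical illustration, not the authors' decomposition procedure; the function name and the `min_peak` threshold are assumptions. By construction it cannot represent intertwined TSS distributions of the kind described by Haberle et al. (2014).

```python
def decompose(counts, min_peak=1):
    """Split a TSS count profile into non-overlapping (start, end) segments.

    A sub-peak is a local maximum (strictly above its left neighbour and at
    least its right neighbour) with height >= min_peak. Consecutive sub-peaks
    are separated at the lowest position between them; that trough position
    starts the right-hand segment. Ends are inclusive.
    """
    n = len(counts)
    peaks = [
        i for i in range(n)
        if counts[i] >= min_peak
        and (i == 0 or counts[i] > counts[i - 1])
        and (i == n - 1 or counts[i] >= counts[i + 1])
    ]
    if not peaks:
        return []
    starts = [0]
    for left, right in zip(peaks, peaks[1:]):
        # Two such peaks are never adjacent, so this range is non-empty.
        trough = min(range(left + 1, right), key=lambda j: counts[j])
        starts.append(trough)
    ends = [s - 1 for s in starts[1:]] + [n - 1]
    return list(zip(starts, ends))

# Hypothetical bimodal CAGE profile (counts per position)
profile = [0, 2, 5, 2, 0, 1, 4, 1, 0]
print(decompose(profile))  # [(0, 3), (4, 8)]
```

      Assigning every position to exactly one segment is what makes the boundaries "hard"; a mixture-model approach would instead let two distributions share positions.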

      • Both Dreos et al (PLOS Comp Biol 2016) and Haberle et al. (2014) show that one stable element of a broad promoter is the positioning signal of its first downstream nucleosome. As seen very convincingly in both Drosophila and zebrafish, the dominant TSS position of the broad promoter is highly predictive of the position of first downstream nucleosome and its underlying positioning sequence, and the most plausible interpretation is that there is an "optimal" distance from nucleosome for transcriptional initiation, resulting in the dominant (i.e. most often used) TSS position. In mammals, broad promoters are even broader than in those two species and might have multiple nucleosome positioning signals they can use. In such cases, mutations in one of the nucleosome positioning signals, or indels changing the spacing between the nucleosome and the part of sequence that contains TSS, might lead to differential use of one nucleosome signal vs other. This would be compatible with the authors' observations in low variability promoters that decomposed promoters are used to different extents in different individuals.

      We thank the reviewer for this excellent suggestion. In the revised manuscript, we have analyzed both the preferred distance between the dominant TSS and the downstream (+1) nucleosome and the positional fuzziness of that nucleosome. We observe a clear separation between low variable multimodal promoters with highly correlated decomposed promoters and those with lowly correlated decomposed promoters. Interestingly, those with lowly correlated decomposed promoters show much less restrictive +1 nucleosome positioning with higher fuzziness, in contrast to the fixed +1 nucleosome positioning reported for broad CGI promoters. While this may be unexpected, it fits well with a model in which a flexible nucleosome positioning architecture allows differential usage of decomposed promoters. Our results suggest that an array of underlying nucleosome positioning configurations exists for these promoters across single cells, which causes fuzzy nucleosome positioning and may allow for competition between initiation sites that provides robustness through their compensatory usage. Notably, these results are consistent when analyzing the relationship between transcription initiation and nucleosome positioning within a single individual. This suggests an inherent flexibility in TSS usage in these robust promoters even in the absence of differential effects of genetic variants. However, the extent to which TSS preference is affected by nucleosome positioning, or whether nucleosome positioning reflects TSS usage, remains unclear. We believe these results further strengthen our general conclusions and thank the reviewer for this constructive suggestion of a new analysis.
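      The two quantities analyzed here can be sketched under simple, assumed definitions: the dominant TSS as the position with the highest CAGE count, the +1 nucleosome position as the median of dyad calls downstream of that TSS, and fuzziness as the standard deviation of those calls. These definitions, the function names, and the input format are hypothetical and are not taken from the revised manuscript.

```python
import statistics

def dominant_tss(positions, counts):
    """Return the position with the highest CAGE count (first one on ties)."""
    if len(positions) != len(counts) or not positions:
        raise ValueError("positions and counts must be non-empty and equal length")
    return max(zip(positions, counts), key=lambda pc: pc[1])[0]

def plus1_nucleosome(tss, dyads):
    """Return (distance, fuzziness) of the +1 nucleosome relative to `tss`.

    Assumes plus-strand coordinates and that `dyads` are dyad-centre calls
    (e.g. one per fragment or per cell) near the +1 nucleosome; calls at or
    upstream of the TSS are discarded. Distance is the median downstream
    dyad minus the TSS; fuzziness is the sample standard deviation.
    """
    downstream = [d for d in dyads if d > tss]
    if len(downstream) < 2:
        raise ValueError("need at least two downstream dyad calls")
    return statistics.median(downstream) - tss, statistics.stdev(downstream)

# Hypothetical plus-strand promoter coordinates
tss = dominant_tss([100, 101, 102, 103], [5, 9, 3, 9])
print(tss)                                         # 101
print(plus1_nucleosome(tss, [40, 221, 223, 225]))  # (122, 2.0)
```

      Under these definitions, a promoter whose +1 nucleosome occupies several alternative positions across cells would show a larger standard deviation, i.e. higher fuzziness.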

      • If we were to look for sources of difference other than the actual sequence architecture, some differences between regulated and unregulated promoters can be explained by one key distinction: the regulation of regulated genes comes from outside the core promoter, whereas the regulation of housekeeping genes is largely dependent on the intrinsic activity of the core promoter itself. Thus, for example, in the absence of a causative variant in the promoter itself, the observed variability in the SIX3 promoter might not be encoded in the promoter; instead, enhancer responsiveness might be encoded in the promoter, and the variability itself could be due to an enhancer that can be hundreds of kilobases away. Such a scenario, combined with a broad promoter, would likely result in decomposed promoters that are highly correlated across individuals, because they are both externally controlled by the same regulatory inputs.

      These thoughts are very much in line with our own ideas on how enhancers may influence expression variation. Here, we aimed to investigate variability from a promoter perspective and we are confident that we observe several promoter features associated with low variability. Describing these, we agree that it is important to speculate also on the added contributions by distal elements. We now acknowledge the likely added contribution by enhancers in the Discussion:

      “The promoter sequence may also encode a promoter’s intrinsic enhancer responsiveness (Arnold et al., 2017), which may influence its expression variability. Although current data cannot distinguish between direct or secondary effects, an increased variability mediated via enhancers is supported by a higher dependency on enhancer-promoter interactions for cell-type specific genes compared to housekeeping genes (Furlong and Levine, 2018; Schoenfelder and Fraser, 2019). However, compatibility differences between human promoter classes and enhancers only result in subtle effects in vitro (Bergman et al., 2022), suggesting that measurable promoter variability is likely a result of both intrinsic promoter variability and additive or synergistic contributions from enhancers. Directly modeling the influence and context-dependency of enhancers on promoter variability would therefore be important to further characterize regulatory features that may amplify gene expression variability.”

      Reviewer #2 (Public Review):

      This manuscript by Einarsson and colleagues in the Andersson lab examined how genetic variability across a population impacts both gene expression and promoter architecture in a human population. The authors generate new CAGE data in 108 lymphoblastoid cell lines (LCLs). The authors' analysis is focused on defining how DNA sequence and promoter architecture correlate with population-variation in expression across this cohort. In general, there is a lot that I like about this manuscript: The dataset will be an extremely valuable resource for the genomics community. Furthermore, the biological findings are often thoughtful and potentially interesting and significant for the community. The analysis is generally very strong and is clearly conducted by a lab that has a lot of expertise in this area. My main concerns are centered around the often unwarranted implication that DNA sequence or promoter features cause differences in variation at different genes.

      We are pleased that the reviewer is as excited as we are about the unique dataset, the rigorous analyses performed, and the biological results. In our revised manuscript we have followed the recommendations by the reviewer and:

      ● Toned down implied causal relationships and added additional interpretations to our results, including YR positional preferences

      ● Performed additional analyses on nucleosome positioning of low variable promoters, as well as genetic association testing for decomposed promoter expression

      In all, we believe these revisions substantially improved our manuscript and even strengthened our previous conclusions.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript "Centrally expressed Cav3.2 T-type calcium channel is critical for the initiation and maintenance of neuropathic pain" identifies a subset of parvalbumin-expressing GABAergic neurons in the anterior pretectum (APT) that co-express Cav3.2 T-type calcium channels. The firing frequency and burst patterns of these neurons are potentiated following spared nerve injury (SNI) and the development of neuropathic pain. Deletion of the channels in these cells reduced both the development and maintenance of mechanical and cold allodynia. The studies show clear co-expression of PV and GFP in the Cav3.2-eGFP-flox KI mouse line.

      Multi-unit recordings from PV-Cre X Ai32 mice show that PV neurons in the APT are fast-spiking and that the mean firing rate and the frequency of spikes in bursts are potentiated in SNI animals. The graphs in Fig. 2, panel F show compiled data from 18-20 cells from 6-8 animals, depending on the treatment. The statistical design for the in vivo experiments (and indeed for all of the studies) is not clearly stated, including degrees of freedom. It is important to know whether recordings from a single animal are considered independent observations and, if so, what the rationale is. This information should be included in the Quantification and Statistical Analysis section. In addition, it would be interesting to determine whether T-type calcium channel blockers can reverse this behavior in these recordings.

      We considered each unit as an independent observation. We did so since the number of recorded PV+-units per animal (identified with the PINP method) was small and varied greatly between animals, from 2 to 6 units. We are not aware of statistical methods using a nested design for multiple cells in animals that could be used in such condition.

      Since the measured variables did not follow normal distributions, we performed unpaired comparisons with the Wilcoxon rank-sum test. This is now stated in the section ‘quantification and statistical analysis’ (page 19, line 683). P-values are now included in the figures and the Results section where appropriate.
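      For reference, an unpaired rank-sum test ranks the pooled observations from both groups and compares the rank sum of one group with its expectation under the null hypothesis. A minimal sketch with a normal approximation follows; it assigns average ranks to ties but omits the tie correction to the variance, and the function name and data are hypothetical. In practice, a library routine such as `scipy.stats.ranksums`, or the equivalent Mann-Whitney U test in `scipy.stats.mannwhitneyu`, would normally be used. Each recorded unit enters as one observation, mirroring the independence assumption described in the response above.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using a normal approximation.

    Returns (W, z, p), where W is the sum of the ranks of `x` in the pooled
    sample. Ties receive average ranks; the tie correction to the variance
    is omitted for brevity.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n = n1 + n2
    mean_w = n1 * (n + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return w, z, p

# Hypothetical per-unit firing rates (Hz), one value per recorded unit
sham = [12.1, 15.3, 9.8, 14.0, 11.2]
sni = [18.4, 21.0, 16.7, 19.9, 22.3]
print(rank_sum_test(sham, sni))
```

      Because only ranks enter the statistic, the test makes no normality assumption, which is the property the response relies on.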

      In vitro electrophysiological studies show that the PV-expressing APT neurons exhibit fast-spiking to depolarization and single-cell RT-PCR shows that Cav3.2 is expressed in APT neurons that also express GABA. These cells show an after-hyperpolarization burst of APs that is reduced by blockers of Cav3.2 channels. There are no statistics displayed on panels C-E in Fig. 3, although they are reported in the text. Again, the test used and degrees of freedom, etc. should also be reported as it allows for evaluation of the experimental design.

      We apologize for the lack of statistics in Fig. 3 (now Fig. 4). Statistics are now clearly presented on each figure panel and the statistical tests are stated in the figure legends and in the results (page 4, line 144-159).

      As it is now stated in the “Quantification and Analysis section” (page 19, line 682), each neuron was considered as an independent observation since 1 to 3 neurons were recorded per mouse. The number of mice and the mean number of neurons per mouse are indicated in the data-set for each experimental condition in order to allow for a clear evaluation of the experimental design. Note that in the experiments with application of T-channel antagonist, only one neuron was recorded per slice. This is now specified in the Method Details (page 17, line 579).

      It is also noted in the Discussion (lines 185-186) that "Our in vitro data indicate that 92% of APT-PV+ neurons are able to discharge bursts of action potentials at high frequency underpinned by a large transient depolarization due to the activation of T channels." It would be clearer to refer explicitly to the rebound, since the figure shows both the fast-spiking properties upon depolarization and the transient depolarization underlying the rebound, whereas Cav3.2 affects only the rebound.

      We agree and have changed the sentence accordingly (page 5, line 202).

      Behavioral studies of mechanical and cold allodynia in male and female naïve and SNI-treated KI and KO mice were performed. These results show a clear contribution of the Cav3.2 channels in APT in both the development and maintenance of neuropathic pain. Again, the statistical design is not clearly defined and it is extremely difficult to resolve what comparisons are delineated in panels B-E of Fig. 4.

      We fully agree with the reviewer that the rationale for the choice of statistical tests used to analyze the behavioral data was lacking. We have rewritten the relevant paragraph in the Quantification and Analysis section (page 19, lines 699-714). The statistical results presented in the Fig 4 and its supplemental figure (now Fig 5 and Figure 5 – Figure supplement 1) are now clearly stated in the legends.

      Reviewer #3 (Public Review):

      The authors used state-of-the-art techniques to investigate the role of the centrally located (GABAergic APT neurons) CaV3.2 isoform of T-channels in an animal model of neuropathic pain, using the spared nerve injury model. This is generally an excellent and very rigorous study. The data are very compelling and are likely to have a major impact in the field of ion channels and pain transmission. The data presentation is superb and the major conclusions are highly justified. Major strengths include the use of powerful complementary techniques such as molecular (single-cell PCR), mouse genetics, and pain testing in vivo, as well as sophisticated ex vivo (slice physiology) and in vivo recordings (burst analysis using tetrodes). This study may explain recent clinical studies that failed to show the efficacy of peripherally acting Cav3.2 channel blockers in patients with neuropathic pain. Hence, this study has the potential to shift the focus from peripheral to supraspinal Cav3.2 channels in various pain pathologies.

      Some moderate weaknesses are identified and should be addressed:

      1) The data showing the effect of T-channel deletion on the excitability of GABAergic neurons of APT is very convincing. However, what is missing is a discussion of how changes in the excitability of inhibitory APT neurons impact the circuitry that is involved. Without knowing the circuitry involved, one could speculate that blocking inhibitory drive may do just the opposite effect of what is proposed and increase hyperalgesia.

      We agree with the reviewer that discussing this issue is essential and it has now been added (page 6, lines 270-295).

      2) Methods should clearly state if any experiments were done in a blinded fashion.

      Behavioral experiments were performed blind. This has been added in the method section (page 18, line 624). For in vivo electrophysiological experiments, we cannot say that we performed blind experiments (although we tried). Indeed, under anesthesia, the forelimb of SNI animals presents a slight but observable withdrawal.

      3) There is no mention anywhere of how selective Cav3.2 knock-out was achieved, nor of how it was assessed. It would be very helpful if the authors could record T-channel current amplitudes in sham animals, in animals after SNI, and after selective knock-out in the SNI group.

      The efficiency of the Cav3.2 deletion after Cre virus injection was assessed by immunolabeling of GFP in APT slices. As shown in Figure 5 – Figure supplement 1, we checked that unilateral injection of AAV8-hSyn-Cre-mCherry virus induced a drastic reduction in the number of GFP+ neurons when compared to the non-injected hemisphere. The absence of Cav3.2 expression in Cre injected APTs was systematically checked in each mouse at the end of the behavioral tests (Figure 5A). This is now added in the Method Details (page 18, line 655).

      4) It should be discussed that global Cav3.2 knock-out animals had only a minimal neuropathic pain phenotype (Choi et al., 2007).

      This point is now discussed (page 6, line 251).

    1. Author Response

      Reviewer #3 (Public Review):

      Dingus et al. have developed an innovative and powerful approach for improving the intracellular stability of nanobodies. Nanobodies are single-domain antibody fragments that are typically generated in select species such as llamas or alpacas. Because nanobodies are secreted and are generally present in the extracellular environment, they often become unstable when expressed in the reducing intracellular environment. Dingus et al. investigated 75 nanobodies from the Protein Data Bank and found that 42 were unstable when expressed intracellularly. In order to improve the stability of these nanobodies, they first determined consensus residues that were present within the framework region (which excludes the CDRs) in over 80% of the stable nanobodies. Mutating residues within the framework of unstable nanobodies to match consensus residues in the stable nanobodies stabilized 26 of 42 nanobodies. Mutating consensus unstable residues stabilized another 11. Thus 37/42 unstable nanobodies were stabilized using this mutational approach. Further experiments provided evidence that some of the stabilized nanobodies still had some affinity for their targets. Furthermore, one stabilized nanobody was stable when expressed in the retina in vivo, and 3 of 5 were stable when expressed in bacteria.
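      The consensus step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the sequences are already aligned to a common numbering with CDR positions excluded, and the function names and toy sequences are hypothetical. A framework position is kept only if one residue occurs in more than 80% of the stable nanobodies, and an unstable nanobody is then mutated to match at those positions.

```python
from collections import Counter


def consensus_positions(stable_alignment, framework_positions, threshold=0.8):
    """Return {position: residue} for framework positions where one residue
    occurs in more than `threshold` of the stable sequences."""
    n_seqs = len(stable_alignment)
    consensus = {}
    for pos in framework_positions:
        # Count residues at this aligned position, ignoring alignment gaps.
        counts = Counter(seq[pos] for seq in stable_alignment if seq[pos] != "-")
        if not counts:
            continue
        residue, count = counts.most_common(1)[0]
        # Strictly "more than" the threshold, as a fraction of all stable sequences.
        if count / n_seqs > threshold:
            consensus[pos] = residue
    return consensus


def apply_consensus(unstable_seq, consensus):
    """Mutate an unstable sequence to the consensus residue at each chosen position.

    Returns the mutated sequence and the mutations in 1-based notation (e.g. A2V).
    """
    seq = list(unstable_seq)
    mutations = []
    for pos, residue in sorted(consensus.items()):
        if seq[pos] != residue:
            mutations.append(f"{seq[pos]}{pos + 1}{residue}")
            seq[pos] = residue
    return "".join(seq), mutations


# Hypothetical toy alignment; position 4 stands in for a CDR and is excluded.
stable = ["QVQLVE", "QVQLAE", "QVKLGE", "QVQLSE", "EVQLTE"]
framework = [0, 1, 2, 3, 5]
consensus = consensus_positions(stable, framework)
mutated, mutations = apply_consensus("QAQLVK", consensus)
print(consensus, mutated, mutations)
```

      In this toy case the first and third positions are left untouched because their most common residue reaches exactly, but not more than, 80% of the stable set, mirroring how variable framework positions are spared from mutagenesis.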

      1) This study provides a straightforward approach to improving the intracellular stability of nanobodies that could prove to be very useful for solving a common and vexing problem.

      Thanks!

      2) From the data provided, it was difficult to determine whether the binding affinity of the mutated nanobodies had been diminished by the mutations that increased stability, and if so, by how much. Furthermore, target binding affinity was assessed for just 5 nanobodies, which calls into question whether this strategy will be useful.

      It is the case that we are unable to guarantee that any nanobody stabilized by our consensus-based approach will retain full target-binding affinity. Nor is it guaranteed that a given nanobody will be able to bind its target in cells in the absence of any mutagenesis, as paratope structure may be influenced or compromised in the intracellular environment. We are also limited in what we can test intracellularly, as the majority of current nanobodies target extracellular factors that cannot be effectively expressed intracellularly. What we provide is a rationale for limiting the impact on target binding: partial consensus mutagenesis excludes from mutation the highly variable framework positions, which are the most likely to contribute directly to binding. While our approach to generalizable intracellular stabilization may not be perfect for every nanobody, we believe it is likely to be a simple and useful approach in a variety of cases, and likely in the majority of cases.

      3) Ultimately, the goal of expressing most nanobodies intracellularly is to bind to endogenous targets. It is difficult to assess how useful the stabilization strategy will be since it was not determined whether any of the stabilized nanobodies could bind their endogenous targets intracellularly.

      We are limited in the number of intracellular targets we are currently able to test, as most current nanobodies target extracellular antigens. Endogenous intracellular targets are even more limited. However, we agree that targeting endogenous targets is ultimately the goal. We have included an in vivo experiment against the endogenous target GFAP in our revised manuscript, where we show that binding is preserved following mutagenesis (Figure 6).

    1. Author Response

      Reviewer #1 (Public Review):

      Kohler and Murray present high-throughput image-based measurements of how low-copy F plasmids move (segregate) inside E. coli cells. This active segregation ensures that each daughter cell inherits an equal share of the plasmids. Previous work by different labs has shown that faithful F-plasmid segregation (as well as segregation of many other low-copy plasmids, segregation of chromosomes in many bacterial species, and segregation of some supramolecular complexes) requires ParA and ParB proteins (or proteins similar to them) and is achieved by an active transport mechanism. ParB is known to bind to the cargo (plasmid), and ParA forms a dimer upon ATP binding that binds DNA (the chromosome) non-specifically and can also bind ParB (associated with the cargo). After ATP hydrolysis (stimulated by the interaction with ParB), the ParA dimer dissociates into monomers and is released from ParB and the chromosome. While different mechanisms of ParA-dependent active transport have been proposed, two have recently become the most popular: one based on the elastic dynamics of the chromatin (Lim et al. eLife 2014, Surovtsev PNAS 2016, Hu et al Biophys.J 2017, Schumacher Dev.Cell 2017) and the other based on a theoretically derived "chemophoretic" force (Sugawara & Kaneko Biophysics 2011, Walter et al. Phys.Rev.Lett. 2017).

      It is a minor comment, but we would like to point out that we do not consider these two model types as alternatives but rather as models with different levels of coarse-graining. Our interest is in the molecular-level (stochastic) models (Lim et al. eLife 2014, Surovtsev PNAS 2016, Hu et al PNAS 2015, Hu et al Biophys.J 2017, Schumacher Dev.Cell 2017).

      The authors start by following the motion of F plasmids in cells with one or two plasmids and by analyzing plasmid spatial distribution, plasmid displacement (referred to as velocity) as a function of relative position, and the autocorrelations of position and displacement. They conclude that these metrics are consistent with 'true positioning' (i.e. the average displacement is biased toward the target position: the center for one plasmid, and the 1/4 and 3/4 positions for two plasmids) but not with 'approximate positioning' (i.e. when the plasmid moves around the target position, for example in a near-oscillatory fashion). This 'true positioning' can be described as a particle attached to an overdamped spring. They reproduce this behavior by expanding the previous model of the 'DNA-relay' mechanism (Lim et al. eLife 2014, Surovtsev PNAS 2016), in which the plasmid is actively moved by the elastic force from the chromosome and ParA serves to transmit this force from the chromosome to the plasmid. The authors now explicitly consider that chromosome-bound ParA can diffuse (which they refer to as 'hopping'), and this allows the model to achieve 'true plasmid positioning' for some combinations of model parameters, in addition to the oscillatory dynamics reported in the original paper (Surovtsev PNAS 2016).
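      The difference between 'true positioning' and oscillation can be made concrete with a minimal sketch that assumes a plain Ornstein-Uhlenbeck (overdamped-spring) process rather than the authors' molecular model; all parameter values and function names are hypothetical, and no cell walls are modelled. For such a process the mean per-step displacement is a linear restoring function of the offset from the target, and the position autocorrelation decays without changing sign, whereas an oscillating plasmid would show negative autocorrelation at about half its period.

```python
import numpy as np


def simulate_overdamped_spring(target=0.5, k=1.0, D=0.01, dt=0.1,
                               n_steps=50_000, seed=0):
    """Euler-Maruyama integration of dx = -k (x - target) dt + sqrt(2 D) dW.

    Positions are in units of cell length; no cell walls are modelled.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, np.sqrt(2.0 * D * dt), size=n_steps - 1)
    x = np.empty(n_steps)
    x[0] = target
    for i in range(n_steps - 1):
        # Deterministic restoring drift toward the target plus thermal noise.
        x[i + 1] = x[i] - k * (x[i] - target) * dt + noise[i]
    return x


def displacement_bias(x, target):
    """Least-squares slope of the per-step displacement vs. offset from the target.

    A negative slope means displacements are biased back toward the target.
    """
    dx = np.diff(x)
    offset = x[:-1] - target
    slope, _intercept = np.polyfit(offset, dx, 1)
    return slope


def autocorrelation(x, lag):
    """Sample autocorrelation of the position at the given lag (in steps)."""
    xc = x - x.mean()
    return float(np.dot(xc[:-lag], xc[lag:]) / np.dot(xc, xc))


x = simulate_overdamped_spring()
print(displacement_bias(x, 0.5))  # theory for this discrete process: -k * dt
print(autocorrelation(x, 20))     # theory: (1 - k * dt) ** 20, positive
```

      The mid-cell target corresponds to the single-plasmid case; for two plasmids the same analysis would be applied around the 1/4 and 3/4 positions.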

      Based on their computational model, the authors proposed that two parameters, the diffusion scale of ParA, $\lambda = 2(2D_h/k_d)^{1/2}/L$ (the typical distance diffused by ParA before dissociation, relative to half the cell length), and the ratio of ParB-dependent to ParB-independent hydrolysis rates, $\epsilon = k_h/k_d$, are key control parameters defining which qualitative behavior is observed: random diffusion, near-oscillatory behavior, or an overdamped spring ('true positioning'). They vary these two parameters over ~30-fold and ~200-fold ranges by changing Dh and kh, respectively, to illustrate how the dynamics of the system change between these three modes of motion. While these parameters clearly play an important role, the drawback is that the authors neither provided theoretical reasoning for why these parameters are truly governing nor demonstrated it by varying other model parameters (kh, number of ParA molecules NParA, spring constant of the chromosome k, diffusion coefficient of the plasmid Dp) to show that only these combinations define the type of system behavior. The authors' qualitative analysis of the importance of $\lambda$ relies on the steady-state solution of the diffusion equation for ParA. It is really unfortunate that the ParA distribution was not measured simultaneously with plasmid motion, as this would allow experimental ParA profiles to be compared with the expected quasi-steady-state solutions.

      We spend almost an entire section and a figure explaining the theoretical reasoning behind the identification of the $\lambda=s/(L/2n)$ as an important system parameter (section “Hopping of ParA-ATP on the nucleoid as an explanation of regular positioning” and Figure 2) and predicted that regular positioning could only occur for $\lambda>1$. This was confirmed by parameter sweeps for the cases of 1 (Figure 3I) and multiple plasmids (Figure 5-figure supplement 1), indicating that $\lambda$ is indeed an important system parameter and that our conceptual understanding of this aspect of the system is correct. This point has now been made clearer.
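      These definitions can be worked through in a minimal sketch. It assumes $\lambda = s/(L/2n)$ with $s = \sqrt{2D_h/k_d}$, which reduces to the reviewer's expression $2(2D_h/k_d)^{1/2}/L$ for a single plasmid, and $\epsilon = k_h/k_d$; the parameter values and function names are hypothetical, not fits from the paper. It shows why growth (larger $L$) pushes a single plasmid toward $\lambda<1$, while a second plasmid doubles $\lambda$. Following the text, $\lambda>1$ is treated as necessary but not sufficient for regular positioning, since $\epsilon$ also matters.

```python
import math


def hopping_length(D_h, k_d):
    """s = sqrt(2 D_h / k_d): typical distance a nucleoid-bound ParA-ATP dimer
    hops before basal (ParB-independent) hydrolysis and dissociation."""
    return math.sqrt(2.0 * D_h / k_d)


def lam(D_h, k_d, L, n):
    """lambda = s / (L / 2n) for n plasmids over a length L."""
    return hopping_length(D_h, k_d) / (L / (2 * n))


def eps(k_h, k_d):
    """epsilon = k_h / k_d: ParB-stimulated relative to basal hydrolysis rate."""
    return k_h / k_d


# Hypothetical values: D_h in um^2/s, k_d in 1/s, L in um.
D_h, k_d = 0.02, 0.01
for L, n in [(3.0, 1), (5.0, 1), (5.0, 2)]:
    value = lam(D_h, k_d, L, n)
    # lambda > 1 is necessary (not sufficient) for regular positioning.
    print(f"L={L} um, n={n}: lambda={value:.2f}, lambda > 1: {value > 1}")
```

      With these illustrative numbers a single plasmid crosses from $\lambda>1$ to $\lambda<1$ as $L$ grows from 3 to 5 µm, while two plasmids in the longer cell remain above 1, which is the qualitative pattern the authors report for F plasmid.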

      However, we agree that the reasoning for $\epsilon$ (varied through the hydrolysis rate $k_h$) was not clear. It was chosen to allow us to modulate the ParA concentration at the plasmid compared to elsewhere, motivated by the differences between different ParABS systems. We had originally also considered a third quantity related to the number of nucleoid-bound ParA, but we found that it had little effect on the nature of the dynamics. All three quantities describe how the timescale of a reaction/process (ParA hopping/diffusion across the nucleoid, ParB-induced hydrolysis, ParA association to the nucleoid) compares to the timescale of basal hydrolysis, which we use as a reference timescale.

      We have now made this clearer as well as adding supplementary figures showing the effect of varying other system parameters at several locations in the phase diagram (Figure 3-figure supplement 3 and 4). These sweeps justify our identification of $\epsilon$ and $\lambda$ as a useful/important set of quantities for determining the dynamics of the system.

      Additionally, we now add example kymographs showing the ParA distribution (Figure 3-figure supplement 2C).

      The authors also show by simulations that overdamped-spring dynamics can transition into oscillatory behavior when $\lambda$ decreases, for example through cell growth. Indeed, they observed more oscillatory behavior when they compared single-plasmid dynamics in longer cells with that in shorter cells. This was not the case in double-plasmid cells, in perfect agreement with their analysis. They also calculated ATP consumption in the model and concluded that the system operates close to, but below (perhaps "above" should be used, as it refers to a larger $\lambda$), the threshold of the oscillatory regime, which minimizes ATP consumption. While the ATP consumption analysis is very intriguing, this statement (Abstract Ln24-25) seems at odds with the authors' own analysis that another ParA-dependent plasmid system, pB171, operates mostly in the oscillatory regime, and it is actually for this regime that the authors' analysis suggests minimal ATP consumption (Fig. 8).

      To clarify, we found that pB171 (which in our hands has a copy number of 2-3 in the SR1 reduced-copy-number strain) is only clearly oscillatory in cells with a single plasmid (and only mildly so in cells with two plasmids). Otherwise, it behaves very similarly to F plasmid. We therefore believe that these two distantly related ParABS systems exhibit, overall, similar dynamics and differ only in how close the systems are to the threshold of oscillatory instability. This was not clear as we did not specify the copy number of pB171. We now provide this in Figure 7–figure supplement 1.

      We refer to these systems as lying just below, rather than above, the threshold of the oscillatory instability because, on average, plasmids do not oscillate but only do so in cells with the lowest plasmid concentration.

      I think the real strength of the paper is that it can potentially show that, if one considers that intracellular cargo can be moved by the fluctuating chromosome via ParA-mediated attachments, then various dynamics can be achieved depending on combinations of several control parameters (plasmid diffusion coefficient, ParA diffusion coefficient, rate of hydrolysis and so on), including the previously reported 'oscillations' (Surovtsev PNAS 2016), 'local excursions' (Hu et al Biophys.J 2017) and 'true positioning' (Schumacher Dev.Cell 2017). The main drawback (in this reviewer's opinion) is that this is obscured by the current presentation and discussion of this work and of previous modelling work on ParA-dependent systems. For example, instead of emphasizing the "unifying" potential of the presented model, yet another name, 'relay and hopping', is used in addition to the previously used 'DNA-relay', 'Brownian ratchet', 'Flux-based positioning', …

      In the abstract and discussion, we already refer to developing a “unified” model (p1 L21, p15 L22 of the original manuscript) and in the discussion we explain how our model contains other models as limiting cases. But we agree with this recommendation - the unifying nature of our model is its main strength. We now emphasise this more.

      Regarding the model name, we felt obliged to refer to the previous named models (DNA-relay and Brownian ratchet) and simply gave our model a name to avoid confusion when making comparisons. We have now removed almost all mention of ‘hopping and relay’ and just refer to ‘our model’. However, our GitLab repository containing the code must have a name and is therefore still called ‘Hopping and relay’, so the same term is used in Table 3.

      … and it appears that the presented model is an alternative to these previously published work. And only in model description (in Methods section) one can find that the "... model is an extension of the previous DNA-relay model (Surovtsev et al., 2016a) that incorporates hopping and basal hydrolysis of ParA and uses analytic expressions for the fluctuations rather than a second order approximation"(p.17, ln15-17).

      We are sorry that this reviewer felt that the fact that our model is an extension of DNA relay is hidden in the methods. However, we wrote in the main text:

      “Motivated by the previous discussion, we decided to develop our own minimal molecular model (‘hopping and relay’) of ParABS positioning, taking the DNA relay model as a starting point … The original scheme is as follows… We supplemented this scheme with two additional components: diffusion (hopping) of DNA-bound ParA-ATP dimers across the nucleoid (with diffusion coefficient Dh, where the subscript indicates diffusion of the home position) and plasmid-independent ATP hydrolysis and dissociation (with rate kd). See Material and Methods for further details of the model. “

      We now make this clearer.

      However, we would argue that as models of the same system, there are naturally overlaps and the models of Hu et al and Schumacher et al could also be thought of as extensions of the DNA relay model.

      While it is of course the authors' right to decide how to name their model, the real conceptual difference between the presented and previous models should be explicitly clear to the reader from the abstract, introduction and discussion sections of the paper, not from the "fine-print" details in the supplementary materials.

      The main conceptual difference is that we have identified the importance of having a finite diffusive length scale for ParA diffusion/hopping on the nucleoid. This allows both oscillations and regular positioning to occur for biologically relevant parameter values and reproduces the length-dependent transition from mid-cell positioning to confined oscillations that we observe for F plasmid. The DNA relay model does not have this behaviour, as the ParA diffusive length scale is zero, while it is infinite in the models of Ietswaart et al 2014 and Schumacher et al 2017. The model of Hu et al 2017 does have a finite length scale, but the authors appear not to have realised its importance and never discovered the regular positioning regime at $\lambda>1$. While we make these points in the discussion in the context of Figure 8A, where we compare our model to the others, we agree with this reviewer that we should have been more explicit in the abstract and introduction. We have now corrected this.

      This would avoid unnecessary confusion (especially for readers not directly involved in the modelling of the ParA/B system) and clarify that all these models rely on the elastic behavior of the fluctuating chromosome to drive active transport of the cargo. This reviewer believes that a more explicit discussion of the differences and similarities between the models (the authors' and previously published ones) will help our understanding of how ParA-dependent systems operate. This discussion should also include work on the PomXYZ system, in which it was shown that a similar dynamic system can lead to specific positioning within the cell (Schumacher Dev.Cell 2017, Kober et al. Biophys.J 2019). This will make it explicit that the models' results have a direct impact beyond ParA-dependent plasmid segregation.

      To further clarify the differences between the models (beyond the second and third sections of the main text and the discussion), we have now added a section to the methods and a new table (Table 3). We have also included the mentioned PomXYZ model. However, we would like to point out that this was not the first stochastic model to have ‘true’ positioning, as this reviewer cites above. Although it did not include the mechanism of force generation, the model of Ietswaart et al 2014 produces regularly positioned plasmids and is referenced repeatedly in Schumacher et al. 2017.

      I think that the expanded parameter analysis and an explicit model comparison/discussion will make the contribution of this work to the field clearer, with the potential to advance our general understanding of how the same underlying mechanism can lead to various modes of intracellular dynamics and patterning depending on the combination of parameters.

      Reviewer #2 (Public Review):

      The work presented in this manuscript details an analysis of the partitioning of low copy plasmids under the control of the ParABS system in bacteria. Using a high throughput imaging set up they were able to track the dynamics of the partition complex of one to a few plasmids over many cell cycles. The work provides an impressive amount of quantitative data for this chemo-mechanical system. Using this data, the paper sought to clarify whether the dynamics of plasmids is due to regular positioning or noisy oscillations around a mean position. They supplement their experimental work with an intuitive model that combines elements of previous modelling efforts. Their model relies on diffusion of the ParA substrate on the nucleoid with the dynamics of the ParB partition complex being driven by the underlying elastic force due to the nucleoid on which the substrate is tethered. Their model dynamics depend on two parameters, the ratio of the length over which the substrate can explore to the characteristic length of the space and the ratio of stimulated to non-stimulated hydrolysis rates of the substrate. If the length ratio is large, ParA can fully explore the space before interacting with the ParB complex leading to balanced fluxes and regular positioning. If it gets reduced, for example by lengthening the cell, oscillations can emerge as fluxes of substrates become imbalanced and a net force can pull the partition complex.

      Strengths:

      Given the large amount of data, the observations unambiguously show that one particular ParABS system under the conditions studied is carrying out regular positioning of plasmids. The model synthesizes prior work into a nice intuitive picture. These model parameters can be fit to the data leading to estimates of molecular kinetic parameters that are reasonable and in line with other observations. Lining up the experimental observations with the phase space of the model suggests that the system is poised on the edge of oscillations, allowing for the system to have regular positioning with low resource consumption.

      Weaknesses:

      However, despite the correspondence of the simulated results with the experimental findings, other explanations are not completely ruled out. The paper emphasizes that ParA diffusion/hopping on the nucleoid is essential for the establishment of regular positioning and that without it, only oscillations are possible. Prior simulation efforts cited in the paper, which include ParA diffusion and mixing in the cytosol but no diffusion on the nucleoid, have shown that regular positioning is possible and that oscillations can be triggered as the system lengthens. Thus ParA hopping is not a necessity for regular positioning (as claimed in the paper), although it may well be needed for the kinetic parameters of the system studied here.

      We now comment on this result. In short, we believe that the mentioned model/regime is not relevant due to stochastic effects. We are not able to produce regular positioning without ParA hopping using biologically relevant parameters.

      The paper also presents experimental results for a second ParABS system (pB171) that is more likely to show oscillations. They attribute the greater likelihood of oscillations for pB171 to ParA exploring a smaller space than in the F plasmid system, which showed regular positioning. This is pure conjecture and the paper does not provide any evidence that this is the reason. Thus it is hard to rule out that the oscillations are due to other factors.

      We do not explicitly make that claim. We did have a point in the phase diagram of Figure 8A representing pB171 with a lower value of lambda than F plasmid and stated “The location of pB171 is an estimate based on a qualitative comparison of its dynamics”. We agree this was unclear.

      We now indicate the region that has oscillations with roughly the same period as single plasmids of pB171. We also make it clear that we speculate, but have not shown, that the length scale of ParA hopping is smaller than for F plasmid.

      An important point here is that we can explain both oscillations and regular positioning in the same model with the same kinetic parameters, the regimes being determined by the cell length and plasmid number in a manner consistent with experimental observations.

    1. Author Response

      Reviewer #1 (Public Review):

      This work sheds light on the adverse effects of Bacillus thuringiensis, a strongly pathogenic bacterium used as a microbial pesticide to kill lepidopteran larvae that threaten crops, on the gut homeostasis of non-susceptible organisms. Using Drosophila melanogaster as a non-susceptible model organism, this paper reveals the mechanisms by which the bacteria disrupt gut homeostasis. The authors combined different genetic tools and Western blot experiments to successfully demonstrate that bacterial protoxins are released and activated throughout the fly gut after ingestion and influence intestinal stem cell proliferation and intestinal cell differentiation. This phenomenon relies on the interaction of activated protoxins with specific components of adherens junctions within the intestinal epithelium. Because the mechanisms governing intestinal cell differentiation are conserved, this work could be the starting point for further studies in mammals.

      The conclusions proposed by the authors are in general well supported by the data. However, some improvements in data representation, as well as additional key control experiments, would be needed to further reinforce some key points of the paper.

      We thank Reviewer 1 for her appreciation of the work and in-depth analysis of the data. We agree with all her comments and believe the suggestions significantly improved the manuscript.

      1) Figure 1 and others: Several graphs in the manuscript show the number of cells/20000µm2. How is the shape of the gut in the different conditions studied in this manuscript? The gut shape (shrunk gut versus normal gut for example) could influence the number of cells seen in a small area. For example, the number of total cells quantified in a small area (here 20000µm2) of a shrunk gut can be increased while their size decrease. As a result, the quantification of a specific cell type in a small region (here 20000µm2) can be biased and not represent the real number of cells present in the whole posterior part of the R4 region. Would it make sense to calculate a ratio "number of X cells/number of DAPI positive cells per 20000µm2"?
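      The reviewer's concern can be illustrated with hypothetical numbers in a minimal sketch (function names are illustrative, not the authors' quantification pipeline): if the same cells occupy half the area in a shrunk gut, the density per 20,000 µm² field doubles, while the fraction of DAPI-positive nuclei that carry the marker does not change.

```python
def density_per_field(n_cells, imaged_area_um2, field_um2=20_000):
    """Cells per 20,000 um^2 field, the per-area metric used in the figures."""
    return n_cells * field_um2 / imaged_area_um2


def fraction_of_nuclei(n_marker_cells, n_dapi_nuclei):
    """Marker-positive cells as a fraction of all DAPI-positive nuclei."""
    return n_marker_cells / n_dapi_nuclei


# Hypothetical region with 40 marker-positive cells among 400 nuclei,
# imaged either at its normal size or shrunk to half the area.
normal = (density_per_field(40, 40_000), fraction_of_nuclei(40, 400))
shrunk = (density_per_field(40, 20_000), fraction_of_nuclei(40, 400))
print(normal, shrunk)
```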

      We provided a suitable answer in the "Essential Revisions point 1" corresponding to this reviewer's concern. To summarize, we have now added whole posterior midgut images in the different conditions to highlight the intestinal morphology (Figure 1-figure supplement 1A). The whole gut morphology was not affected by the different challenges we performed. Indeed, we used low doses of spores and/or toxins in order to mimic "natural" amounts of spores/toxins the fly can eat in the environment and in order to avoid drastic gut lining disturbances.

      We have also added the cell type ratio in figure 1- figure supplement 2.

      2) Figure 4: Is it possible that Arm staining is less intense between ISC and progenitors after ingestion of the bacteria due to the fact there is a high rate of stem cell proliferation? Could it be an indirect effect of stem cell proliferation rather than the binding of the toxins to Cadherins?

      We thank the reviewer for this pertinent comment. Indeed, for this reason, we compared the intensity of Arm expression at the junction between neighboring progenitors with the Arm intensity around the rest of the cellular membranes and calculated the ratio between both values (see Figure 4-figure supplement 1F-G for an illustration of how we proceeded and the new section in the Material and Methods 736-742). Using this method, even if the overall Arm staining intensity differs (across the whole midgut), the ratio reflects changes in the cell-cell interaction between the two neighboring cells. Moreover, we have observed that Arm staining (using the usual monoclonal antibody N2 7A1 from the DSHB) was very variable from one midgut to another in the same feeding/intoxication condition. Because of this variability, we do not want to draw conclusions about overall Arm intensity, whatever the intoxication condition. Finally, the challenged guts always displayed a more disorganized epithelium due to cell proliferation and differentiation. Consequently, Arm staining in ECs and progenitor cells is found in the same focal plane, while in unchallenged and well-organized guts, Arm staining in ECs is above the focal plane of Arm staining in progenitor cells. This likely gives the impression that Arm staining is more intense in challenged midguts. This method description has now been added to the Material and methods section (lines 736-742).

      Could the authors use the ReDDM system to distinguish between "old" and newly formed cells? This could be a good control to make sure that the signal is quantified in similar cells between the control and the different conditions.

      We have analyzed the intensity of Arm expression between pairs of GFP cells. Most of these pairs arose from de novo divisions. Indeed, as shown in control conditions (water) with Dl-ReDDM (for example, see figure 1-figure supplement 1D), pairs of GFP cells (ISC-ISC) are rare. Most pairs correspond to ISC-EB or ISC-EEP pairs with the progenitor marked by RFP, meaning that it has just arisen from the GFP+ mother ISC. We therefore assume that, in the esg>GFP genotype, pairs of GFP+ cells correspond to one ISC and one progenitor (see Figure 4 – figure supplement 1A-A'). Hence, when we analyzed the Arm intensity between pairs of GFP cells after intoxication, these cells were very likely "newborn" cells. Even if some ISCs and progenitors remain stuck together for a long time (for instance several days), Cry1A toxins may also be able to disrupt their cell junctions. In the context of Cry1A toxin activity, it seems important to analyze the whole impact on cell-cell junctions without discriminating between old and new cell-cell interactions.

      We tried to use anti-Arm and anti-Pros double staining to mark new EEPs. Unfortunately, the anti-Arm and anti-Prospero antibodies were both raised in mice. Co-staining with both antibodies gave poor labelling for Arm, for Prospero, or for both. Our first author spent a lot of effort trying to set up good conditions, but unfortunately this was unsuccessful.

      Here is an example of what we obtained (the best image we got) with esg>GFP flies fed with water (control) and labelled for Arm and Pros in red. White arrows point to two EEPs. Red arrows point to the Arm staining between two precursors (ISC/ISC, ISC/EB or EB/EB). It was extremely hard to identify junctions marked by Arm between EEPs and ISCs because the Pros staining was too strong.

      Another example, with flies fed with spores of SA11 (increasing the number of EEs): esg>GFP is in green, and Arm and Prospero are in red. The right panel corresponds to the red channel alone (Arm/Prospero).

      Nevertheless, we have now performed a similar analysis in an esg>GFP, Shg::RFP background and analyzed Shg::RFP (Tomato::DE-Cadherin) labelling intensity. We found similar results, which are presented in the new Figure 4 (the Arm data have been moved to Figure 4-figure supplement 1). This analysis has been included in the text (lines 285-299).

      Figure 4E' and 4G': Arm staining seems more intense when looking at the whole membrane levels of cells compared to control. Is it possible that the measured ratio contact intensity/membrane intensity presented in Figure 4I could be impacted and not reflect the real contact intensity between ISC and progenitor cells?

      Please check our answer just above: "…//… we have observed that Arm staining (using the usual monoclonal antibody N2 7A1 from the DSHB) was very variable from one midgut to another in the same feeding/intoxication condition. So, we do not want to draw conclusion about the whole Arm intensity due to this variability whatever are the intoxication conditions".

      See also our intensity measurement method described above to avoid bias: "…//… we compared the intensity of Arm expression at the junction between neighboring progenitors with the Arm intensity around the rest of the cellular membranes and calculated the ratio between both values (see Figure 4-figure supplement 1F-G for an illustration of how we proceeded and the new section in the Material and Methods 736-742). Using this method, even if the whole Arm staining intensity is different (in all the midgut), the ratio reflects the internal cell-cell interaction changes between the two neighboring cells."

      What is the hypothesis of the authors about the decrease of Arm or DE-Cad seen after bacterial/crystal ingestion? Does the interaction between the toxins and DE-Cad induce a relocation of DE-Cad?

      It has been shown that E-Cadherin can be recycled when adherens junctions are destabilized, both in Drosophila and in mammals (Buchon et al., 2010; O'Keefe et al., 2007; Tiwari et al., 2018). To investigate this possibility, we tried to analyze DE-Cad cytoplasmic relocalization using anti-DE-Cad immunostaining (DCAD2 antibody from DSHB) as well as Shg::RFP (Bloomington stock #58789) or Shg::GFP (Bloomington stock #60584) endogenous fusions. Unfortunately, we did not see obvious differences. We have nevertheless added the split channels of the Shg::RFP labelling in the different conditions in Figure 4A-D'. We remain interested in the behavior of DE-cadherin (and signaling, see (Liang et al., 2017)) upon binding of the Cry1A toxin, and N. Zucchini-Pascal (an author of this article) is currently investigating this question.

      The authors should add more details about the way to quantify in the Material and methods section. How many cells have been quantified per intestine? How did they choose the cells where they quantified the contact intensity?..etc

      These details were missing from the methods, and we thank the reviewer for highlighting this issue. We have added this information to the methods (lines 725-742). The number of cell pairs analyzed was present in the raw data related to Figure 4 but absent from the main figure and legend; this has now been rectified. We only measured the intensity in isolated pairs of cells.

      Figure 4B, D, F and H: How did the authors recognize the ISCs?

      We agree with the reviewer's comment. We cannot recognize ISCs per se: green cells correspond either to ISCs or to EBs. We modified the text accordingly (lines 285-287).

      Could the authors do quantifications of DE-Cad signal?

      This has been done and is now shown in Figure 4E and in Table 1. We also adapted the text (lines 289-299) to fine-tune our interpretation in light of this new analysis. Indeed, what we now define as "mild" adherens junction intensity corresponds to a ratio between 1.4 and 1.6 instead of the previous range (1.3 to 1.6), because we observed that most EEP progenitors arose from cells displaying a junction intensity ratio with their mother cells below 1.4 (see Table 1).
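      The ratio-based measurement and the revised "mild" band can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes each cell pair is summarized by lists of pixel intensities sampled along the shared junction and along the rest of the membrane, and the helper names, the inclusive treatment of the 1.4 and 1.6 edges, and the "low"/"high" labels for the other two bands are hypothetical.

```python
from statistics import mean

# Band edges from the text: "mild" junction intensity = ratio between 1.4 and 1.6.
# Treating 1.4 and 1.6 as inclusive is an assumption of this sketch.
MILD_LOW = 1.4
MILD_HIGH = 1.6


def junction_ratio(junction_px, membrane_px):
    """Mean intensity along the junction shared by two neighboring cells,
    divided by the mean intensity along the rest of their membranes."""
    rest = mean(membrane_px)
    if rest <= 0:
        raise ValueError("membrane intensity must be positive")
    return mean(junction_px) / rest


def classify(ratio):
    """Assign a ratio to a band; 'low' and 'high' are hypothetical labels."""
    if ratio < MILD_LOW:
        return "low"
    if ratio <= MILD_HIGH:
        return "mild"
    return "high"


# A midgut stained twice as brightly overall gives the same ratio.
dim = junction_ratio([150, 160, 140], [100, 100, 100])
bright = junction_ratio([300, 320, 280], [200, 200, 200])
print(dim, bright, classify(dim))
```

      Because the junction signal is divided by the signal on the rest of the same cells' membranes, a global change in staining intensity between midguts cancels out, which is the point made in the quoted answer above.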

      Like Arm staining, the staining seems stronger at the whole membrane level in F and H compared to the control.

      As we described above for Arm staining, the intensity of Tomato::DE-Cad labelling can differ from one posterior midgut to another. One simple explanation relates to changes in the structure of the midgut epithelium, which is well organized in unchallenged conditions, whereas in challenged midguts the epithelial cells are no longer well arranged because of rapid cell proliferation and differentiation. Consequently, DE-Cad labelling in ECs is at the same level as in ISC/progenitor cells, giving the impression that the labelling is stronger.

      3) Figure 5: How is the stem cell proliferation upon overexpression of DE-Cad in control or upon bacteria/crystals ingestion? Do the authors think that the decrease of Pros+RFP+ new cells upon overexpression of DE-Cad could result from a decrease of stem cell proliferation?

      Great suggestion. To address this, we counted the progenitor cells (GFP+ cells), which reflect ISC division during the last 3 days. This also has the advantage of using the same pictures (samples) as all the analyses shown in Figure 5 and Figure 5-figure supplement 1. If we consider the number of GFP+ cells (esg-expressing cells corresponding to ISCs, EBs or EEPs) in challenged midguts, overexpression of DE-Cad did not seem to alter ISC division. In addition, we still observed more GFP+ cells when the midguts were challenged with SA11 or crystals than with BtkCry, in agreement with the rate of ISC division observed in the WT genetic background shown in Figure 1B.

      We have now added the counts of GFP+ cells in Figure 5-figure supplement 1E. The text has been modified to integrate these results (lines 306-308).

      Did the authors quantify the % of new ECs in the context of overexpression of DE-Cad?

      The data have been added in Figure 5F. The text has been modified to integrate this result (lines 312-313).

      Figure 5F: As asked before, did the authors distinguish the signal between newly born cells and the signal between older cells?

      In the new Figure 5G, we used the esg-ReDDM system, which is very efficient: almost all ISCs and progenitors express GFP. Counting was done on cell pairs expressing both GFP and RFP, as specified in the text (lines 310-311). Nevertheless, we cannot distinguish between new and old cells here. Indeed, the esg-ReDDM system induces both GFP and RFP in all esg+ cells (old and new). Hence, if a division occurred just before induction of the system, giving birth, for instance, to an ISC and an EB, both cells will express GFP and RFP. Should those pairs of cells then be considered old or new? Notably, as we analyzed junction intensity 3 days after intoxication and induction of the ReDDM system, we assume that the pairs of GFP+/RFP+ cells arose after induction of the system. Indeed, to our knowledge, nobody has shown in the posterior midgut that a progenitor remains attached to its mother ISC for as long as 3 days. Even if such an event can occur, Cry1A toxins may also disrupt the cell junction.

      We have now removed the DAPI channel and added the RFP channel in Figure 5-figure supplement 1A-D' (previously Figure S4A-D) to illustrate this explanation and to facilitate interpretation by the reader.

      It would be interesting to compare the junction intensity between mother ISCs and their daughter progenitors before and after intoxication in a same intestine. But we think that this event is quite rare because of the experimental conditions we used (i.e. analyses 3 days after the induction of the ReDDM/intoxication).

      The same experiments (stem cell proliferation + quantification of the % of new ECs) could also be done when the authors overexpress Connectin (supplemental figure 5). This would be another control to conclude that the effects on cell differentiation are specifically due to the interaction between DE-Cad and the toxins.

      We have added the analyses in Figure 5 - figure supplement 2J and K.

      The text has been updated accordingly (lines 317-320).

      In the "crystals" condition, overexpression of Connectin seems to partially rescue the increased % of new Pros+RFP+ cells observed in Figure 3F (Figure S5I compared to Figure 3F).

      Yes, we agree with the reviewer's comment. In an esg-ReDDM background (Figure 3F), crystals induced a much greater increase in EE numbers than SA11 spores did. However, in a WT or esg>GFP background, crystals induced an increase in EE/EEP similar to that induced by SA11 spores. We do not yet have an explanation for this, other than the genetic background of the esg-ReDDM line.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors use the nanobody tools generated in the companion manuscript and have combined them with DNA-PAINT oligonucleotide labeling to generate super-resolution images of indirect flight muscles. Using this approach, they could map the precise organization of the different domains of the two giant titin-like fly homologs, Sallimus and Projectin, against which the nanobodies had been raised, with a precision ranging from 1 nm to 4 nm, depending on the distance between them. They show that in indirect flight muscles the N-ter of Sallimus is located within 50 nm of the Z-disc, and that its C-ter reaches the A-band roughly 100 nm away from the Z-disc. Likewise, they show that the N-ter of Projectin colocalizes with the C-ter of Sallimus at the edge of the A-band, whereas its C-ter is located about 250 nm away in the A-band and 350 nm from the Z-disc. Overall, this suggests a staggered and linear organization of both proteins with a potential area of overlap spanning 10-12 nm, that Sallimus could bridge the Z-disc to the A-band, acting as a ruler, and that Projectin should only overlap with 15% of the A-band and possibly 10 nm of the I-band.

      Thanks for this nice summary of our findings.

      The value of this work comes from its use of advanced technologies (DNA-Paint + superresolution). The biological conclusions confirm and refine earlier and recent papers, especially EM papers and the impressive and very comprehensive JCB paper by Szikora et al in 2020, although the conclusions of the present work differ somewhat from those of Szikora who had predicted that Sallimus does not reach the A-band. That aspect could have been better discussed.

      In this revised version, we have further extended our discussion of the results of Szikora et al. (2020), in particular regarding Sallimus.

      Reviewer #2 (Public Review):

      Taking advantage of the high molecular order of the Drosophila flight muscle, Schueder, Mangeol et al. leverage small (<4 nm) original nanobodies, tailored coupling to fluorophores, and DNA-PAINT resolution capabilities to map the nanoarchitecture of two titin homologs, Sallimus and Projectin.

      Using a toolkit of nanobodies designed to bind to specific domains of the two proteins (described in the companion article "A nanobody toolbox to investigate localisation and dynamics of Drosophila titins"), Schueder, Mangeol et al. position these domains within the sarcomere with <5 nm resolution, and demonstrate that the N-ter of Sallimus overlaps with the C-ter of Projectin at the A-band/I-band interface. They propose this architecture may help to anchor Sallimus to the muscle, thus supporting flight muscle function while ensuring muscle integrity.

      This study nicely extends previous work by Szikora et al. and precisely dissects the sarcomeric geography of Sallimus and Projectin. From these results, the authors formulate specific functional hypotheses regarding the organization of flight muscles and how these are tuned to the mechanical constraints they undergo.

      Although they remain descriptive in essence, the conclusions of the paper are well supported by the experimental results.

      We thank this reviewer for the nice summary of our results.

      Reviewer #3 (Public Review):

      This manuscript by Schueder et al. provides new insight into an important question in muscle biology: how can the smaller titin-like molecules of the much larger sarcomeres of invertebrate muscle perform the same function as the larger titin of vertebrate muscles which have smaller sarcomeres? These functions include the assembly, stability and elasticity of the sarcomere. Using two state of the art methods--nanobodies and DNA-PAINT superresolution microscopy, the authors definitively show that in the highly ordered indirect flight muscle of Drosophila, the elongated proteins Sallimus and Projectin are arranged such that the N-terminus of Sallimus is embedded in the Z-disk, and the C-terminus is embedded in the outer portion of the A-band, and that in this outer portion of the A-band is also embedded the C-terminus of Projectin; thus, if the C-terminus of Sallimus can bind to thick filaments, and/or these overlapping portions of Sallimus and Projectin interact, there would be a linkage of the Z-disk and/or thin filament to the thick filaments to help determine the length and stability of the sarcomere.

      The strengths of this paper include the implementation of nanobody and DNA-PAINT superresolution microscopy for the first time for muscle. The extraordinary 5-10 nm resolution of this method allows imaging for definitive localization of the termini of these elongated proteins in the Drosophila flight muscle sarcomere. In addition, the manuscript is well written with sufficient background information and rationale presented, is easy to read, complex new methods are well-described, the figures are of high quality, and the conclusions are well-justified. A minor weakness is that despite the authors demonstrating that the C-terminus of Sallimus is located at the outer edge of the A-band, and that the N-terminus of Projectin is also located at the outer edge of the A-band, the authors provide no data to show whether, for example, these portions of these titin-like molecules interact, or whether Sallimus might interact with thick filaments. Such data would be required to prove their model. However, I can understand that this would require extensive additional study, and the authors have already provided a tremendous amount of data for this first step in supporting the model. Nevertheless, the authors should cite a relevant previous study on the Sallimus homolog in C. elegans called TTN-1, which is also a 2 MDa polypeptide of similar domain organization to at least the large isoforms of Sallimus found in fly synchronous muscles. In the study by Forbes et al. (2010), immunostaining, albeit not to the impressive resolution achieved in the present paper, showed that TTN-1 was also localized to the I-band with extension into the outer edge of the A-band. More importantly, that study also showed that "fragment 11/12", Ig38-40, which is located fairly close to the C-terminus of TTN-1, binds to myosin with nanomolar affinity (Kd = 1.5 nM), making plausible the idea that TTN-1 may bind to the thick filament in vivo.

      We thank this reviewer for sharing his enthusiasm about our results and methodology, and also about the way the data are presented. This is one more argument for us to leave a shortened Figure 1 in the PAINT manuscript.

      We are particularly thankful to the reviewer for pointing out the important C. elegans data that we had missed and that, as the reviewer said, fit perfectly with the model we propose for flight muscle (and also with the larval muscle data, as the C-terminus of Sls is the same). We now highlight this paper in our discussion and compare it with our findings.

      Reviewer #4 (Public Review):

      This manuscript reports combining nanobodies against Sallimus and Projectin, recently developed and described in the accompanying paper, with DNA-PAINT technology, which allows super-resolution imaging. The presented data prove that such a combination provides a powerful system for nanoscale imaging of large, protein-dense structures such as the Drosophila flight muscle. The main outcome is the observation that in flight muscle sarcomeres Sallimus and Projectin overlap at the I/A-band border. This was elegantly achieved using two-color DNA-PAINT with Sls and Projectin nanobodies.

      We thank the reviewer for appreciating the quality of our work.

      Overall, as it stands, this manuscript, even if of high technological value, remains entirely descriptive and falls short of providing new insights into muscle structure and architecture. The main finding, an overlap between the short Sls isoform and Proj in flight muscle sarcomeres, is redundant with the authors' observation (described in the companion paper "A nanobody toolbox to investigate localisation and dynamics of Drosophila titins") that in larval muscles expressing a long Sls isoform, Sls and Proj overlap as well.

      Alternatively, combination of Sls and Proj nanobodies with DNA-Paint represents an interesting example of technological development that could strengthen the accompanying nanobodies toolkit manuscript.

      Every structural paper reports a structure and is thus, by definition, descriptive; this is the aim of our manuscript. We do not think that the companion nanobody resource paper reports an overlap of Sls and Projectin in the larvae: resolving such an overlap would require super-resolution imaging. That paper does report that the larval Sls isoform is dramatically stretched (more than 2 µm) and that Projectin decorates the thick filament, likely in an oriented manner. Whether the N-terminus of Projectin overlaps with the C-terminus of Sallimus in this muscle is an open question that requires DNA-PAINT imaging of larval muscle. This in turn requires a TIRF setting that is technically not trivial to achieve for larval muscle and hence has not been done by anyone.

    1. Author Response

      Reviewer #2 (Public Review):

      Point 1: The transcriptomic analysis of E12.5 endocardial cushion cells in the various mouse models is informative in the extraction of Igf2- and H19-specific gene functions. In Fig. 6D, a huge sex effect is obvious with many more DEGs in female embryos compared to males. How can this be explained given that Igf2/H19 reside on Chr7 and do not primarily affect gene expression on the X chromosome? Is any chromosomal bias observed in the genomic distribution of DEGs?

      We examined the chromosomal distribution of DEGs between WT and +/hIC1 (Supplemental Figure 6D) and did not see any bias on the X chromosome. We described this result on lines 278-280: “Although the number of +/hIC1-specific DEGs largely differed between males and females, there was no sex-specific bias on the X chromosome (Supplemental Figure 6D).” Additionally, we agree with the reviewer that it is noteworthy that dysregulated H19/Igf2 expression affected the transcriptome in a sex-specific manner, especially since the mutation is located on an autosome. Although investigating the role of hormones versus sex chromosomes in these effects would be quite interesting, it is beyond the scope of the current study.

      Point 2: A separate issue is raised by Fig. 6E that shows a most dramatic dysregulation of a single gene in the delta3.8/hIC1 "rescue" model. Interestingly, this gene is Shh. Hence, these embryos should exhibit some dramatic skeletal abnormalities or other defects linked to sonic hedgehog function.

      Shh appeared to be differentially expressed between wild-type and d3.8/hIC1 samples because its expression was 0 in all samples except two wild-type samples. To detect all DEGs, including lowly expressed ones, we did not filter DEGs based on total expression level. As a result, Shh was called significantly differentially expressed in d3.8/hIC1 samples, although its expression in our samples appears too low to have any significant effect on development. This explanation was added to lines 310-312. To confirm that this was an exceptional case, we analyzed the expression of DEGs obtained from other pairwise comparisons. In the volcano plots below, genes whose expression is not statistically different between two groups are marked grey. Genes whose expression is statistically different and detected in both groups are marked red. Genes whose expression is statistically different but not detected at all in one group, such as Shh, are marked blue (Figure G). Almost all of our DEGs are expressed consistently across groups, and genes with no detected expression in one group are very rare.
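      The grey/red/blue categories used in these volcano plots can be expressed as a small decision rule. This is a sketch under stated assumptions, not the authors' pipeline: the significance call is taken as an input from whatever differential expression test was used, "detected" is assumed to mean at least one nonzero count in a group, and the function names and counts are hypothetical.

```python
def detected(counts):
    """Assumed rule: a gene is detected in a group if any sample has a nonzero count."""
    return any(c > 0 for c in counts)


def volcano_category(counts_a, counts_b, significant):
    """Colour category for one gene in a pairwise comparison of groups a and b."""
    if not significant:
        return "grey"  # not statistically different between the groups
    if detected(counts_a) and detected(counts_b):
        return "red"   # different, and detected in both groups
    return "blue"      # different, but not detected at all in one group


# Shh-like pattern: zero everywhere except two wild-type samples.
wt = [0, 0, 7, 4]
mutant = [0, 0, 0, 0]
print(volcano_category(wt, mutant, True))
```

      Genes in the "blue" category are exactly those that a minimum total-expression filter would tend to remove, which is why Shh surfaced here when no such filter was applied.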

      Point 3: The placental analysis needs to be strengthened. Placentas should be consistently positioned with the decidua facing up, and the chorionic plate down. The placentas in Fig. 3F are sectioned at an angle and the chorionic plate is missing. These images must be replaced with better histological sections.

      As requested, we have replaced placental images with better representative sections (Figure 3F and 4E). In addition, we have improved alignment of placental histology figures.

      Point 4: The CD34 staining has not worked and does not show any fetal vasculature, in particular not in the WT sample.

      As requested, we have replaced the CD34 vascular stained images with those that better represent fetal vasculature (Figure 3G).

      Point 5: The "thrombi" highlighted in Fig. 4E are well within the normal range, to make the point that these are persistent abnormalities more thorough measurements would need to be performed (number, size, etc).

      As requested, we measured the number and relative size of the thrombi found in dH19/hIC1 placentas with lesions. No thrombi were found in wild-type placentas, whereas an average of 1.3 thrombi per placenta was found in six dH19/hIC1 placentas. The size of the thrombi varied widely, but they occupied on average 2.58% of the labyrinth zone in which these lesions were found (Supplemental Figure 4D). Additionally, we replaced the image in Figure 4E with a section that better represents the lesion.

      Point 6: The statement that H19 is disproportionately contributing to the labyrinth phenotype (lines 154/155) is not warranted as Igf2 expression is reduced to virtually nothing in these mice. Even though there is more H19 in the labyrinth than in the junctional zone, the phenotype may still be driven by a loss of Igf2. Given the quasi Igf2-null situation in +/hIC1 mice, is the glycogen cell type phenotype recapitulated in these mice, and how do glycogen numbers compare in the other mouse models?

      The sentence was edited (line 157). We performed Periodic acid-Schiff (PAS) staining on +/hIC1 placentas to address whether glycogen cells are affected by abnormal H19/Igf2 expression (Supplemental Figure 1E). In contrast to previous reports, in which Igf2-null mice had lower placental glycogen concentration (Lopez et al., 1996) and H19 deletion led to increased placental glycogen storage (Esquiliano et al., 2009), our quantification of PAS-stained images showed that glycogen content is not significantly different between wild-type and +/hIC1 placentas. We have described this result in lines 166-168.

      Point 7: How do delta3.8/+ and delta3.8/hIC1 mice with a VSD survive? Is it resolved some time after birth such that heart function is compatible with postnatal viability? And more importantly, do H19 expression levels correlate with phenotype severity on an individual basis?

      Our study was limited to phenotypes prior to birth; thus, postnatal/adult phenotypes were not examined. Because the VSD showed only partial penetrance in these mice, we cannot state that the d3.8/+ or d3.8/hIC1 mice with VSDs survive. It has also been previously reported in another mouse model with incomplete penetrance of a VSD that the mice that survived to adulthood did not have VSDs (Sakata et al., 2002). We find it highly unlikely that either mouse model would survive significantly past the postnatal time point with a VSD. We examined two PN0 d3.8/hIC1 neonates, and neither had a VSD.

      Regarding the second point, the only way to address this question quantitatively would be to perform qPCR or RNA-seq on individual hearts, which would then make it impossible to examine those hearts histologically to confirm the VSD. Thus, hearts used to identify VSDs via histology could not also be used for quantitative H19 measurements. Note that H19/Igf2 expression in independent replicates of d3.8/hIC1 cardiac ECs used in our RNA-seq experiment is quite variable: unlike the other mouse models used in this study, these replicates do not cluster together (Fig. 6A). Such a wide range of variability in the extent of H19/Igf2 dysregulation suggests that H19/Igf2 levels could affect the penetrance or the severity of the VSD phenotype in d3.8/hIC1 embryos.

    1. Author Response

      Reviewer #2 (Public Review):

      Zylbertal and Bianco propose a new model of trial-to-trial neuronal variability that incorporates the spatial distance between neurons. The 7-parameter model is attractive because of its simplicity: A neuron's activity is a function of stimulus drive, neighboring neurons, and global inhibition. A neuroscientist studying almost any brain area in any model organism could make use of this model, provided that they have access to 1) simultaneously-recorded neurons and 2) the spatial locations of those neurons. I could foresee this model being the de-facto model to compare to all future models, as it is easy to code up and interpret. The paper explores the effectiveness of this distance model by modeling neural activity in the zebrafish optic tectum. They find that this distance-based model can capture 1) bursting found in spontaneous activity, 2) ongoing co-fluctuations during stimulus-evoked activity, and 3) adaptation effects during prey-catching behavior.

      Strengths:

      The main strength of the paper is the interpretability of the distance-based model. This model is agnostic to the brain area from which the population of neurons is recorded, making the model broadly applicable to many neuroscientists. I would certainly use this model for any baseline comparisons of trial-to-trial variability.

      The model is assessed in three different contexts, including spontaneous activity and behavior. That the model provides some prediction in all three contexts is a strong indicator that this model will be useful in other contexts, including other model organisms. The model could reasonably be extended to other cognitive states (e.g., spatial attention) or accounting for other neuron properties (such as feature tuning, as mentioned in the manuscript).

      The analyses and intuition to show how the distance-based model explains adaptation were insightful and concise.

      We thank the reviewer for these supportive comments.

      Weaknesses:

      Model evaluation and comparison: The paper does not fully evaluate the model or its assumptions; here, I note details in which evaluation is needed. A key assumption of the model, that correlations fall off in a Gaussian manner (Fig. 1C-E), is not supported by Fig. 1C, which appears to show an exponential fall-off. Functions other than a Gaussian may provide better fits.

      A key feature of our model is that connection strengths smoothly decrease with distance. However, we did not intend to make strong claims about the exact function parametrizing this distance relationship. In light of the reviewer’s comment, we have additionally tested an exponential function and find that it too can describe activity correlations in OT with a negligible decrease in r2 (Figure 1 – figure supplement 1A-C). The main purpose of the analysis was to show that the correlation is maximal around the seed and decays uniformly with distance from it (i.e. no sub-networks or cliques are detected). We have emphasized this in a revised conclusion paragraph and note that while multiple functions can be used to parameterize the relationship, they are nonetheless certainly simplifications. Secondly, we also ran a version of the network simulation where the connections decay in space according to an exponential rather than Gaussian function and show that, as expected, tectal bursting is robust to this change.

      Furthermore, it is not clear whether the r^2s in Fig. 1E are computed in a held-out manner (more details about what goes into computing r^2 are needed).

      These values are computed by fitting the 2-d Gaussian (or exponential function) to all neurons excluding the seed itself (added a short clarification in the Methods).
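      The comparison of a Gaussian and an exponential fall-off, with the seed excluded from the fit, can be sketched as below. This is a simplified illustration, not the authors' fitting code: it assumes an isotropic profile centred on the seed (whereas the fits behind Fig. 1E also estimate the location of the correlation peak), uses a grid search over the width with a closed-form least-squares amplitude, and all function names and the synthetic data are hypothetical.

```python
import math


def gaussian(d, width):
    return math.exp(-d * d / (2.0 * width * width))


def exponential(d, width):
    return math.exp(-d / width)


def fit_profile(dists, corrs, shape, widths):
    """Fit corr ~ amp * shape(d, width) by least squares.

    For a fixed width the optimal amplitude has a closed form, so only the
    width is searched on a grid. Returns (amp, width, r2).
    """
    mean_c = sum(corrs) / len(corrs)
    ss_tot = sum((c - mean_c) ** 2 for c in corrs)
    if ss_tot == 0:
        raise ValueError("correlations are constant; r2 is undefined")
    best = None
    for w in widths:
        f = [shape(d, w) for d in dists]
        ff = sum(v * v for v in f)
        if ff == 0:
            continue  # profile underflowed to zero at every distance
        amp = sum(c * v for c, v in zip(corrs, f)) / ff
        ss_res = sum((c - amp * v) ** 2 for c, v in zip(corrs, f))
        if best is None or ss_res < best[0]:
            best = (ss_res, amp, w)
    if best is None:
        raise ValueError("no usable width in the grid")
    ss_res, amp, w = best
    return amp, w, 1.0 - ss_res / ss_tot


def compare_shapes(positions, corrs, seed, widths):
    """Fit both shapes to all neurons except the seed itself."""
    sx, sy = positions[seed]
    keep = [i for i in range(len(positions)) if i != seed]
    dists = [math.hypot(positions[i][0] - sx, positions[i][1] - sy) for i in keep]
    values = [corrs[i] for i in keep]
    return {
        "gaussian": fit_profile(dists, values, gaussian, widths),
        "exponential": fit_profile(dists, values, exponential, widths),
    }


# Synthetic example: neurons on a grid, correlation decaying as a Gaussian.
positions = [(x, y) for x in range(-50, 51, 10) for y in range(-50, 51, 10)]
seed = positions.index((0, 0))
corrs = [0.8 * gaussian(math.hypot(x, y), 20.0) for x, y in positions]
corrs[seed] = 1.0  # self-correlation, excluded from the fit
fits = compare_shapes(positions, corrs, seed, [float(w) for w in range(5, 61, 5)])
print(fits["gaussian"][2], fits["exponential"][2])
```

      Dropping the seed matters because its correlation with itself is trivially 1 and would otherwise inflate the apparent peak of any fitted profile.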

      Assessing the model based on peak location alone (Fig. 1E) is not sufficient, as other smooth monotonically-decreasing functions may perform similarly.

      As discussed above, an exponential function indeed performs similarly to a Gaussian. However, goodness of fit is secondary to the main aim of Fig 1E, which is to show that the correlation peak tends to fall near the seed cell.

      Simulating from the model greatly improves the reader's understanding (Fig. 2D), but no explanation is given for why the simulations (Fig. 2D) have almost no background spikes and much fewer, non-co-occurring bursts than those of real data (Fig. 2E).

      In part this is because the simulation results depicted in Fig 2D were derived from the ‘baseline model’, prior to optimizing to match biological bursting statistics. It is thus expected that activity will differ from experimental observation and was our main motive to tune the model parameters (now emphasized in the text). However, the model will certainly not account for all aspects of tectal activity; rather, it was designed to reproduce bursting as a prominent feature of ongoing activity and in the second part of the paper we explore the extent to which it can account for other phenomena. As noted above, in the revised abstract, introduction and discussion we have tried to clarify the motivation for developing the model and how it was used to gain insight into activity-dependent changes in network excitability.

      A key assumption of the distance model (Fig. 2A) is that each neuron has the same gaussian fall-off (i.e., sigma_excitation and sigma_inhibition), but it is unclear if the data support this assumption.

      We intentionally opted for a simple model (i.e. described by few parameters), in part due to the lack of connectivity data and additionally to set a lower bound on the extent to which multiple features of tectal activity could be accounted for. More complex models with additional degrees of freedom (such as cell-specific connectivity) may well describe the data better, but likely at the cost of interpretability. We consider such extensions are beyond the scope of the present study but might be fruitful avenues for future research.

      Although an excitatory and inhibitory gain is assumed (Fig. 2A), it is not clear from the data (Fig. 1C) that an inhibitory gain is needed (no negative correlations are observed in Fig. 1C-D).

      This is now explored in the revised Figure 3A which includes the condition of zero inhibition gain. See also response to reviewer 1.

      After optimization (Fig. 3), the model is evaluated on predicting burst properties but not evaluated on predicting held-out responses (R^2s or likelihoods), and no other model (e.g., fitting a GLM or a model with only an excitatory gain) is considered. In particular, one may consider a model in which "assemblies" do exist - does such an assembly model lead to better held-out prediction performance?

      The model we developed is a mechanistic, generative model. In contrast to Pillow et al. (2008), we did not fit the model to data; rather, we used it to simulate network activity and tuned the seven parameters (using EMOO) to best match biological observations. Thus, rather than assessing goodness-of-fit using cross-validation, our approach involved comparison of summary statistics related to the target emergent phenomenon (tectal bursting). This was necessary as bursting appears highly stochastic. Further to the comments above, we have expanded the parameter space to include instances with only an excitatory gain (where bursting failed) and with no distance-dependence (again, bursting failed). Introducing assemblies into the model would inevitably support bursting (and introduce many more free parameters), but one of our key observations is that such assemblies are not required for this aspect of spontaneous activity. Again, our aim was not to produce a detailed picture of tectal connectivity, but rather to develop a minimal model and estimate the extent to which it can account for observed features of activity. Note that the second half of the paper (Figure 4 onwards) shows the model can explain phenomena that were not considered during parameter tuning.

      It is unclear why a genetic algorithm (Fig. 1A-C) is necessary versus a grid search; it appears that solutions in Generation 2 (Fig. 3C, leftmost plot, points close to the origin) are as good as solutions in Generation 30 and that the spreads of points across generations do not shrink (as one would expect from better mutations). Given the small number of parameters (7), a grid search is reasonable, computationally tractable, and easier to understand for all readers (Fig. 3A).

      Perhaps in hindsight a grid search would have worked, but at increased computational cost (each instantiation of the model is computationally expensive). At the time, we chose EMOO, and since it produced satisfactory results, we kept it. As often happens with multi-objective optimization, an improvement in one objective usually comes at the expense of other objectives, so the spread of the points does not shrink much but the points move closer to the axes (i.e. reduced error). The final parameter combination is closer to the origin than any point in generation 2, though admittedly not by much. Importantly, however, the model optimized on the training features generalized to other burst-related statistics.
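      The trade-off described above is Pareto optimality, the standard notion in multi-objective optimization. The sketch below is a generic illustration, not EMOO itself: it assumes each candidate parameter set is summarized by a tuple of error values (one per training feature, lower is better), and the helper names, the toy error values, and the "closest to the origin" selection rule are hypothetical.

```python
import math


def dominates(a, b):
    """True if error vector a is no worse than b on every objective and
    strictly better on at least one (lower error is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (a point never dominates itself)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def closest_to_origin(points):
    """Point with the smallest Euclidean norm across all objectives."""
    return min(points, key=lambda p: math.sqrt(sum(x * x for x in p)))


# Toy two-objective generation: candidates trade one error against another.
generation = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0)]
print(pareto_front(generation))
print(closest_to_origin(generation))
```

      Points on the front differ only in how they trade one error against another, so the front can remain broad across generations while shifting towards the axes; the spread of the cloud is therefore a poor indicator of progress, whereas the distance of the best point to the origin is more informative.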

      It is unclear why the excitatory and inhibitory gains of the temporal profiles (Fig. 3I) appear to be gaussian but are formulated as exponential (formula for I_ij^X in Methods).

      The interactions indeed have exponential decay in time. These might appear Gaussian because the axis scale is logarithmic.

      Overall, comparing this model to other possible (similar) models and reporting held-out prediction performance will support the claim that the distance model is a good explanation for trial-to-trial variability.

      See comments above. A key point we want to stress is that we intentionally explored a minimal network model and found that, despite obvious simplifications of the biology, it was nonetheless able to explain multiple aspects of tectal physiology and behaviour. We hope that it inspires future studies and can be extended, in parallel to experimental findings, to more accurately represent the cell-type diversity and cell-specific connectivity of the tectal network.

      Data results: Data results were clear and straightforward. However, explanations were not given for certain results. For example, the relationship between pre-stimulus linear drive and delta R was weak; the examples in Fig. 4C do not appear to be representative of the other sessions. The example sessions in Fig. 4C have R^2=0.17 and 0.19, the two outliers in the R^2 histogram (Fig. 4D).

      The revised figure 4 is based on new data and new analysis (see below), and the presented examples no longer represent the extreme tail of the distribution (they still, however, represent strong examples, as is now explicitly indicated in the figure legend).

      The black trace in Fig. 4D has large variations (e.g., between linear drives of 25 and 30, delta R changes by ~0.1, which is greater than the overall change of the dashed line between its two ends, ~0.08), but the SEMs are very tight. This suggests that either this last fluctuation is real and a major effect in the data (although not present in Fig. 4C) or the SEM is not conservative enough. No null distribution or statistics were computed on the R^2 distribution (Fig. 4C, blue distribution) to confirm that the R^2 values are statistically significant and not due to random fluctuations.

      We agree that this was not sufficiently robust and in response to this comment we undertook a significant revision to figure 4 and the associated text:

      i) The revised figure is based on an entirely new dataset, allowing us to verify the results on independent data. We used a 5 min inter-stimulus interval (ISI) for all stimulus presentations, regardless of stimulus type (high or low elevation), thus ensuring that we are only examining differences in state brought about by previous ongoing activity, without risk of ‘contamination’ by evoked activity.

      ii) As per the reviewer’s suggestion, we compared model-estimated pre-stimulus state to a null estimate using randomly sampled time-points. We additionally compared the optimised model with the baseline model. Whereas the null (random times) estimates had no predictive power, both models using pre-stimulus activity were able to explain a fraction of the response residuals with the optimised model performing better.
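      A minimal sketch of the kind of null comparison described in (ii), under stated assumptions: the relationship is summarized as a squared Pearson correlation between estimated drive and response residuals, and the null is built by reading the drive trace at random time points instead of at stimulus onsets. The function names, the one-sided empirical p-value, and the synthetic data in the test are illustrative and are not taken from the manuscript.

```python
import numpy as np


def r_squared(x: np.ndarray, y: np.ndarray) -> float:
    """Squared Pearson correlation between two equal-length vectors."""
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)


def null_r2(drive_trace: np.ndarray, residuals: np.ndarray,
            n_shuffles: int, rng: np.random.Generator) -> np.ndarray:
    """R^2 obtained when the drive is sampled at random time points instead of stimulus onsets."""
    out = np.empty(n_shuffles)
    for k in range(n_shuffles):
        idx = rng.integers(0, drive_trace.size, size=residuals.size)
        out[k] = r_squared(drive_trace[idx], residuals)
    return out


def empirical_p(observed: float, null: np.ndarray) -> float:
    """One-sided permutation-style p-value: fraction of null values >= the observed value."""
    return float((1 + np.sum(null >= observed)) / (1 + null.size))
```

      If the pre-stimulus estimate carries real information, the observed R^2 should sit in the upper tail of this null distribution; random-time estimates, by construction, should show no systematic relationship.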

      iii) We refined the binning process by first computing, for each response, the mean of response residuals across neurons for each bin of estimated linear drive, and then averaging across responses. This prevents the relationship being skewed by rare instances involving unusually large numbers of neurons for a particular linear drive bin, and thereby eliminates the fluctuations the reviewer was referring to.
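      The two-stage averaging in (iii) can be sketched as follows: residuals are first averaged within each linear-drive bin separately for every response, and only then averaged across responses, so a single response that contributes many neurons to one bin cannot dominate that bin. This is an editorial sketch; the function name and array layout (one array of per-neuron drives and one of per-neuron residuals per response) are assumptions.

```python
import numpy as np


def binned_residuals(drive_per_response, resid_per_response, bin_edges):
    """Mean residual per drive bin, computed within each response first, then averaged across responses.

    drive_per_response / resid_per_response: sequences with one array per response,
    holding the estimated linear drive and the response residual of each neuron.
    """
    n_bins = len(bin_edges) - 1
    per_response = np.full((len(drive_per_response), n_bins), np.nan)
    for r, (drive, resid) in enumerate(zip(drive_per_response, resid_per_response)):
        drive = np.asarray(drive, dtype=float)
        resid = np.asarray(resid, dtype=float)
        bins = np.digitize(drive, bin_edges) - 1  # bin index for each neuron
        for k in range(n_bins):
            sel = bins == k
            if sel.any():
                per_response[r, k] = resid[sel].mean()
    # Bins that are empty for a given response are ignored in the across-response average.
    return np.nanmean(per_response, axis=0)
```

      Pooling all neurons before averaging would instead weight each response by how many of its neurons fall into a bin, which is the source of the fluctuations the reviewer pointed out.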

      The absence of any background activity in Fig. 6B (e.g., during the rest blocks) is confusing, given that in spontaneous activity many bursts and background activity are present (Fig. 2E).

      The raster only presents evoked responses and no background activity is shown. This has been clarified in the revised figure and legend.

      Finally, it appears that the anterior optic tectum contributes to convergent saccades (CS) (Fig. 7E) but no post-saccadic activity is shown to assess how activity changes after the saccade (e.g., plotting activity from 0 to 60).

      Activity before and after the saccade is shown in Fig 7A. Fig 7E shows the ‘linear drive’ (or ‘excitability’), and how it changes leading up to the saccade. Since we were interested in the association between pre-saccade state and saccade-associated activity, we did not plot post-saccadic linear drive. However, as can be seen in the below figure for the reviewer, linear drive is strongly suppressed by the saccade, as expected due to CS-associated activity.

      No explanation is given why activity drops ~30 seconds before a convergent saccade (Fig. 7E).

      This is no longer shown after we trimmed the history data in Fig 7E in accordance with a comment from reviewer 1. We speculate, however, that the mean linear drive of a compact population of neurons would be somewhat periodical, since a high linear drive leads to a burst which results in a prolonged inhibition (low linear drive) with a slow recovery and so on.

      No statistical test is performed on the R^2 distribution (Fig. 7H) to confirm the R^2s (with a mean close to R^2=0.01) are meaningful and not due to random fluctuations.

      We revised the analysis in Fig 7 along the same lines as the revision of Fig 4. Model-estimated linear drive predicts CS-associated activity whereas a null estimate (random times) shows no such relationship.

      Presentation: A disjointed part of the paper is that for the first part (Figs. 1-3), the focus is on capturing burst activity, but for the second part (Figs. 4-7), the focus is on trial-to-trial variability with no mention of bursts. It is unclear how the reader should relate the two and if bursts serve a purpose for stimulus-evoked activity.

      In the first part of the paper (Figs. 1-3), we use ongoing activity to develop an understanding (formulated as a network model) of how activity modulates the network state. In the second part, we test this understanding in the context of evoked responses and show that model-estimated network state explains a fraction of visual response variability and experience-dependent changes in activity and behaviour. In the revised MS we further emphasize this idea and have edited the results text to strengthen the connections between these parts of the study. See also comments above.

      Citations: The manuscript may cite other relevant studies in electrophysiology that have investigated noise correlations, such as:

      • Luczak et al., Neuron 2009 (comparing spontaneous and evoked activity).

      • Cohen and Kohn, Nat Neuro 2011 (review on noise correlations).

      • Smith and Kohn, JNeurosci 2008 (looking at correlations over distance).

      • Lin et al., Neuron 2015 (modeling shared variability).

      • Goris et al., Nat Neuro 2014 (check out Fig. 4).

      • Umakantha et al., Neuron 2021 (links noise correlation and dim reduction; includes other recent references to noise correlations).

      We agree that the manuscript could benefit from citing some of these suggested studies and have added citations accordingly.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript by McCafferty et al. presents the integrative computational structural modelling of the IFT-A complex, which is important to proper cilium organelle formation in eukaryotic cells. Recent advances in protein structure prediction (AlphaFold) allowed the authors to model the structures of the 6 individual subunits of the IFT-A complex. Interactions between IFT-A proteins were experimentally investigated by purifying Tetrahymena cilia, isolating IFT complexes, and utilizing chemical crosslinking and mass spectrometry (MS). In addition, the authors present a somewhat improved 23Å cryo-electron tomography (cryo-ET) map of the IFT-A complex (previously determined cryo-ET structures of IFT trains have resolutions of 24 - 40 Å). Integrative modelling using the predicted structures of the 6 IFT-A proteins and the experimental data as restraints allows the authors to present a structural model for the entire IFT-A complex. This model is analysed in the context of the polymeric IFT train structure, interactions with the IFT-B complex, and the structural position of ciliopathy disease variants.

      This is in principle a timely and interesting study that attempts to push the limits of structural modelling of large protein complexes using structure prediction in combination with experimental data. Unfortunately, the study has several shortcomings and the data providing restraints for the integrative modelling are not optimal.

      1) Chemical crosslinking and MS were used to obtain both intra-molecular crosslinks used to validate the structural models of the individual IFT-A proteins as well as inter-molecular crosslinks used as restraints in the structural modelling of the hexameric IFT-A complex. It is mentioned on p. 4, line 9, that IFT-A complexes were enriched from the flagellar lysate M+M fractions using SEC and that fractions from SEC containing IFT-A complexes were crosslinked for MS analysis. However, the authors do not show the data for this sample, neither SEC profiles, SDS-PAGE nor data of the cross-linked samples. On p. 7 the authors write that their SEC profile corresponds to monomeric IFT-A, but this is not shown anywhere in the manuscript. The reason this is so important is that the IFT-A complex assembles into linear polymeric structures together with the IFT-B complex as so-called IFT trains in cilia. Data obtained from isolated IFT trains would thus have additional crosslinks between subunits in neighbouring IFT-A complexes that, if used to restrain the position of subunits within a hexameric IFT-A complex, would likely result in a wrong architecture. The fact that the authors also observe crosslinks between IFT-A and IFT-B proteins strongly suggests that they indeed carried out the crosslinking experiment on polymeric rather than monomeric IFT complexes.

      These are excellent points, and we apologize for previously omitting these data. In the new Figure 1—figure supplement 2, we now include size exclusion chromatography elution profiles for IFT-A along with molecular weight calibrants, plotting the mass spectrometrically-determined abundances of IFT-A subunits. Based on these data, we experimentally determined the molecular weight of the IFT-A particles that we analyzed to lie between 720 kDa and 1.1 MDa, consistent with the expected monomeric molecular weight of 772 kDa.
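      For readers less familiar with the approach, a molecular weight from size exclusion chromatography is usually estimated by fitting log10(MW) of the calibrants as a linear function of elution volume and then reading off the unknown. The sketch below assumes that standard linear calibration; the calibrants, column, and elution volumes used in the study are not given here, so any values passed to these hypothetical helpers are placeholders.

```python
import numpy as np


def fit_sec_calibration(elution_ml, mw_kda):
    """Fit log10(MW) = slope * elution_volume + intercept from calibrant standards."""
    slope, intercept = np.polyfit(np.asarray(elution_ml, dtype=float),
                                  np.log10(np.asarray(mw_kda, dtype=float)), 1)
    return slope, intercept


def estimate_mw(elution_ml, slope, intercept):
    """Apparent molecular weight (kDa) of a species eluting at the given volume."""
    return 10 ** (slope * elution_ml + intercept)
```

      Applying `estimate_mw` to the two edges of the elution window containing the IFT-A subunits would give an apparent mass range, which can then be compared with the 772 kDa expected for a monomer. Apparent masses from SEC also depend on particle shape, so such a range is an approximation.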

      These samples were isolated directly from Tetrahymena cilia and were composed of ~3% each of IFT-A and IFT-B. However, as we now note on p. 11, the samples were subsequently concentrated before crosslinking. We speculate that concentrating the particles could have induced some degree of oligomerization and interactions with IFT-B, which may in turn explain the small number of crosslinks consistent with IFT-A/IFT-A and IFT-A/IFT-B interactions. However, we have now removed all discussion of specific IFT-A/IFT-B contacts from the paper and present only the general orientation of the two complexes as determined by cryo-ET.

      2) Given that the crosslink/MS data are unlikely to provide sufficient restraints for IFT-A structure assembly (and may even be misleading), the cryo-ET data become increasingly important. Unfortunately, the 23Å cryo-ET map does not provide sufficient detail to unambiguously fit domains of the IFT-A subunits as several of these have similar architectures consisting of WD-repeats followed by TPRs.

      We now address this comment using a different approach, which we describe in full on p. 5, 14-15, and Figure 2—figure supplements 1-4 of the paper.

      In particular, we used AlphaFold-Multimer (AF-Multimer) to identify confidently-modeled rigid-body domains and domain-domain interactions for directly contacting protein pairs (see Figure 2—figure supplements 1-2), which we used as starting models for integrative modeling (see Figure 2—figure supplements 3-4). We incorporated our cross-links as distance restraints for the modeling. This approach allowed us to model the entire IFT-A complex in a manner compatible both with our experimental structural data and the computationally derived restraints. We suspect this will be a very useful strategy for others to adopt, as the approach should be generalizable to many other large molecular assemblies that are too big to predict using AF-Multimer alone. Importantly, we see high concordance between the AlphaFold intermolecular constraints and our crosslinks (as plotted in the new Figure 2—figure supplement 4), and the models produced by this strategy agree well with the two structures presented in the newly posted preprints, which were arrived at using very distinct methodologies.
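      As a simple illustration of how crosslinks act as distance restraints, the sketch below measures the Cα–Cα distance for each crosslinked residue pair in a model and reports the fraction within a cutoff. The 30 Å default is an assumption often used for lysine-reactive crosslinkers such as DSS/BS3; the crosslinker and threshold used in the study are not stated here. The residue identifiers in the test are hypothetical.

```python
import numpy as np


def crosslink_satisfaction(coords, crosslinks, cutoff=30.0):
    """Fraction of crosslinks whose C-alpha distance is within `cutoff` (in angstroms).

    coords: dict mapping (chain, residue_number) -> (x, y, z) C-alpha coordinates.
    crosslinks: list of (residue_key_a, residue_key_b) pairs.
    Returns the satisfied fraction and the array of per-link distances.
    """
    dists = np.array([
        float(np.linalg.norm(np.asarray(coords[a], dtype=float) - np.asarray(coords[b], dtype=float)))
        for a, b in crosslinks
    ])
    return float(np.mean(dists <= cutoff)), dists
```

      In integrative modeling the same distances typically enter a scoring function rather than a hard pass/fail test, but the satisfaction fraction is a common summary of how well a final model agrees with the crosslinking data.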

      This approach allowed us to withhold the cryo-ET tomogram from the modeling altogether in order to generate a fully independent model. We could then compare the final model to the subtomogram average and, by docking the model into the cryo-ET tomogram, to build a model of polymeric IFT-A, as described on p. 6 and presented in the new Figure 4, Figure 4—figure supplement 1, and Figure 4—animation 1.

      3) Two preprints of the IFT-A structure appeared over the last few weeks. Hesketh et al., (https://www.biorxiv.org/content/10.1101/2022.08.09.503213v1) have obtained a single particle cryo-EM structure of the human IFT-A complex at 3.5Å resolution for the IFT121/122/139 part of the complex providing amino acids side-chain information. In addition, Lacey et al. (https://www.biorxiv.org/content/10.1101/2022.08.01.502329v1) provide a 10-18Å resolution cryo-ET structure of the Chlamydomonas IFT trains containing both IFT-A and IFT-B. It is noteworthy that the model outlined in the current manuscript is very different from the IFT-A models of Hesketh et al., and Lacey et al. (the Lacey et al. manuscript by the way shares an author with the McCafferty et al., manuscript). In both Hesketh et al., and Lacey et al. the IFT121 and IFT122 subunits interact via the N-terminal WD-repeats and the C-terminal TPRs with the beta-propellers (WD-repeats) positioned parallel and in close contact. In the model proposed by McCafferty, the beta-propellers of IFT121 and IFT122 are positioned far away from each other (>50Å) and are perpendicular to each other. Several other large discrepancies are found in the relative positions of IFT-A subunits. This suggests serious problems with the structural model of IFT-A proposed by McCafferty and needs to be addressed with great care.

      This is an important point that we have indeed considered with great care. Our new model now positions the WD-40 domains of IFT121 and IFT122 proximal to each other and broadly matches the 2 preprints in general placement and orientation of all subunits, including the placement of IFT43, for which only we and Hesketh provide models.

      We now include an extensive comparison to the structures reported in the other two preprints. Note that a direct 3D alignment of the structures was not possible, as we were the only group to deposit our atomic coordinates. However, we now include a new Figure 4—figure supplement 2 orienting our structure to match figures appearing in those preprints, and use this as the basis for the comparison, which can be found on p. 10. While it is not possible to calculate a quantitative measure of agreement (e.g. RMSD), our IFT43/120/121/139 structure visually agrees with the structure of Hesketh et al., even in the placement of IFT43, which is highly disordered for the most part and is omitted from Lacey et al. Our structure also generally agrees with that of Lacey et al. in this region, with the exception of what appears to be a re-orientation of the N-terminus of IFT139 in the Lacey structure relative to ours and that of Hesketh et al., which appear to be concordant with each other (again, with the caveat that we are limited in the comparisons we can make without access to atomic coordinates). Most importantly, all three structures agree with respect to the nature of the IFT-A monomer-monomer interactions in the polymeric train, with IFT140 acting to bridge adjacent monomers. Differences in the resolutions of the cryo-ET subtomogram averages (which range from 18 to 30 Å) are relatively small across the 3 studies and, as best we can tell from this necessarily crude comparison, do not obviously lead to any major changes in the structures of the polymeric assemblies.
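      The reason an RMSD could not be reported is that it requires matched atomic coordinates from both models. Given such coordinates (for example, Cα positions for a shared set of residues), the standard calculation superimposes one set onto the other with the Kabsch algorithm and then takes the root-mean-square deviation. The sketch below is a generic implementation of that textbook method, not code from any of the three studies.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between matched N x 3 coordinate sets after optimal rigid superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Cross-covariance matrix and its SVD.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

      The reflection correction matters: without it, a mirror image of a structure could be "superimposed" with zero deviation, which is not physically meaningful.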

      4) The authors observe crosslinks between the IFT-A proteins (IFT122 and IFT140) and IFT-B proteins (IFT70, IFT88, and IFT172) as discussed on pg. 6 and shown in figure 5A. To accommodate these crosslinks into the structural model of the IFT train shown in Figure 5A, the authors place the IFT-B subunits IFT70 and IFT88 far apart in the IFT-B complex. However, these subunits are known to interact directly (Taschner et al. JCB 2014) and indeed sit in proximity to the IFT train structure as observed by Lacey et al. While the crosslinking data may well be correct, the incorrect structural model of IFT-A likely forces an incorrect positioning of IFT-B proteins to fulfill the crosslinking data.

      It is now clearly evident that the earlier segmentation of the monomeric unit from within the polymeric IFT-A chain, which we based on the published segmentation of Jordan et al. Nat Cell Biol. 2018, did not properly capture the true boundaries of an IFT-A monomer, especially with regard to IFT140, which extends outward to connect adjacent monomers. The use of this artificially truncated monomer as a molecular envelope in our initial modeling effectively forced the IFT-A subunits to pack in a reversed orientation in order to fit the truncated density.

      In order to address this issue, we omitted the cryo-ET data from the modeling altogether and instead incorporated evidence capturing domain-domain structures of interacting protein pairs from AlphaFold-Multimer. This substantially reduced the number of degrees of freedom to be explored by the integrative modeling process in order to satisfy the available structural restraints, leading in turn to significantly better convergence of independent modeling runs and high concordance with the input data (Figure 2—figure supplement 3 and Figure 2—figure supplement 4), and a significantly improved structural model of the IFT-A monomer. Docking this refined monomer structure into the (now fully independent) cryo-ET tomogram produced a model of the polymer that fit well into the cryo-ET density (Figure 4 and Figure 4—figure supplement 1) and agreed in large part with those derived by Hesketh and Lacey, as described above and visualized in Figure 4—figure supplement 2.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Himmel et al is an interesting study representing a topic of substantial interest to the somatosensory neurobiology community. Here, the authors use CIII peripheral neurons to investigate polymodality of sensory neurons. From vertebrates to invertebrates, this is a long-standing question in the field: how is it that the same class of sensory neurons that express receptors for myriad sensory modalities encode different behavioral responses. This system in Drosophila seems to be an intriguing system to study this question, making use of the genetic toolkit in the fly and ease of behavioral assays. In this study, the authors identify a number of channels that are important for cold nociception, and they showed that some of these do not appear to also encode mechanosensation. Despite my initial enthusiasm for this paper, halfway through, it felt as if I were reading two different papers that were loosely tied together. This lack of cohesion significantly reduced my enthusiasm for this work. Below are some of my criticisms:

      We thank Reviewer #1 for their feedback. In addition to the points below, and in accordance with the reviewer’s overall criticisms, we have revised the body text to make it more cohesive. Our main goal with this revision was to better explain to the reader the shift from anoctamins to SLC12 cotransporters.

      1) The first half of the paper is about a role for Anoctamins in cold nociception, but the second half switched somewhat abruptly to ncc69 and kcc. I assumed the authors would connect these genes in a genetic pathway, performing some kind of epistatic genetic interaction studies or even biochemical assays, and that this was the reason to switch the focus of the paper midway through. But this was not the case. Moreover, they performed a different constellation of experiments for the genes in the first half vs the second half of the paper (eg. Showed a role in cold nociception vs mechanosensation or showing phenotype from overexpression). This lack of cohesion made it difficult to follow the work.

      We have edited the text to better explain this shift. Two notable changes are: (1) moving the phylogenetics to Figure 1, to more immediately present and demonstrate that subdued is part of the ANO1/ANO2 family of calcium-activated chloride channels; and (2) a new cartoon schematic in Figure 6 to more strongly communicate to a reader that chloride is a hypothetical mechanism of cold discrimination.

      In short, previous work and our phylogenetic analyses indicate that subdued is a Cl- channel (we have moved the phylogeny earlier in the paper to make this clear from the onset). We were therefore surprised that knockdown/mutation resulted in reduced CT behavior, as neural Cl- currents are often inhibitory. Thus, we looked to known mechanisms of Cl- homeostasis to try to formulate an informed hypothesis about the function of anoctamins in this system; hence the shift in focus to SLC12.

      In response to the second half of the comment: We have in fact performed cold nociception and mechanosensation experiments for both the anoctamins and the SLC12 cotransporters, although the SLC12 mechanosensation results were in a supplemental figure. We have moved the mechanosensation results to the main Figure 6 to make this clearer. With respect to simple overexpression, the goal of the anoctamin experiments was to test the necessity of anoctamins for cold-evoked behavior, whereas the goal of the SLC12 experiments was to differentially modulate Cl- homeostasis, which could hypothetically be accomplished by both knockdown and overexpression (hence we performed both).

      2) In Fig1B,C how does one confirm a CIII neuron is being analyzed. It might help the reader if there were at least some zoomed out photos where all the cell types are labeled and potentially compared to a schematic. Moreover, is there a CIII specific marker to use to co-stain for confirmation of neuron type?

      Our CIII fusion is a specific marker for CIII neurons. To better demonstrate this, we have added images of the new CIII fusion expression patterns overlapping with a previously described CIII GAL4 driver (i.e. nompC-GAL4), and provided text describing how the CIII fusion transgene was discovered and generated. Please see the new Figure 1-Figure supplement 1.

      3) As this paper is predicated on detecting differences by behavioral phenotype, the scoring analysis is not as robust as it could be, especially considering the wealth of tools in Drosophila for mapping behaviors. The "CT" phenotype is begging for a richer behavioral quantification. This critique becomes relevant here when considering the optogenetic induced CT behavior in Fig5. If the authors were to use unbiased quantitative metrics to measure behavior, they could show how similar the opto behavior is to the natural cold evoked behavior. Perhaps the two are not the same, although loosely fitting under the umbrella of "CT".

      In accordance with our response above to necessary revisions, we have added one additional metric and reorganized the figures to better demonstrate the complexity of the behavior. We have no further data or new tools at this time.

      To improve our optogenetic analyses, we have added data for Channelrhodopsin-dependent CIII activation, which has been previously shown to induce cold-like behaviors at high levels of activation and innocuous touch-like behaviors at low levels of activation (Turner, Armengol et al 2016). Further, we have added videos (Figure 5—videos 1-3) showing behavior in response to both Channelrhodopsin and Aurora activation.

      With respect to differences in behavior, we have pointed out some differences in the Aurora-evoked behavior from the cold-evoked behavior: chloride optogenetics induces innocuous touch-like behaviors following CT. Please see lines 296-299.

      4) Following on from the last comment, the touch assays in Fig3 have a different measurement system from the other figures. Perhaps touch deficits would be identified with richer behavioral quantification. Moreover, do these RNAi larvae show any responses to noxious mechanical stimulation?

      The touch assays necessarily have different metrics from the cold assays, as touch-evoked behaviors are quite different from cold-evoked changes in body length (which are relatively simple, prima facie).

      With respect to noxious mechanical stimulation, while Class III neurons have been shown to facilitate this modality and be connected to relevant circuitry (please see Hu et al 2017 https://doi.org/10.1038/nn.4580 and Takagi et al 2017 https://doi.org/10.1016/j.neuron.2017.10.030), Class IV neurons are the primary sensory neuron which initiate the noxious mechanical-induced rolling response. Although this is an interesting question, we believe it is outside the scope of this study.

      Reviewer #2 (Public Review):

      Himmel and colleagues study how individual sensory neurons can be tuned to detect noxious vs. gentle touch stimuli. Functional studies of Drosophila class III dendritic arborization neurons characterized roles in gentle touch and identified a receptor, NompC, and other factors that mediate these responses. Subsequent work primarily from the authors of the current study focused on roles for the same sensory neurons in cold nociception. The two proposed sensory inputs lead to quite distinct sets of behaviors, with touch leading to halting, head turning and reverse peristalsis, and noxious cold leading to whole body contraction. How activity of one type of sensory neuron could lead to such different responses remains an outstanding question, both at the levels of reception and circuitry.

      The cIII responses to noxious cold and innocuous touch raise questions that the authors address here, proposing that studies of this system could advance the understanding of chronic neuropathic pain. A candidate approach inspired by studies in vertebrate nociceptors led the authors to study the anoctamin/TMEM16 channels subdued and CG15270, termed wwk by the authors. The authors focus on a pathway for gentle touch vs. cold nociception discrimination through anoctamins. Several of the experiments in this manuscript are well done; in particular, the electrophysiological recordings provide a substantial advance. However, the genetic and expression analysis has several gaps and should be strengthened. The data also do not provide strong support for some key aspects of the proposed model, namely the importance of relative levels of Cl- cotransporters.

      Major comments:

      1) Knockout studies are accomplished using two MiMIC insertions whose effects on subdued or CG15270/wwk are not characterized by the authors. This needs to be established. The MiMIC system is also not well explained in the text for readers.

      We have modified the text to better explain MiMICs (Lines 137-140) and we have verified the mutagenic effects of these MiMIC insertions via RT-PCR (Figure 2 – supplement 1). We believe these data, in conjunction with other converging lines of evidence (e.g. rescue) demonstrate necessity of these genes in cold nociception.

      2) Subdued expression is inferred by a Gal4 enhancer trap. This can be a hazardous way of determining expression patterns given the uncertain relevance of the local enhancers driving the expression. According to microarray analysis subdued is strongly expressed in cIII neurons, but c240-Gal4 is barely present compared to nearby neurons, raising questions about whether this line reflects the expression pattern, including levels, even though the authors suggest that the line is previously validated (line 95; it is unclear what previously validated means). Figure 1B should not be labeled "subdued > GFP" since it is not clear that this is the case. Another more direct method of assessing expression in cIII is necessary. Confidence is higher for wwk using a T2A-Gal4 line, however, Figure 1C might be misleading to readers and indicate that wwk-T2A-Gal4 is cIII specific whereas in supplemental data the authors show how it is much more broadly expressed. The expression pattern in the supplemental figures should be moved to the main figures.

      We have removed the phrase “previously validated” and we have modified Figure 1 to change how we refer to the GFP expression (removed “subdued > GFP”).

      In accordance with the response to necessary revisions above, we make use of several converging lines of evidence to infer expression, including GAL4 expression patterns, microarray, and qPCR (the two latter experiments from isolated CIII samples). That subdued and wwk are expressed in CIII is clearly the most parsimonious hypothesis.

      We have also carefully reviewed our body text to be certain we do not make claims of differential expression between different neural subtypes based on differences in fluorescence in the GAL4-driven GFP imaging. We do not believe that this would be a reasonable way to infer differences in expression levels in any instance.

      With respect to the design of Figure 1, the intent is not to mislead the reader, and we state in the text that wwk is not solely expressed in CIII (lines 120-125). As eLife makes supplemental figures available directly alongside the main figures, we have left the relevant supplemental figures as supplements – we simply think this makes more sense from a standpoint of readability and style.

      3) In figure 8 the authors propose a model in which the relative levels of the cation-chloride cotransporters Kcc (outward) and Ncc69 (inward) in cIII neurons determine high intracellular Cl- levels and a Cl- dependent depolarizing current in cIII neurons. They test this model using overexpression and loss-of-function (LOF) data, but the results do not support their model, since most of the overexpression and LOF manipulations of kcc and ncc69 do not significantly affect cold nociception, the exception being ncc69 RNAi. The authors suggest that this could be due to Cl- homeostasis regulated by other cotransporters. Nonetheless, it leaves a significant unexplained gap in the model that needs to be addressed.

      We respectfully disagree that our results are not consistent with the stated hypothesis. In fact, it is the lack of change under certain conditions which lend evidence against the alternative hypothesis that CIII neurons maintain relatively low intracellular Cl-. The hypothesis we are testing is that ncc69 expression is driving relatively high intracellular Cl- concentrations, thus resulting in depolarizing Cl- currents.
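      The link between intracellular Cl- and the sign of Cl- currents follows from the Nernst equation: opening Cl- channels depolarizes a neuron when the Cl- reversal potential, E_Cl = (RT/zF) ln([Cl-]out/[Cl-]in) with z = -1, lies above the membrane potential. The sketch below illustrates this reasoning only; the concentrations and resting potential used in the test are hypothetical round numbers, not measurements from Drosophila CIII neurons.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1
FARADAY = 96485.33212  # Faraday constant, C mol^-1


def e_cl_mV(cl_out_mM: float, cl_in_mM: float, temp_K: float = 298.15) -> float:
    """Nernst reversal potential for Cl- (valence -1), in millivolts."""
    z = -1
    return 1000.0 * (R_GAS * temp_K) / (z * FARADAY) * math.log(cl_out_mM / cl_in_mM)


def cl_current_depolarizes(v_rest_mV: float, cl_out_mM: float, cl_in_mM: float,
                           temp_K: float = 298.15) -> bool:
    """True if opening Cl- channels would drive the membrane toward more positive potentials."""
    return e_cl_mV(cl_out_mM, cl_in_mM, temp_K) > v_rest_mV
```

      With 120 mM extracellular Cl-, raising intracellular Cl- from 10 to 40 mM shifts E_Cl from about -64 mV to about -28 mV at 25 °C, so against an assumed -60 mV resting potential the same Cl- conductance changes from hyperpolarizing to depolarizing. Because of the temp_K term, the magnitude of E_Cl also shrinks slightly at cold temperatures.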

      Under this hypothesis, we would predict that knockdown of ncc69 and overexpression of kcc would reduce cold sensitivity at 5˚C. That knockdown of ncc69 and overexpression of kcc reduces cold sensitivity is consistent with this hypothesis (and we point out in text that the evidence for kcc is less convincing) – at the least, these results do not disprove it.

      Under this hypothesis, we would also predict that knockdown of kcc and overexpression of ncc69 would not result in reduced cold sensitivity at 5˚C. As there was no phenotype at 5˚C, our results are likewise consistent with the hypothesis (at the least, they do not disprove it).

      We did find it curious that ncc69 RNAi did not affect neural activity at 10˚C, but we speculate that our inability to detect physiological effects of ncc69 knockdown reflects limitations of our electrophysiology methodology (and we discuss this in the manuscript).

      The only piece of data inconsistent with the hypothesis may be that kcc overexpression may not have affected cold nociception at 5˚C – the data aren’t overwhelmingly convincing. However, this is only one experiment among many, and we believe the preponderance of evidence is consistent with the hypothesis. That is not to say we believe this hypothesis has complete explanatory power, however, as noted by our discussion of both the ncc69 electrophysiological and kcc behavioral data, and by our suggestion that there may be other regulatory mechanisms at work. This latter suggestion is wholly speculative, and we believe appropriate for the discussion section. We agree (and state in the discussion) that this would require further experimentation.

      4) Related to the #3, the authors should verify the microarray data that form the basis for their differential expression model.

      We have performed qPCR for ncc69 and kcc. Although qPCR is semi-quantitative when comparing between genes, the Ct value for ncc69 was lower than for kcc, indicating that more transcripts were present at the onset (assuming identical amplification efficiency). These data (although semi-quantitative), the microarray, and our behavioral and electrophysiological data are consistent with the stated hypothesis.
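      The Ct comparison above can be written out as a worked relation. This is a sketch under idealized assumptions that are ours, not part of the original analysis: both amplicons are amplified with the same per-cycle efficiency E (E = 1 meaning perfect doubling), and each reaction is read at the same detection threshold N_T.

$$N_0^{\,ncc69}(1+E)^{C_t^{\,ncc69}} = N_0^{\,kcc}(1+E)^{C_t^{\,kcc}} = N_T \quad\Rightarrow\quad \frac{N_0^{\,ncc69}}{N_0^{\,kcc}} = (1+E)^{\,C_t^{\,kcc} - C_t^{\,ncc69}}$$

      With E = 1, a hypothetical Ct difference of 2 cycles would correspond to roughly a 4-fold difference in starting template. Because two different primer pairs rarely share exactly the same efficiency, the ratio is indicative only, which is why the comparison is described as semi-quantitative.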

      Reviewer #3 (Public Review):

      There are also several modest weaknesses in the paper:

      1) A notable gap remains in the evidence for the hypothesized mechanisms that enhance electrical activity during cold stimulation and the proposed role of anoctamins (Fig. 8) - the lack of evidence for Ca2+-dependent activation of Cl- current. The recording methods used in the fillet preparation should enable direct tests of this important part of the model.

      We have performed an additional experiment at the reviewer’s suggestion. Please see above (in essential revisions) and below (in recommendations for authors).

      2) The behavioral and electrophysiological consequences of knocking down either of the two anoctamins are incomplete (Fig.2), raising the significant question of whether combined knock-down of both anoctamins in the CIII neurons would largely eliminate the cold-specific responses.

      While the results of this experiment would certainly be interesting, we are unsure of how it would be acutely informative in this context and are not convinced that any possible outcomes would disprove any particular hypothesis. In part, this is because we know that blocking synaptic transmission in CIII neurons (via tetanus toxin) does not completely ablate cold-evoked behavior (Turner & Armengol et al 2016 https://doi.org/10.1016/j.cub.2016.09.038). This is also the case for combinatorial mutation of other genes associated with cold nociception (please see Turner & Armengol et al 2016; and more recently, Patel et al 2022 https://doi.org/10.3389/fnmol.2022.942548). Further, the husbandry required to generate the double knockdowns would be quite challenging and might result in GAL4 titration (hypothetically less strongly knocking down each gene). For these reasons, we have not performed this suggested experiment.

      3) Blind procedures were not used to minimize unconscious bias in the analyses of video-recorded behavior, although some of the analyses were partially automated.

      This is correct and a relative weakness of the study. We note it in our methods section. The use of semi-automated data analyses of the behavioral videos is designed to minimize experimenter-specific variability.

      4) The term "hypersensitization" is confusing. Pain physiologists typically use "sensitization" when behavioral or neural responses are increased from normal. In the case of increased neuronal sensitivity, if the mechanism involves an increase in responsiveness to depolarizing inputs or an increased probability of spontaneous discharge, the term "hyperexcitability" is appropriate. Hypersensitization connotes an extreme sensitization state compared to a known normal sensitization state (which already signifies increased sensitivity). In contrast, the effects of ncc69 overexpression in this manuscript are best described simply as sensitization (increased reflexive and neuronal sensitivity to cooling) and hyperexcitability (expressed as increased spontaneous activity at room temperature).

      We have modified the text in accordance with the reviewer’s suggestions (see recommendations for authors section). We have also changed the title of the paper to “Chloride-dependent mechanisms of multimodal sensory discrimination and nociceptive sensitization in Drosophila”

    1. Author Response

      Reviewer #3 (Public Review):

      This paper focuses on characterizing differences between D. suzukii and D. melanogaster preferences for laying eggs on substrates of varying sugar content and stiffness. The authors demonstrate that D. suzukii show a weaker preference for multiple sugars in oviposition choice assays, that D. suzukii show a loss of sugar responsiveness in some labellar sensilla, and that some GR-encoding genes are expressed at much lower levels compared to D. melanogaster in the legs and labellum. Intriguingly, a number of mechanosensory channel genes are upregulated in D. suzukii legs and labellum. The authors show that D. suzukii females prefer stiffer oviposition substrates compared to D. melanogaster and the balance of sweetness/texture preference differs between the two species. This is consistent with their ecological niches, with D. suzukii generally preferring to lay eggs in ripe fruit and D. melanogaster generally preferring overripe fruit.

      This paper builds on previous work from this group (Dweck et al., 2021) and others (Karageorgi et al., 2017 and others) that previously demonstrated that D. suzukii prefer to lay eggs on stiffer substrates compared to D. melanogaster, will tolerate more bitter substrates and show reduced expression of several bitter GR genes. This manuscript appropriately acknowledges this work and the findings are consistent with these studies.

      The manuscript is well-written, the experiments are well-controlled, the figures clearly convey the experimental findings, the data support the authors' conclusions, and the statistical analysis is appropriate.

      The weakest point of the paper is the lack of connection drawn between the sequencing, electrophysiological, and behavioral data. For example, the electrophysiological responses to glucose appear to be the same in both species in Figure 3 but the behavioral responses in Figure 2 are different between the two species. The authors do not provide any speculation as to what could account for this seeming discrepancy.

      The revised ms. contains the following statement: "The weaker behavioral responses to glucose observed in D. suzukii could derive from weaker responses of untested taste neurons. Multiple taste organs, including the pharynx as well as the labellum and legs, contribute to oviposition behavior; sensory neurons of the ovipositor appear to play an important role as well (Yang et al., 2008; Joseph et al., 2012; Chen et al., 2022). The weaker behavioral response to glucose in D. suzukii could also arise from differences in central processing of glucose signals. It will be interesting to determine if there are differences in the connectivity of taste circuits in the two species. Alternatively, taste projection neurons in D. suzukii could have a reduced dynamic range, saturate at lower levels of receptor neuron firing, and be less able to distinguish among higher sugar concentrations."

      Additionally, although Gr64d transcript is almost completely absent in D. suzukii leg RNA seq data in Figure 4B, there are no differences in the electrophysiological responses in leg sensilla in Figure 3.

      This seems to imply that, although there are differences in the expression of some Grs, these differences do not necessarily translate into functional differences.

      We have added to the Discussion a statement to clarify that although similar, the sugar responses of leg sensilla are in fact not the same in the two species: "Leg sensilla of D. suzukii responded to sucrose, but dose-response analysis of the f5s sensillum of the leg showed that the response was lower than in its D. melanogaster counterpart to higher concentrations of sucrose (Figure 3—figure supplement 1E)."

      The authors identify mechanosensory genes that are upregulated in D. suzukii compared to D. melanogaster and suggest that these changes underlie the difference in substrate stiffness. However, it is not immediately clear that high levels of these mechanosensors would impart a new oviposition preference. Although the authors acknowledge that there are likely circuit-level differences between the two species, they do not directly test the role of any of these mechanosensors in oviposition preference in either species.

      See response below to the point about nompC.

      In Figure 3 there are clear differences in some of labellar responses but the leg responses look similar overall. This suggests that the labellum is playing a special role in oviposition evaluation. The paper would be strengthened by providing more insight into which tissues (labellum, legs, wings, ovipositor, etc...) are likely used to sample potential egg laying substrates.

      We agree and have added to the Discussion the following: "Multiple taste organs, including the pharynx as well as the labellum and legs, contribute to oviposition behavior; sensory neurons of the ovipositor appear to play an important role as well (Yang Science 2008; Joseph Genetics 2012; Chen PNAS 2022)."

    1. Author Response

      Reviewer #3 (Public Review):

      Main results:

      1) TCR convergence is different from publicity: The authors look at CDR3 sequence features of convergent TCRs in the large Emerson CMV cohort. Amino acid usage does not perfectly correlate with codon degeneracy; for example, arginine (which has 6 codons) is less common in convergent TCRs, whereas leucine and serine are elevated. It's argued that there's more to convergence than just recombination biases, which makes sense. (I wonder if the trends for charged amino acids could be explained by the enrichment of convergent TCRs in CD8 T cells, which tend to have more acidic CDR3 loops.) There's also a claim that the overlap between convergent and public TCRs is lower in tumors with a high mutational burden (TMB), but this part is sketchy: the definition of public TCRs is murky and hard to interpret, and the correlation between TMB and convergence-publicity overlap is modest (two cohorts with low TMB have higher overlap, and the other three have lower, but there is no association over those three; if anything the trend is in the other direction). It's also not clear why the overlap between COVID19 cohort convergent TCRs and public TCRs defined by the pre-2019 Emerson cohort should be high. A confounder here is the potential association between convergence and clonal expansion, since expanded clonotypes can spawn apparently convergent TCRs due to sequencing errors. The paper "TCR Convergence in Individuals Treated With Immune Checkpoint Inhibition for Cancer" (Ref#5 here) gives evidence that sequencing errors may be inflating convergence in this specific dataset.

      We really appreciate the reviewer’s feedback. We respond to each of the reviewer’s points below:

      (1) Amino acid preference of convergent TCRs might be caused by CD8+ T cell enrichment. To test this hypothesis, we performed the same analysis using only CD8+ T cells (using the Cader 2019 lymphoma cohort). The results are shown below. We do not observe significant changes after excluding CD4+ T cells, indicating that this enrichment might be caused by factors other than CD4/CD8 differences.

      (2) Definition of public TCRs. We have changed the definition of public TCRs. Instead of mixing the Emerson cohort into each group and using the mixed cohort to define the public TCRs, we now use only the 666 samples of the Emerson cohort to define a single set of public TCRs and apply it to each cohort. Both the dataset and the approach used in this manuscript are consistent with a previous study on the same topic (Madi et al., 2014, eLife).

      (3) Convergence-publicity overlap: We agree with the reviewer that some high TMB tumors did not show further decrease of convergence-publicity overlap. One potential explanation is that the correlation between the two is not linear. By adding additional cohorts in this revision (healthy and recovered COVID-19 patients), we confirmed the previously observed overall trend between TMB and the overlap, which supported our conclusions (see figure below). On the other hand, we believe that the high overlap of convergent TCRs among healthy cohorts might result from exposure to common antigens. In the cancer patients, while still exposed, private antigens derived from tumor cells are expected to compete for resources, thus reducing the proportion of these public TCRs in the blood repertoire. The above discussion has been added to the revised manuscript:

      “Healthy individuals are expected to be exposed to common pathogens, which might induce public T cell responses. On the other hand, cancer patients have more neoantigens due to accumulated mutations, which drive their antigen-specific T cells to recognize these 'private' antigens. This reduces the proportion of public TCRs among antigen-specific TCRs. Furthermore, a higher tumor mutation burden (TMB) would indicate a higher abundance of neoantigens, resulting in a lower ratio of public TCRs.”

      2) Convergent TCRs are more likely to be antigen-specific: This is nicely shown on two datasets: the large dextramer dataset from 10x genomics, and the COVID19 datasets from Adaptive biotech. But given previous work on TCR convergence, for example, the Pogorelyy ALICE paper, and many others, this is also not super-surprising.

      We thank the reviewer for bringing up this related work. In the Pogorelyy ALICE paper, the authors defined TCR neighbors based on one nucleotide difference of a given CDR3, which included both synonymous and non-synonymous changes. In other words, ALICE combines both convergence and mismatched (with hamming distance 1) sequences as neighbors. Although highly relevant, our approach is different by focusing only on the convergence, as mismatch has been extensively investigated by previous studies. We have now added this paper as Ref 27, and discussed the difference between ALICE and our method in the revised manuscript.

      3) Convergent T cells exhibit a CD8+ cytotoxic gene signature: This is based on a nice analysis of mouse and human single-cell datasets. One striking finding is that convergent TCRs are WAY more common in CD8+ T cells than in CD4+ T cells. It would be interesting to know how much of this could be explained by greater clonal expansion of CD8+ T cells, together with sequencing errors. A subtle point here is that some of the P values are probably inflated by the presence of expanded clonotypes: a group of cells belonging to the same expanded clonotype will tend to have similar gene expression (and therefore similar cluster membership), and will necessarily all be either convergent or not convergent collectively since they share the same TCR. So it's probably not quite right to treat them as independent for the purposes of assessing associations between gene expression clusters and convergence (or any other TCR-defined feature). You can see evidence for clonal expansion in Figure 3C, where TRAV genes are among the most enriched, suggesting that Cluster 04 may contain expanded clones.

      (1) We agree with the reviewer that a possible explanation of the CD8/CD4 difference is the larger clonal expansion of CD8+ T cells. We tested this hypothesis by counting the number of T cell clones instead of the number of cells, to remove any effect of CD8+ T cell expansion. We first investigated the bulk TCR repertoire sequencing samples, as shown in Figure 3 - figure supplement 2C-2D (see figure below). We observed higher convergence levels for CD8+ T cell clones compared to CD4+ T cell clones. A description of this analysis was added to the last paragraph of the Results section “Convergent T cells exhibit a CD8+ cytotoxic gene signature” as follows:

      “The results may be explained by larger cell expansions of CD8+ T cells than CD4+ T cells. Therefore, we calculated the number of convergent clones within CD8+ T cells and CD4+ T cells from the above datasets to exclude the effects of cell expansion. As a result, in the scRNA-seq mouse data, while only 1.54% of the CD4+ clones were convergent, 3.76% of the CD8+ clones showed convergence. Likewise, in the human scRNA-seq data, 0.17% of CD4+ T cell clones and 1.03% of CD8+ T cell clones were convergent. In the bulk TCR-seq lymphoma data, similar results were also observed, where the gap between the convergence levels of CD4+ and CD8+ T cells narrowed but remained significant (Figure 3—figure supplement 2C-2D). In conclusion, these results suggest that CD8+ T cells show higher levels of convergence than CD4+ T cells, which substantiates our hypothesis that convergent T cells are more likely to be antigen-experienced. This observation has been tested using multiple datasets with diverse sequencing platforms and sequencing depths to minimize the impact of batch effects or other technical artifacts.”

      (2) We next investigated the effect of cell expansion in the single-cell analysis. We agree with the reviewer that some highly expanded convergent clones could inflate the p-value. Therefore, we revised the calculation of TCR convergence to use T cell clones instead of individual cells. We observed that the clusters of interest mentioned in the paper (for both mouse and human data) remain at the top convergence level among all clusters (see table below), with p-values estimated using the binomial exact test. These results support our hypothesis that TCR convergence is enriched in T cell clusters that are more likely to be antigen-experienced.
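      The clone-level correction can be illustrated with a minimal Python sketch; it is not the authors' pipeline. It assumes (our assumption) that a clone is identified by its nucleotide CDR3 sequence, that a clone is convergent when its amino-acid CDR3 is also encoded by at least one other, distinct nucleotide CDR3 in the same sample, and that each cluster is compared with the pooled fraction of convergent clones using a one-sided exact binomial test. A real analysis would likely key clones on paired chains and V/J genes. The function names cluster_convergence, convergent_aa and binom_sf are hypothetical.

```python
from collections import defaultdict
from math import comb


def convergent_aa(clones):
    """Return amino-acid CDR3s encoded by >= 2 distinct nucleotide CDR3s.

    `clones` is an iterable of (nt_cdr3, aa_cdr3) pairs.
    """
    nts_per_aa = defaultdict(set)
    for nt, aa in clones:
        nts_per_aa[aa].add(nt)  # a set, so repeated nucleotide sequences count once
    return {aa for aa, nts in nts_per_aa.items() if len(nts) >= 2}


def binom_sf(k, n, p):
    """Exact one-sided binomial tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def cluster_convergence(cells):
    """Clone-level convergence enrichment per cluster.

    `cells` holds one (cluster, nt_cdr3, aa_cdr3) tuple per cell. Rows are
    deduplicated first, so an expanded clone counts once per cluster no
    matter how many cells it contributes.
    Returns {cluster: (convergent_clones, total_clones, p_value)}.
    """
    clones = set(cells)  # collapse cells -> clones within each cluster
    conv = convergent_aa((nt, aa) for _, nt, aa in clones)

    per_cluster = defaultdict(lambda: [0, 0])  # cluster -> [convergent, total]
    for cluster, _, aa in clones:
        per_cluster[cluster][1] += 1
        if aa in conv:
            per_cluster[cluster][0] += 1

    total = sum(n for _, n in per_cluster.values())
    if total == 0:
        return {}
    # Background rate: pooled fraction of convergent clones over all clusters.
    p_background = sum(k for k, _ in per_cluster.values()) / total

    return {
        cluster: (k, n, binom_sf(k, n, p_background))
        for cluster, (k, n) in per_cluster.items()
    }


if __name__ == "__main__":
    cells = [
        ("A", "TGTGCC", "CA"),
        ("A", "TGCGCC", "CA"),  # synonymous codon -> same amino-acid CDR3
        ("A", "TGCGCC", "CA"),  # second cell of the same (expanded) clone
        ("A", "TTTAAA", "FK"),
        ("B", "GGGCCC", "GP"),
        ("B", "AAAGGG", "KG"),
        ("B", "TGGTGG", "WW"),
    ]
    for cluster, (k, n, pval) in sorted(cluster_convergence(cells).items()):
        print(cluster, k, n, round(pval, 3))
```

      Because the duplicate cell of the expanded clone TGCGCC is collapsed before counting, cluster A contributes three clones rather than four cells. A cell-level count would let that single clone add extra "convergent" observations, which is the p-value inflation the reviewer raised.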

      4) TCR convergence is associated with the clinical outcome of ICB treatment: The associations for the first analysis are described as significant in the text, and they are, but just barely (0.045 and 0.047, but you have to check the figure to see that).

      As suggested by the reviewer, we have added the p-values to the text so that they are easier to see. In this revision, we adopted another definition of convergence level, changing from the ratio of convergent TCRs to the actual number of convergent T cell clones within each sample. The p-values were more significant using this new indicator (0.02 and 0.00038). To avoid the effect of other variables that might be correlated with convergence levels, especially sequencing depth, a multivariate Cox model was used for both datasets tested in the paper, correcting for TCR clonality, TCR diversity and sequencing depth (and different treatment methods for the melanoma data). Convergence remains significantly prognostic after adjusting for these additional variables.

      5) Introduction/Discussion: Overall, the authors could do a better job citing previous work on convergence, for example, papers from Venturi on convergent recombination and the work from Mora and Walczak (ALICE, another recombination modeling). They also present the use of convergence as an ICB biomarker as a novel finding, but Ref 5 introduces this concept and validates it in another cohort. Ref 5 also has a careful analysis of the link between sequencing errors and convergence, which could have been more carefully considered here.

      We thank the reviewer for this excellent suggestion. We have added the citation of Venturi on convergent recombination as Ref 43 and cited it in the last paragraph of the Results section:

      “Convergent recombination was claimed to be the mechanistic basis for public TCR responses in many previous studies (Quigley et al., 2010; Venturi et al., 2006).”

      We also included work from Mora and Walczak in the fourth paragraph of the introduction and the third paragraph of the discussion as Ref 27 to introduce this TCR similarity-based clustering method as well as its application in predicting ICB response:

      “This idea has led to the development of several TCR similarity-based clustering algorithms, such as ALICE (Pogorelyy et al., 2019), TCRdist (Dash et al., 2017), GLIPH2 (Huang et al., 2020), iSMART (Zhang et al., 2020), and GIANA (Zhang et al., 2021), for studying antigen-driven T cell expansion during viral infection or tumorigenesis.”

      “In addition, the potential prognostic value of TCR convergence and TCR similarity-based clustering has been tested in other studies (Looney et al., 2019; Pogorelyy et al., 2019).”

      Ref 5 is now cited where we discuss the effect of sequencing errors on TCR convergence, in the fourth paragraph of the Discussion:

      “Improper handling of sequencing errors may result in the overestimation of TCR convergence (Looney et al., 2019).”

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript by Borsatto et al. describes atomic-level structural details of the central core domain of non-structural protein 1 (Nsp1) of SARS-CoV-2, the virus responsible for the ongoing COVID-19 pandemic. The authors combined X-ray crystallography, fragment screening, computational modelling, and molecular dynamics simulation approaches to characterize potentially druggable pockets in the Nsp1 core (aa 10-126). This study presents several notable strengths. For example, the authors screened and tested 60 fragments from the Maybridge Ro3 library and solved a co-crystal structure of the Nsp1 core with one such fragment, 2E10 (N-(2,3-dihydro-1H-inden-5-yl)acetamide), at 1.1 Å resolution. The molecular dynamics simulations and other computational experiments were performed rigorously.

      Nsp1 blocks the path of mRNA in ribosomes to modulate protein synthesis in the host cell. Nsp1 also binds the first stem-loop (SL1) of SARS-CoV-2 mRNA. The authors used a molecular docking program (HADDOCK) to build models of the Nsp1/RNA complex and predicted two modes of Nsp1 binding to SL1 RNA. A comparative structural analysis of the Nsp1/2E10 experimental structure with the Nsp1/SL1 model reveals that small-molecule compounds occupying this site may block RNA binding by Nsp1. Given the established role of this interface in modulating host and viral gene expression programs, this finding provides an important framework for designing small molecules capable of completely blocking this interface.

      A weakness of this study is the lack of experimental validation of the two modes of Nsp1 binding to SL1 RNA.

      The mechanism of binding, in particular whether Nsp1 binds to the ribosome first and then to SL1 or the other way round, is still debated. Moreover, to the best of our knowledge, there is to this day no structure of the N-terminal region of Nsp1 bound to the ribosome. Thus, we expect that obtaining a structure of the binary and possibly ternary complex to validate the predicted binding mode will require considerable time and effort, and this will hopefully be the focus of a follow-up study.

      Reviewer #2 (Public Review):

      In this manuscript, the authors have identified cryptic pockets in the Nsp1 protein of the SARS-CoV-2 virus. The authors used computational methods to identify these pockets and demonstrate drug binding via simulation studies. The authors also show that such cryptic pockets exist in other beta-coronaviruses as well.

      The authors carried out fragment-based screening using macromolecular crystallography and confirmed the presence of drug bound in one of the pockets identified. However, the binding assays showed a weak binding with high error.

      Weak binding is typical for fragments; however, we agree that the error was high. Therefore, we re-measured the binding (both for Nsp1N and full-length Nsp1) to reduce the error. The new values can be found in Figure 6 – Figure supplement 2.

      Further, the authors perform Nsp1-mRNA simulation studies to identify how Nsp1 binds to the 5'UTR of SARS-CoV-2 mRNA and mention that targeting the identified pocket in Nsp1-N could disrupt the SARS-CoV-2 Nsp1-mRNA complex. However, there are conflicting reports on direct binding between SARS-CoV-2 Nsp1 and the viral mRNA (see references 17 & 29).

      Nsp1 helps establish viral infection in the host, and hence identifying the druggable site in this protein is important. Therefore, this study is important and exciting.

    1. Author Response

      Reviewer #1 (Public Review):

      Yanis Zekri et al. have addressed an important question: the possible role of thyroid hormone (T3) and its nuclear receptor (TR) in local BAT thermogenesis and energy expenditure. In this well-written manuscript and well-executed work, the authors address this question A) by generating BATKO mice, in which TR signaling is selectively eliminated in BAT by knocking in TRα1L400R, a dominant-negative version of the TRα1 receptor, and by floxing the Thrb gene. They characterized these mice thoroughly to show that they totally lacked T3 responsiveness. Using qPCR they evaluated the selective abrogation of Thrb and Hr expression in BAT relative to other tissues. B) Using time-course transcriptome analysis they then list all the T3/TR direct target genes using well-defined criteria, and by further linking these with their ChIP-seq data they identified 639 putative target genes under the direct control of T3/TR signaling. Interestingly, their gene analysis led them to some target genes directly involved with UCP1 and PGC1α, in addition to genes of many other metabolic processes related to BAT thermogenesis. The experiment on denervated BAT in PTU-fed wild-type mice was a rather neat way to eliminate the influence of noradrenergic terminals on BAT target genes. Furthermore, the cold exposure and high-fat diet feeding experiments, together with a series of complex analyses, led them to conclude that BATKO animals suffered from reduced efficiency of BAT adaptive thermogenesis. By comparing the BAT transcriptome of BATKO and CTRL mice after 24h at 4°C, the authors further show how BAT TR signaling controls other subsets of genes, especially a wide variety of metabolic pathways, notably lipolysis/fatty acid oxidation. Finally, EdU injection experiments showed a direct effect of T3 on BAT proliferation.

      I think this is a well-thought-out and well-designed study for understanding the complex action of cell-autonomous T3 regulation of adaptive thermogenesis. The conclusions of this paper are well supported by the data provided.

      Thank you very much for this very pertinent and kind summary of our work.

      Reviewer #2 (Public Review):

      The authors designed this study to identify the direct T3 target genes that underlie the T3 actions in the brown adipose tissue (BAT). The unique model used (dominant negative TRa knock-in and a TRb knock-out) allows for the isolation of BAT-specific actions from other well-known systemic effects on thermogenesis, including the central nervous system. The strengths of the study reside in the novel methodological approach. Previous studies of T3 actions in the BAT used animal models that did not allow for full isolation of BAT-specific effects of T3. A limitation however is the combination of TRa knock-in (which causes permanent suppression of TRa-dependent genes) with the TRb knockout, which only prevents T3 induction of TRb-dependent genes. Nonetheless, the results were impressive with the identification of about 1,500 genes differentially expressed in the BAT, among which UCP1 and PGC1a were the two main ones. Although it has been known that both UCP1 and PGC1a are downstream targets of T3, the work establishes through an ingenious approach the critical direct role played by T3 in BAT thermogenesis. In addition, the genetic approach utilized here is of great value and could be easily expanded to other tissues and systems.

      Thank you for this very pertinent summary of our work. We just want to clarify one point: we do not believe that Ucp1 is, quantitatively, one of the main genes regulated by T3 in the BAT. First, it does not belong to the set of the most induced genes after T3/T4 injection in mice. Most importantly, Ucp1 expression was not altered in BATKO mice exposed to 4°C according to unbiased RNAseq analysis. Only targeted qRT-PCR analysis revealed a modest change. We do not call into question the crucial role of Ucp1 in BAT thermogenesis. However, we think that our approach puts into perspective the relevance of Ucp1 in the T3-dependent control of BAT thermogenesis, suggesting that other mechanisms might be more directly linked to T3 activity.

      Reviewer #3 (Public Review):

      This paper details the importance of thyroid hormone signaling in BAT in response to environmental and nutritional stress. The authors utilize a novel genetic model to specifically target BAT and impair thyroid hormone signaling. The physiologic insight is of great interest. The role of the sympathetic nervous system in the BAT response is not fully addressed, but it appears that cell-autonomous signaling mediates TH signaling in response to hyperthyroidism. The cistromic link between TR and PGC1 is also novel and of interest.

      Thank you very much for your kind comments that are highly appreciated.

    1. Author Response

      Reviewer #1 (Public Review):

      This work addresses a long-standing question about how tolerance develops at the presynaptic level. That the number of receptors is unchanged following the treatment of animals with opioids has been known since the early work using receptor binding assays. The prevailing conclusion was that disrupted receptor/effector coupling was the primary mechanism underlying tolerance. This work indicates that the location of receptors is critically important in the development of tolerance. This work is groundbreaking and a game changer in the understanding of tolerance at the cellular level.

      We appreciate that the Reviewer is positive about the potential impact of our study.

      Reviewer #2 (Public Review):

      Jullie et al addressed the long-standing question of how presynaptic desensitization of opioid receptor signaling can occur on the timescale of hours despite the fact that it does not occur on the timescale of minutes. They also compared the mu and delta opioid receptors in this context and asked whether their desensitization occurs in a homologous or heterologous manner when co-expressed in the same neurons.

      A major strength of the work is the use of a relatively high-volume imaging assay of synaptic transmission based on VAMP2-SEP to detect exocytosis of synaptic vesicles and its modulation by heterologously expressed opioid receptors in cultured neurons. This allowed for large data sets to be acquired and analyzed with good statistical power. It also reports on a validated metric of synaptic transmission.

      A significant weakness arises from the need to overexpress opioid receptors in cultured striatal neurons in order to conduct the experiments with high reliability. Because the authors did not attempt to address receptor expression levels and relate overexpression to endogenous receptor expression levels in axons, the physiological significance of the findings remains, to some extent, in doubt.

      Using heterologously expressed receptors, the primary finding that slow desensitization (of presynaptic suppression of neurotransmission) occurs via endocytosis of membrane-localized opioid receptors is well supported by multiple lines of evidence. 1) Blocking receptor endocytosis, either via mutation of GRK2/3 phosphorylation sites or pharmacological block with compound 101, prevents slow desensitization of MOR. 2) SEP-MOR and SEP-DOR fluorescence (indicative of plasma membrane localization) is reduced by chronic agonist treatment.

      The secondary findings that MOR and DOR do not desensitize or undergo endocytosis in a heterologous manner, and that DOR-depletion from the plasma membrane is more facile than MOR and independent of C-terminus phosphorylation, are well supported by the data and analyses.

      Despite the reliance on heterologously expressed opioid receptors, the findings are likely to have a high impact on the fields of GPCR trafficking and opioid signaling, as they address a major outstanding question with direct relevance to opioid drug tolerance and may generalize to other GPCRs.

      The findings also evoke new questions that will spur further work in the field. For example, just focusing on DOR, by what mechanism does agonist-driven DOR endocytosis occur not via GRK2/3 phosphorylation? By extension, would G protein-biased DOR agonists be expected to produce less tolerance? To be clear these are not to be addressed in this manuscript.

      We appreciate that this Reviewer found that the current manuscript addresses long standing questions in the field and that our results are well supported by the data, acknowledging the strength of the presented method. We agree that our methods and results do have some associated limitations, particularly with respect to linking the present mechanistic findings to true physiology, and that the question of receptor expression level is pertinent to this link. We have attempted to address this to the best of our ability in the revised manuscript, as summarized below. We agree with the Reviewer that there remain many interesting questions for further study, and have modified the Discussion to more clearly point this out.

      Reviewer #3 (Public Review):

      The studies in the manuscript "Endocytic trafficking determines cellular tolerance of presynaptic opioid signaling" use a novel approach to assess the signaling of presynaptic opioid receptors that inhibit the release of neurotransmitters. Historically, studies have used whole-cell patch-clamp recordings of spontaneous and evoked neurotransmitter release to measure the presynaptic effects of opioid receptors. Since the recordings were made in postsynaptic cells that expressed receptors for the released neurotransmitter, the electrophysiological measurements are indirect with respect to the presynaptic receptors under study. The technique used in this manuscript is based on a pHluorin-based unquenching assay that is a measure of synaptic vesicle exocytosis. In this case, the super-ecliptic pHluorin (SEP) is a pH-sensitive GFP that increases fluorescence as the synaptic vesicle protein that it is attached to (VAMP2-SEP) relocates from the acidic synaptic vesicle to the plasma membrane. Opioid agonists inhibit this activity with acute administration, and this inhibition is reduced with prolonged, or chronic, administration (hours), demonstrating tolerance. The SEP protein can also be conjugated to opioid receptors and used to measure the proportion of receptors on the plasma membrane compared to internalized receptors. The studies show that agonist activation of mu-opioid receptors (MORs) induces endocytosis that is dependent on phosphorylation of the C-terminus and that the development of tolerance is correlated with the loss of MORs at the surface. The results are different for the delta-opioid receptor (DOR), which is also internalized with acute agonist administration, but the loss of receptors from the membrane occurs more rapidly and is not dependent on phosphorylation of the C-terminus.

      The results in the studies are clearly presented and substantiate the prior work using electrophysiology to show the late development of tolerance of presynaptic opioid receptor signaling. The studies extend prior work by showing that endocytosis of both MOR and DOR occurs in presynaptic locations but that the cellular mechanisms underlying the maintenance of these receptors on the plasma membrane differ. The imaging results show convincing effect sizes, even with genetic and pharmacological manipulations, that will allow for even further investigation into the cellular mechanisms underlying the development of tolerance. Since these studies transfected the cultured striatal neurons with both the opioid receptors and the VAMP2-SEP, one question that remains is whether imaging of the VAMP2-SEP has the resolution to detect inhibition of exocytosis by endogenous opioid receptors. Since the authors make the point that this technique has advantages over traditional electrophysiological approaches, it is important that the technique allows for the measurement of endogenous levels of receptors. There are minor questions about the statistics used in some of the graphs, and the utility of the presentation of p values on the right-hand axis, but these concerns do not alter the overall significance of the studies, which are high impact.

      We are pleased that this Reviewer found our results generally convincing and impactful. We are grateful for the critical comments and suggestions, particularly with regard to improving the statistical analysis and simplifying / removing speculation from our model. We have done our best to address both important aspects in the revised manuscript, as detailed below.

    1. Author Response

      Reviewer #3 (Public Review):

      1) This work focuses exclusively on excitatory input. However, as the authors mention, LGMD neurons also receive inhibitory inputs, and these inputs also appear to segregate to different areas of the dendritic tree depending on the pathway. The contribution of inhibition is mostly ignored throughout the manuscript, but I think that it would be beneficial to discuss how inhibitory inputs fit into the story. For example, if OFF inhibition maps onto the C field, then presumably when there is mixed ON/OFF stimulation there is inhibition of the ON excitation onto the C field? If so, how much excitation of the C field is left? How much does the retention of spatial coherence sensitivity with mixed stimuli arise from the fact that OFF excitation might dominate because it inhibits the C field? I don't think that additional experiments are needed, but a discussion would be useful. Relatedly, does the model include inhibitory synapses?

      We have not elaborated more specifically on inhibition, as its interaction with excitation has not yet been characterized experimentally. We agree that the interaction between excitation and inhibition for mixed ON/OFF stimuli in field C is an interesting topic, but it is unlikely to substantially affect responses to ON stimuli alone. We added a paragraph on E-I integration to the Discussion (lines 461-473). The model does include inhibitory synapses, which are now described more clearly.

      2) The argument that the cellular organization found here is good because it allows grasshoppers to be sensitive to white approaching stimuli while disregarding spatial coherence and saving energy seems plausible. But it's not clear to me why this is 'optimal' (from the title - 'optimizes neuronal computation'). What exactly is being optimized here? And why is it good that grasshoppers can't discriminate the spatial coherence of ON looming stimuli? Is everything that approaches a grasshopper fast and white always a bad thing, but not the case if the approaching thing is black? Some further placement of these findings into an ecological setting might be helpful here.

      Our thinking is not that there is an advantage to responding to incoherent white looms (on the contrary), but that white looming stimuli in nature are likely less frequent than black/white mixtures or than all dark stimuli. Thus, the inability to discriminate white spatial coherence might have been sacrificed to decrease energy expenditure. We agree that ‘optimal’ might be too strong a wording and we have modified the title and text accordingly. Hopefully the text is now clearer on this point.

    1. Author Response

      Reviewer #1 (Public Review):

      Abdel-Hag, Reem et al. investigated the beneficial effects of a fiber-rich diet on the pathology of α-synuclein overexpressing (ASO) mice, a preclinical model of Parkinson's disease. They found that a prebiotic intervention attenuates motor deficits and reduces microglial reactivity in the substantia nigra and striatum. They extended these findings with scRNA sequencing and identified the expansion of a protective population of disease-associated microglia (DAM), a microglial subset previously described during the early stages of disease in several mouse models. Interestingly, the data indicate that microglia do not influence the behavior of ASO mice in the early stages of disease progression. However, microglia are the key mediators of the protective effects of prebiotic treatment in ASO mice. Overall, the conclusions of this paper are well supported by data, but some aspects should be considered to improve the manuscript.

      1) Colony-stimulating factor 1 receptor (CSF1R) inhibition has been widely used as a method for microglia depletion; however, the impact of this approach on peripheral immune cells is controversial. The authors elegantly showed that most gut-associated immune cell populations were unaffected by PLX5622. However, CSF1R signaling has been implicated in the maintenance of gut homeostasis. Could PLX5622 treatment directly affect the gut microbiome composition? Are the beneficial changes in gut microbiome composition induced by a prebiotic diet maintained in combination with PLX5622? CSF1R inhibitors with low brain penetration, such as PLX73086, which are therefore unable to deplete resident microglia (Bellver-Landete, Victor et al., Nat Commun, 2019), would be helpful to rule out peripheral off-target effects.

      We agree that the loss of benefits of the prebiotic diet following PLX5622 treatment could be due to changes in the microbiome, and we cannot exclude this possibility. The mechanism by which PLX5622 might reshape the microbiome would most likely involve changes in immune (or other) cell types in the gut, as the drug is not known to have direct effects on the microbiome. As described by the referee, we carefully profiled the mucosal immune system of mice treated with PLX5622 and control chow, and show minor changes associated with the drug. These are control experiments that very few previous studies using PLX5622 have performed, and they suggest that immune-mediated microbiome changes may be subtle. Further, we do not suggest in the manuscript that microbiome changes mediated the benefits of the prebiotic diet in the first place, but rather focus the current study on the well-known effect of microglia depletion by PLX5622.

      Microbiome profiling and additional experiments transferring microbiota from diet-treated animals, with and without PLX5622, to naïve mice would be needed to determine the functional effects of gut bacteria on microglial activation and motor symptoms. The use of PLX73086 is also an excellent way to address this point, as are several additional approaches. Comprehensively investigating the indirect contributions of the microbiome to motor symptoms in ASO mice represents a separate series of studies, in our respectful opinion. Nonetheless, this is an important caveat of our work, and we now include the following text in the Discussion section to address this point: “Our study does not rule out indirect effects of PLX5622 that include reshaping the microbiome to promote motor symptoms in prebiotic diet-fed mice”. We thank the referee for this comment.

      2) The authors claimed that microglial depletion eliminates the protective effects of the prebiotic diet in ASO mice by showing increased levels of aggregated aSyn in the SN (Fig 5G). However, microglial depletion also has the same effect on WT mice. How do authors interpret this result?

      The referee raises an astute point. Microglia appear to play a complex role in PD and mouse models, with both positive and negative effects demonstrated in various contexts (for example, PMIDs: 29401614, 32086763). A primary, but not exclusive, function of microglia is the removal of α-synuclein accumulations (PMIDs: 32170061, 34555357). Importantly, there is no change in motor behavior in prebiotic-fed WT mice with or without PLX5622 treatment, as expected (see Figs. 5D-F). We have been careful in the manuscript not to suggest that microglial effects on motor symptoms occur via a process that involves α-synuclein aggregates, as this has not been convincingly shown in this mouse model at the time point we are studying (i.e., 22 weeks of age). While it would be straightforward to add a statement suggesting why α-synuclein levels increase in WT mice on the drug, our preferred remedy here is to point out this observation so it does not go unnoticed, but to refrain from speculation in the absence of data, since this is not a major point of the study. We have now inserted the statement “However, in prebiotic-fed WT and ASO mice, depletion of microglia significantly increased levels of aggregated αSyn in the SN, while levels in the STR remained unchanged (Figure 5G-H).” We thank the referee for this important comment.

      3) What is the rationale for a long-term (17-week) prebiotic intervention? Have the authors considered a short-term intervention? The prebiotic diet should change the gut microbiome composition quickly, within a few days or weeks.

      We have previously shown that long-term microbiome depletion is required to impact motor performance in ASO mice (a similar timeline to the current prebiotic study) (Sampson et al., Cell, 2016). In unpublished data, short-term antibiotic treatment (4 weeks before motor testing) is unable to improve motor symptoms in ASO mice. Thus, we chose a timeframe for the current prebiotic studies guided by empirical data, but further details on dose intervals remain unknown. We agree that the microbiome should rapidly respond to the prebiotic diet, but it is unknown whether this response is durable or whether the ‘pre-treated’ microbiome profile would re-establish at some time after removal of the experimental diet. We respectfully suggest that these more specialized studies are better suited for future projects.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript applies the framework of information theory to study a subset of cellular receptors (called lectins) that bind to glycan molecules, with a specific focus on the kinds of glycans that are typical of fungal pathogens. The authors use the concentration of various types of ligands as the input to the signaling channel, and measure the "response" of individual cells using a GFP reporter whose expression is driven by a promoter that responds to NFκB. While this work is overall technically solid, I would suggest that readers keep several issues in mind while evaluating these results.

      1) One of the largest potential limitations of the study is the reliance of the authors on exogenous expression of the relevant receptors in U937 cells. Using a cell-line system like this has several advantages, most notably the fact that the authors can engineer different reporters and different combinations of receptors easily into the same cells. This would be much more difficult with, say, primary cells extracted from a mouse or a human. While the ability to introduce different proteins into the cells is a benefit, the problem is that it is not clear how physiologically relevant the results are. To their credit, the authors perform several controls that suggest that differences in transfection efficiency are not the source of the differences in channel capacity between, say, dectin-1 and dectin-2. As the authors themselves clearly demonstrate, however, the differences in the properties of these signaling systems are not based on receptor expression levels, but rather on some other property of the receptor. Now, it could be that the dectin-2 receptor is somehow just more "noisy" in terms of its activity compared to, say, dectin-1. This seems a somewhat less likely explanation, however, and so it is likely that downstream details of the signaling systems differ in some way between dectin-2 and the more "information efficient" receptors studied by the authors.

      The channel capacity of a cell signaling network depends critically on the distributions of the downstream signaling molecules in question: see the original paper by Cheong et al. (2011, Science 334 (6054), 354-8) and subsequent papers (notably Selimkhanov et al. (2014) Science 346 (6215), 1370-3 and Suderman et al. (2018) Interface Focus 8 (6), 20180039). The U937 cells considered here clearly don't serve the physiological function of detecting the glycans considered by the authors; despite the fact that this is an artificial cell line, the fact the authors have to exogenously express the relevant receptors indicates that these cells are not necessarily a good model for the types of cells in the body that actually have evolved to sense these glycan molecules.

      Signaling molecules readily exhibit cell-type-specific expression levels that influence cellular responses to external stimuli (Rowland et al. (2017) Nat Commun 8, 16009). So it is unclear that the distributions of downstream signaling molecules in U937 cells mirror those that would be observed in the immune cell types relevant to this response. As such, the physiological relevance of the differences between dectin-2 channel capacities and those exhibited by the other receptors is currently unclear.

      We appreciate Reviewer #1’s in-depth comments on the physiological relevance of the U937 cells. A major benefit of using information theory to investigate a biological communication channel is that it quantifies the information the channel transmits without requiring detailed measurements of the spatiotemporal dynamics of receptors and downstream signaling cascades. In addition, comparing differences in the measured information in turn gives us a reasonable prediction of the underlying signaling mechanisms. For example, we investigated how transmission of glycan information from dectin-2 is synergistically modulated in the presence of dectin-1, DC-SIGN or mincle. Our approach allows us to investigate how individual lectins on immune cells contribute to glycan information transmission and how their signals are integrated in the presence of other types of lectins. Therefore, the findings describe, in a more defined way, how physiologically relevant lectins integrate the extracellular signal. Furthermore, we found that our model cell line has an order of magnitude higher expression of dectin-2 than primary human monocytes and exhibits a similar zymosan binding pattern (described in Recommendations for the authors and Figure R8).

      We fully agree that acquiring more information on the information transmission capability of primary immune cells would increase physiological relevance. In the revised manuscript we addressed this concern by comparing the receptor expression levels of our model cell lines with those of primary monocytes, and found comparable cellular heterogeneity. However, we would also like to point out that the very basic nature of our question, how information stored in glycans is processed by lectins, is not tightly bound to these differences between primary cells and cell lines.

      Line 382: Finally, it is important to take into consideration that our conclusions came from model cell lines, which were used as a surrogate for the cell-type-specific lectin expression patterns of primary immune cells. Human monocytes and dectin-2-positive U937 cells have comparable receptor densities and respond similarly to stimulation with zymosan particles (SI Fig. 6A and B).

      2) Another issue that readers might want to keep in mind is that the details of the channel capacity calculation are a bit unclear as the manuscript is currently written. The authors indicate that their channel capacity calculations follow the approach of Cheong et al. (2011) Science 334 (6054), 354-8. However, the extent to which they follow that previous approach is not obvious. For instance, the calculations presented in the 2011 work use a combined bootstrapping/linear extrapolation approach to estimate the mutual information at infinite population size in order to deal with known inaccuracies in the calculation that arise from finite-size effects. The Cheong approach also deals with the question of how many bins to use in order to estimate the joint probability distribution across signal and response.

      They do this by comparing the mutual information they calculate for the real data with that calculated for random data to ensure that they are not calculating spuriously high mutual information based on having too many bins. While the Cheong et al. paper does a great job explaining why these steps need to be undertaken, a subsequent paper by Suderman et al. (2017, PNAS 114 (22), 5755-60) explains the approach in even greater detail in the supporting information. Those authors also implemented several improvements to the general approach, including a bootstrap method for more accurately estimating the error in the mutual information and channel capacity estimates.

      The problem here is that, while the authors claim to follow the approach of Cheong et al., it seems that they have re-implemented the calculation, and they do not provide sufficient detail to evaluate the extent to which they are performing the same exact calculation. Since estimates of mutual information are technically challenging, specific details of the steps in their approach would be helpful in order to understand how closely their results can be compared with the results of previous authors. For instance, Cheong et al. estimate the "channel capacity" by trying a set of likely unimodal and bimodal distributions for the input to the channel, and choosing the maximal value as the channel capacity. This is clearly a very approximate approach, since the channel capacity is defined as the supremum over an (uncountably infinite) set of input probability distributions. In any case, the authors of the current manuscript use a different approach to this maximization problem. Although it is a bit unclear how their approach works, it seems that they treat the probability of each input bin as an independent parameter (under the constraint that the probabilities sum to one) and then use an optimization algorithm implemented in Python to maximize the mutual information. In principle, this could be a better approach, since the set of input distributions considered is potentially much larger. The details of the optimization algorithm matter, however, and those are currently unclear as the paper is written.

      We thank Reviewer #1 for the recommendation to strengthen the rigor of the calculation. In the revised manuscript we explain the channel capacity calculation procedures in more detail, using statistical approaches adopted from Cheong et al. (2011) and Suderman et al. (2018) (SI sections 1 and 2). Furthermore, we chose the number of bins based not only on a random dataset but also on the total number of samples, as shown below:

      Figure R1. A) Extrapolated channel capacity values of a random dataset at infinite subsample size for various total numbers of samples and numbers of output bins. The white line in the heatmap marks a channel capacity of 0.01 bits. B) Extrapolated channel capacity values at infinite subsample size for the input (TNF-α doses) and output (GFP reporter) responses of U937 cells.

      Figure R1 shows channel capacity values from the random (A) and experimental (B, TNFAR + TNF-α) datasets. The channel capacity values from random data show how the channel capacity depends on the number of output bins and the total number of samples. Based on this heatmap, we set the allowed bias to 0.01 bits, shown as the contour line in Figure R1A. Since our smallest dataset used for channel capacity calculation in the absence of labelled input contains nearly 90,000 samples, the expected bias in the channel capacity calculation is less than 0.01 bits for 10 to 1000 output bins, as shown in Figure R1A.
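
      For readers less familiar with the bias correction referred to here, the following is a minimal sketch, not the authors' implementation, of the subsample-and-extrapolate idea from Cheong et al. (2011): plug-in mutual information is computed on random subsamples of several sizes N, regressed linearly against 1/N, and the intercept is taken as the estimate at infinite sample size. It assumes NumPy and inputs and outputs already converted to integer bin indices; the function names, subsample fractions and repeat count are hypothetical, illustrative choices.

```python
import numpy as np

def plugin_mi(x_idx, y_idx, nx, ny):
    """Plug-in mutual information (bits) from integer bin indices."""
    joint = np.zeros((nx, ny))
    np.add.at(joint, (x_idx, y_idx), 1.0)  # joint histogram of (input bin, output bin)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # input marginal, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)  # output marginal, shape (1, ny)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def extrapolated_mi(x_idx, y_idx, nx, ny,
                    fractions=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0), reps=20, seed=0):
    """Regress subsample MI on 1/N and return the intercept (the N -> infinity estimate)."""
    rng = np.random.default_rng(seed)
    n = len(x_idx)
    inv_sizes, mis = [], []
    for f in fractions:
        m = int(f * n)
        for _ in range(reps):
            sel = rng.choice(n, size=m, replace=False)  # subsample without replacement
            inv_sizes.append(1.0 / m)
            mis.append(plugin_mi(x_idx[sel], y_idx[sel], nx, ny))
    slope, intercept = np.polyfit(inv_sizes, mis, 1)
    return float(intercept)

# Independent "random" input and output, as in the random-data control
rng = np.random.default_rng(1)
x = rng.integers(0, 10, size=20_000)
y = rng.integers(0, 100, size=20_000)
raw = plugin_mi(x, y, 10, 100)
corrected = extrapolated_mi(x, y, 10, 100)
```

      On independent data the raw plug-in estimate is positive purely because of finite sampling (roughly (n_x - 1)(n_y - 1) / (2N ln 2) bits, about 0.03 bits here), while the extrapolated value should be close to zero. This is the same reason the bias in Figure R1A grows with the number of output bins and shrinks with the total number of samples.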

      Furthermore, we performed the mutual information maximization using predefined unimodal and bimodal input distributions and compared the results with the systematic method used in this work. We found no noticeable difference in channel capacity between the two approaches (SI Figure 3M).

      3) Another issue to be careful about when interpreting these findings is the fact that the authors use logarithmic bins when calculating the channel capacity estimates. This is equivalent to saying that the "output" of the cell signaling channel is not the amount of protein produced under the control of the NFκB promoter, but rather the log of the protein level. Essentially, the authors are considering a case where the relevant output of the system is not the amount of protein itself, but the fold change in the amount of protein. That might be a reasonable assumption, especially if the protein being produced is a transcription factor whose own promoters have evolved to detect fold changes. For many proteins, however, the cell is likely responsive to linear changes in protein concentration, not fold changes. And so choosing the log of the protein level as the output may not make sense in terms of understanding how much information is actually contained in this particular output variable. Regardless, choosing logarithmic bins is not purely a matter of convenience or arbitrary choice, but rather corresponds to a very strong statement about what the relevant output of the channel is.

      We understand Reviewer #1’s concern regarding the choice of logarithmic binning. We found that when the number of bins exceeds 200, the estimated channel capacities converge to the same value regardless of the binning method (linear, logarithmic or equal-frequency). The only difference is how quickly the values approach the converged channel capacity as the number of bins increases (Figure R2). In the revised manuscript, we used linear binning to better represent protein signaling, as the Reviewer suggested. Note that the channel capacity values calculated with linear binning do not differ noticeably from our previously calculated values.
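
      To make the three binning schemes compared in Figure R2 concrete, here is a minimal sketch (assuming NumPy; `bin_edges` and `to_bins` are hypothetical helper names, not the authors' code) that builds linear, logarithmic and equal-frequency (quantile) bin edges for a reporter readout and maps each cell to a bin index. Logarithmic edges on the raw intensity are the same as linear edges on log(intensity), which is why log binning effectively treats fold change as the channel output.

```python
import numpy as np

def bin_edges(values, n_bins, method="linear"):
    """Return n_bins + 1 bin edges spanning the observed values."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if method == "linear":
        # equal-width bins on the raw intensity scale
        return np.linspace(lo, hi, n_bins + 1)
    if method == "log":
        # equal-width bins on log10(intensity); requires strictly positive values
        if lo <= 0:
            raise ValueError("log binning needs strictly positive values")
        return np.logspace(np.log10(lo), np.log10(hi), n_bins + 1)
    if method == "quantile":
        # equal-frequency bins: each bin holds roughly the same number of cells
        return np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))
    raise ValueError(f"unknown method: {method}")

def to_bins(values, edges):
    """Map values to bin indices 0..len(edges) - 2 (the maximum lands in the last bin)."""
    return np.digitize(values, edges[1:-1])

# Illustrative use on synthetic, log-normally distributed reporter intensities
rng = np.random.default_rng(0)
gfp = rng.lognormal(mean=5.0, sigma=1.0, size=20_000)
n_bins = 200
indices = {m: to_bins(gfp, bin_edges(gfp, n_bins, m)) for m in ("linear", "log", "quantile")}
quantile_counts = np.bincount(indices["quantile"], minlength=n_bins)
```

      With right-skewed data such as this, linear bins leave many high-intensity bins nearly empty, log bins spread cells more evenly, and quantile bins are nearly uniform by construction; these differences in occupancy are one plausible reason the schemes approach the converged capacity at different rates as the bin count grows.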

      On the other hand, linear binning generates a significant bias when labelled (i.e., continuous) input is included in the channel capacity calculation, because the number of bins in the input dimension also increases.

      Figure R2. Dependence of the channel capacity of the experimental dataset on the number of output bins and the binning method. The inset plots show the difference of each channel capacity value relative to the maximum channel capacity over the entire binning range (i.e., 10 to 1000 bins) for the corresponding binning method.

      Following Reviewer #1’s comment, we changed the binning method from logarithmic to linear binning for the whole experimental dataset, except in the presence of labelled input (i.e., dectin-2 antibody). When we consider the channel capacity between labelled input and the NF-κB reporter, equal-frequency binning is used for every layer of the channel (i.e., labelled input-binding, binding-GFP, labelled input-GFP).

      Reviewer #2 (Public Review):

      My expertise is more on the theoretical than the experimental aspects of this paper, so those will be the focus of these comments.

      Signal transduction is an important area of study for mathematical biologists and biophysicists. This setting is a natural one for information-theoretic methods, and such methods are attracting increasing research interest. Experimental results that attempt to directly quantify the Shannon capacity of signal transduction are particularly interesting. This paper represents an important contribution to this emerging field.

      My main comments are about the rigorousness and correctness of the theoretical results. More details about these results would improve the paper and help the reader understand the results.

      We understand Reviewer #2’s comment regarding the rigor and correctness of the theoretical results of this work. In the revised manuscript, we added the following content to help readers better understand the channel capacity calculation procedures.

      • A general illustrative introduction to how we measured the input and output datasets and how we processed those data to prepare the joint probability distribution (SI sections 1.1 and 1.2).

      • An example of the mutual information maximization procedure using experimental and arbitrary datasets (SI section 1.3).

      The calculation of channel capacity, given in the methods, is quite a standard calculation and appears to be correct. However, I was confused by the use of the "weighting value" w_i, which is not specified in the manuscript. The input distribution appears to be a product of the weight w_i and the input probability value p_i, and these appear always to occur together as a product w_i p_i. (In joint probabilities w_i p(i,j), the input probability can be extracted using Bayes' rule, leaving w_i p_i p(j|i).) This leads me to wonder about two things. First, what role does w_i play (is it even necessary)? Second, of particular interest here is the capacity-achieving input distribution p_i, but w_i obscures it; is the physical input distribution p_i equal to the capacity-achieving distribution? If not, what is the meaning of capacity?

      We thank Reviewer #2 for this comment regarding the arbitrariness of the weightings. We realize that the original manuscript lacked an explanation of the weighting values. $P_X(i)$ is the marginal probability distribution of the input in the original dataset, and $P'_X(i)$ is the marginal probability distribution of the modified input that maximizes the mutual information. In general, $P_X(i)$ is not equal to $P'_X(i)$, so $P'_X(i)$ must be found starting from $P_X(i)$. Because $P'_X(i)$ is obtained by reweighting each input bin of $P_X(i)$, it can be expressed as $w(i)P_X(i)$, where $w(i)$ are the weightings, under the constraint $\sum_i w(i)P_X(i) = 1$. The changed input distribution, in turn, modifies the joint probability distribution as $P'_{XY}(i,j) = w(i)P_{XY}(i,j)$. To help readers understand this work, we expanded the Appendix with illustrative descriptions.
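
      The role of the weights can be illustrated with a small sketch. It uses the Blahut-Arimoto algorithm, a standard iterative method for the capacity of a discrete channel; this is a swapped-in technique for illustration and not necessarily the Python optimizer used in the manuscript. Assumptions: NumPy; a joint count matrix with input bins as rows and output bins as columns, with every input bin observed at least once; hypothetical function names. The conditional distribution P(j|i) is held fixed, only the input distribution is changed, and the weights are recovered as w(i) = P'_X(i) / P_X(i).

```python
import numpy as np

def blahut_arimoto(p_y_given_x, tol=1e-12, max_iter=10_000):
    """Capacity (bits) and capacity-achieving input distribution of a discrete channel.

    p_y_given_x: array of shape (n_inputs, n_outputs); each row sums to 1.
    """
    q = np.asarray(p_y_given_x, dtype=float)
    p = np.full(q.shape[0], 1.0 / q.shape[0])  # start from a uniform input distribution

    def divergences(p):
        # KL divergence (nats) between each row P(y | x = i) and the output marginal P(y)
        p_y = p @ q
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(q > 0, np.log(q / p_y), 0.0)
        return np.sum(q * log_ratio, axis=1)

    for _ in range(max_iter):
        p_new = p * np.exp(divergences(p))  # multiplicative Blahut-Arimoto update
        p_new /= p_new.sum()
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    capacity_bits = float(np.dot(p, divergences(p)) / np.log(2))
    return capacity_bits, p

def capacity_weights(joint_counts):
    """Return capacity, weights w(i) and the empirical input marginal P_X(i)."""
    joint = np.asarray(joint_counts, dtype=float)
    p_x = joint.sum(axis=1) / joint.sum()                   # empirical P_X(i)
    p_y_given_x = joint / joint.sum(axis=1, keepdims=True)  # P(j | i), held fixed
    capacity, p_opt = blahut_arimoto(p_y_given_x)           # capacity-achieving P'_X(i)
    w = p_opt / p_x                                         # so that P'_X(i) = w(i) P_X(i)
    return capacity, w, p_x

# Toy channel: input 0 always gives output 0; input 1 gives either output with probability 0.5
counts = np.array([[300, 0], [350, 350]])
capacity, w, p_x = capacity_weights(counts)
reweighted_joint = w[:, None] * (counts / counts.sum())  # P'_XY(i, j) = w(i) P_XY(i, j)
```

      For this toy channel the capacity is log2(1.25), about 0.32 bits, reached when the noisy input is used 40% of the time, whereas the "measured" input uses it 70% of the time. Under this reading, w(i) is simply the ratio between the capacity-achieving and the measured input distributions, so the measured p_i need not equal the capacity-achieving one.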

      A more minor but important point: the inputs and outputs of the communication channel are never explicitly defined, which makes the meaning of the results unclear. When evaluating the capacity of an information channel, the inputs X and outputs Y should be carefully defined, so that the mutual information I(X;Y) is meaningful; the mutual information is then maximized to obtain capacity. Although it can be inferred that the input X is the ligand concentration, and the output Y is the expression of GFP, it would be helpful if this were stated explicitly.

      We agree with the Reviewer’s suggestion to describe the input and output more clearly in the manuscript. Therefore, we have modified Figure 1A and B and the main text to describe the sources of input and output more clearly, as follows:

      Line 92: Because cellular signaling is stochastic, information theory provides robust and quantitative tools for analyzing complex communication channels. A fundamental metric of information theory is entropy, which quantifies the disorder or uncertainty of a variable. In this respect, cellular signaling pathways in which the initiating input signals (e.g. stimulants) are highly variable, and the corresponding output responses (i.e. cellular signaling) are also highly variable, can be characterized as having high entropy. Importantly, input and output can be mutually dependent, so knowing the input can partly inform about the output. If noise is present in the communication channel, this mutual dependence is reduced. The mutual dependence between input and output is called mutual information. Mutual information depends on the input distribution, and its upper bound over all possible input distributions is called the channel capacity (SI section 1) (Cover and Thomas, 2012). In this report, the communication channel is the C-type lectin receptor signal transduction pathway, which ultimately leads to NF-κB translocation and, finally, GFP expression in the reporter model (Fig. 1A). To quantify the signaling information transmitted through these communication channels, we used channel capacity. Importantly, channel capacity does not merely describe the maximum response intensity of the reporter cells. It accounts for cell-to-cell variation and for activation across the whole range of incoming stimulus in single-cell-resolved data, and summarizes all of that data in a single number.
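      The relationship between entropy and mutual information described above can be made concrete with the identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$. The sketch below assumes, for illustration only, that the single-cell data have already been binned into a table of counts with rows for input (ligand concentration) bins and columns for output (GFP intensity) bins; the binning and function names are hypothetical. This simple plug-in estimate is biased for finite samples and depends on the bin choice, so it is not a substitute for the estimation procedure in the SI.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy (bits) of a probability array; zero entries contribute 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def mutual_information_bits(joint_counts):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a 2-D table of joint counts.

    Rows index binned inputs X, columns index binned outputs Y.
    """
    p_xy = np.asarray(joint_counts, dtype=float)
    p_xy = p_xy / p_xy.sum()  # counts -> joint probabilities
    h_x = entropy_bits(p_xy.sum(axis=1))  # input marginal
    h_y = entropy_bits(p_xy.sum(axis=0))  # output marginal
    return h_x + h_y - entropy_bits(p_xy)
```

      A noiseless two-level channel such as `[[5, 0], [0, 5]]` carries 1 bit, an output that is independent of the input such as `[[1, 1], [1, 1]]` carries 0 bits, and noise such as `[[45, 5], [5, 45]]` gives a value in between.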

    1. Author Response

      Reviewer #1 (Public Review):

      The software presented in this paper is well documented and represents a significant achievement that breaks new ground in terms of what is possible to render and explore in the web browser. This tool is essential for the exploration of SC2 data, but equally useful for the tree of life and other tree-like data sets.

      Thank you for reviewing my work and for this generous assessment.

      Reviewer #2 (Public Review):

      This manuscript describes a web-based tool (Taxonium) for interactively visualizing large trees that can be annotated with metadata. Having worked on similar problems in the analysis and visualization of enormous SARS-CoV-2 data sets, I am quite impressed with the performance and "look and feel" of the Taxonium-powered cov2tree web interface, particularly its speed at rendering trees (or at least a subgraph of the tree).

      Thank you for the kind words.

      The manuscript is written well, although it uses some technical "web 2.0" terminology that may not be accessible to a general scientific readership, e.g., "protobuf" (presumably protocol buffer) and "autoscaling Kubernetes cluster". The latter is like referring to a piece of lab equipment, so the author should provide some sort of reference to the manufacturer, i.e., https://kubernetes.io/.

      Thank you for flagging this. I have now replaced the colloquial "protobuf" with "protocol buffer". I have now provided a URL for Kubernetes. It is always difficult to judge how much to explain technical terms. I certainly agree that many people will be unfamiliar with, for instance, protocol buffers, but an explanation of what they are (which may not be particularly important for understanding Taxonium) can sometimes overshadow more important details. So my preference in that particular case is for an interested reader to research the unfamiliar term.

      In other respects, the manuscript lacks some methodological details, such as exactly how the tree is "sparsified" to reduce the number of branches being displayed for a given range of coordinates.

      This is an important point also raised by Reviewer 3. I have added a new section in the Materials and Methods which discusses this in some detail.

      Some statements are inaccurate or not supported by current knowledge in the field. For instance, it is not true that the phylogeny "closely approximates" the transmission tree for RNA viruses.

      I agree that this was an overly broad claim, and have softened it, now saying:

      "The fundamental representation of a viral epidemic for genomic epidemiology is a phylogenetic tree, which approximates the transmission tree and can allow insights into the direction of migration of viral lineages."

      Mutations are not associated with a "point in the phylogeny", but rather with the branch that is associated with that internal node.

      I have changed this as suggested.

      A major limitation of displaying a single phylogenetic tree (albeit an enormous one) is that the uncertainty in reconstructing specific branches is not readily communicated to the user. This problem is exacerbated for large trees where the number of observations far exceeds the amount of data (alignment length). Hence, it would be very helpful to have some means of annotating the tree display with levels of uncertainty, e.g., "we actually have no idea if this is the correct subtree". DensiTree endeavours to do this by drawing a joint representation of a posterior sample of trees, but it would be onerous to map a user interface to this display. I'm raising this point as something for the developers to consider as a feature addition, and not a required revision for this manuscript.

      I entirely agree with this point. I have added a sentence in the discussion:

      "Even where sequences are accurate, phylogenetic topology is often uncertain, and finding ways to communicate this at scale, building on prior work [Densitree citation] would be valuable."

      The authors make multiple claims of novelty - e.g., "[...] existing web-based tools [...] do not scale to the size of data sets now available for SARS-CoV-2" and "Taxonium is the only tool that readily displays the number of independent times a given mutation has occurred [...]" - that are not entirely accurate. For example, RASCL (https://observablehq.com/@aglucaci/rascl) allows users to annotate phylogenies to examine the repeated occurrence of specific mutations. Our own system, CoVizu, also enables users to visualize and explore the evolutionary relationships among millions of SARS-CoV-2 genomes, although it takes a very different approach from Taxonium. Taxonium is an excellent and innovative tool, and it should not be necessary to claim priority.

      I agree that comparisons with existing tools are difficult and often provide a sense of unnecessary competition. I attempted to be quite careful in the specific section focused on comparison, but may have been less careful earlier on. The intent with this first sentence in the abstract was to provide a succinct description of the gap that Taxonium was developed to fill with "however, existing web-based tools for analysing and exploring phylogenies do not scale to the size of datasets now available for SARS-CoV-2". I have now removed the words "analysing and", focusing on the exploration of phylogenies. I think this new sentence is defensible in that valuable tools such as CoVizu intentionally do not explore a phylogeny directly but instead take a multi-level approach, and this new sentence better matches the comparisons in the paper. In the second sentence, I have removed the phrase "is the only tool that", which I agree adds little and may not be accurate, depending on one's interpretation of "readily". Thank you for these points.

      Although the source code (largely JavaScript with some Python) is quite clean and has a consistent style, there is a surprising lack of documentation in the code. This makes me concerned about whether Taxonium can be a maintainable and extensible open-source project since this complex system has been almost entirely written by a single developer. For example, usher_to_taxonium.py has a single inline comment (a command-line example) and no docstring for the main function. JBrowsePanel.jsx has a single inline comment for 293 lines of code. There is some external documentation (e.g., DEVELOPMENT.md) that provides instructions for installing a development build, but it would be very helpful to extend this documentation to describe the relationships among the different files and their specific roles. Again, this is something for the developers to consider for future work and not the current manuscript.

      This is an entirely fair comment. The version of Taxonium presented in the manuscript is "2.0", which is a new version built from scratch with considerably less technical debt than the version that preceded it. Its technical strengths are that (with the exception of the backend) it is relatively well-modularised into functional components. But the limitations that the reviewer notes with respect to commenting are entirely fair. What I would say is that in the time since this manuscript was submitted, several important features have been added by an external collaborator, Alex Kramer, most notably the Treenome Browser (https://www.biorxiv.org/content/10.1101/2022.09.28.509985v1). I hope that the ability of Alex to add these features with little need for support provides some evidence of Taxonium's extensibility. But I acknowledge there is room for improvement.

      Reviewer #3 (Public Review):

      The paper succinctly provides an overview of the current approaches to generating and displaying super-large phylogenies (>10,000 tips). The results presented here provide a comprehensive set of tools to address the display and exploration of such phylogenies. The tools are well-described and comprehensive, and additional online documentation is welcome.

      The technical work to display such large datasets in a responsive fashion is impressive and this is aptly described in the paper. The author rightly decides that displaying large phylogenies is not simply a matter of rendering "more nodes", and so in my eyes, the major advancement is the approach used to downsample trees on-the-fly so that the number of nodes displayed at one time is manageable. This is detailed only briefly (Results section, 1st paragraph, 2 sentences). I would like to see more discussion about the details of this approach.

      Thank you for this point, also raised by Reviewer 2. I have now added a lengthy section on this in the Materials and Methods, which I hope is helpful. The approach is not especially sophisticated, but it does the job and runs quickly.

      Examples that came up while exploring the tool: the (well implemented) search functionality reports results from the entire tree (e.g. in Figure 4, the number of red circles is not a function of zoom level), how does this interact with a tree showing only a subset of nodes?

      Yes, this is an important feature which I perhaps did not do justice to in the write-up. I have included in the new section in the Materials and Methods a paragraph discussing search results:

      "In order to ensure that search results are always comprehensive, but at the same time to avoid overplotting, we take the following approach:

      ● Searches are performed across every single node on the tree to select a set of nodes that match the search. The total number of matches is displayed in the client.

      ● If fewer than 10,000 matches are detected, these are simply displayed in the client as circles.

      ● If more than 10,000 matches are detected, the results are sparsified using the method above, and then displayed.

      ● Upon zooming or panning, the sparsification is repeated for the new bounding box."
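      The search-display rules quoted above can be sketched as follows. The grid-binning sparsifier here is a hypothetical stand-in, since the actual sparsification method is only described in the new Materials and Methods section; the grid size, the "first node in a cell wins" rule and all function names are assumptions for illustration. It is at least consistent with the reply that excluded nodes are determined essentially by position.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position of a node in tree coordinates


def sparsify(points: List[Point], x_range: Tuple[float, float],
             y_range: Tuple[float, float], cells: int = 300) -> List[Point]:
    """Keep at most one point per grid cell inside a non-degenerate bounding box."""
    (x0, x1), (y0, y1) = x_range, y_range
    kept: Dict[Tuple[int, int], Point] = {}
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the current view
        # Map to a grid cell; clamp so points on the far edge land in the last cell
        cx = min(int((x - x0) / (x1 - x0) * cells), cells - 1)
        cy = min(int((y - y0) / (y1 - y0) * cells), cells - 1)
        kept.setdefault((cx, cy), (x, y))  # first point in a cell wins
    return list(kept.values())


def search_results_to_draw(matches: List[Point], x_range, y_range,
                           threshold: int = 10_000):
    """Return (total match count across the whole tree, points to draw)."""
    total = len(matches)  # always reported, regardless of zoom level
    if total < threshold:
        return total, list(matches)  # few matches: draw them all
    # Many matches: sparsify for the current bounding box; re-run on zoom or pan
    return total, sparsify(matches, x_range, y_range)
```

      Because the total is computed before any sparsification, the reported match count stays independent of zoom level, while the number of drawn circles is bounded by the grid size.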

      How is the node order chosen with regards to "nodes that would be hidden by other nodes are excluded" and could this affect interpretations depending on the colouring used?

      This perhaps was slightly sloppy language which did not directly describe the implementation. I have now rephrased this to "only nodes that overlap other nodes are excluded", as we don't in fact consider a notion of z-index when doing this. The way the sparsification works (now better described) means that the nodes excluded are determined essentially by position and I don't foresee this introducing particular biases, but this was an insightful point to raise.

      Taxonium takes the approach of displaying all available data (sparsification of nodes notwithstanding). Biases in the generation of sequences, especially geographical, will therefore be present (especially so in the two main datasets discussed here - SARS-CoV-2 and monkeypox). This caveat should be made explicit.

      This is certainly true. I have added this new paragraph in the Discussion:

      "A further challenge is the vastly different densities of sampling in different geographic regions. Because Cov2Tree does not downsample sequences from countries which are able to sequence a greater proportion of their cases, the number of tips on a tree is not indicative of the size of an outbreak and in some cases even inferences of the directionality of migration may be confounded. There would be value in the development of techniques that allow visual normalisation of trees for sampling biases, which might allow for less biased phylogenetic representations without downsampling."

      Has the author considered choosing which nodes to exclude for sparsified trees in such a way as to minimise known sampling biases?

      The last sentence of the new paragraph above alludes to a sort-of-similar approach. I hadn't directly considered the approach the reviewer suggests. It is an interesting idea. The downsampling algorithm has to be very computationally inexpensive but it would be interesting to explore ways to do this. I am tracking this in https://github.com/theosanderson/taxonium/issues/437.

      Interoperability between different software tools is discussed in a technical sense but not as it pertains to discovering the questions to ask of the data. As an example, spotting the specific mutations shown in figure 3 + 4 is not feasible by checking every position iteratively; instead, the ability to have mutations flagged elsewhere and then seamlessly explore them in Taxonium is a much more powerful workflow. This kind of interoperability (which Taxonium supports) enhances the claim of "providing insights into the evolution of the virus".

      Thank you for flagging this point -- I am very excited by the growing ecosystem of interoperable tools. You are absolutely right that most of the insights Taxonium can bring into evolution rely also on this broader ecosystem. I have added a florid sentence in the concluding paragraph: "It forms part of an ecosystem of open-source tools that together turn an avalanche of sequencing data into actionable insights into ongoing evolution."

      The prosaic reason I don't discuss Taxonium's interoperability features in more detail in this manuscript is that it aims to describe the version of Taxonium I initially developed, and these features were developed collaboratively by a broader group later on (and after deposition of this preprint).

      Taxonium has been a fantastic resource for the analysis of SARS-CoV-2 and this paper fluently presents the tool in the context of the wider ecosystem of bioinformatic tools in use today, with the interoperability of the different pieces being a welcome direction.

    1. Author Response

      Reviewer #1 (Public Review):

      “This manuscript reports the results of studies on the effects of an ActRIIB-Fc ligand trap inhibitor of myostatin on muscle contractures that develop when brachial plexus nerve roots are severed at 6 after birth. One component of this pathological response seems to be a failure to add sarcomeres as the skeleton grows, resulting in short muscles. The authors use a carefully performed set of animal studies to test the effects of the ligand trap on denervation-induced limitations in range of motion in young mice. They also investigate several biochemical mechanisms that might contribute to contractures and be modified by the ligand trap. Finally, they test for gender discordance in the protective effect of a proteasome inhibitor against contractures. The major finding of these studies is that the ligand trap improved the range of motion at the elbow and shoulder in female mice but not in males. The major caveat to interpreting the data is that group sizes are relatively small, such that the study may have been underpowered to detect smaller effects on range of motion and biochemical endpoints.”

      Thank you very much for your thoughtful review of our manuscript. We have taken your feedback regarding the interpretation of our data into consideration, and revised our manuscript accordingly.

      We appreciate the reviewer’s careful scrutiny of our group sizes. As mentioned in the Statistical analysis section of our Materials and Methods, we included at least 6 mice per group for all range of motion and physiological endpoints. Based on an a priori power analysis, this is the number of mice per group necessary to detect a 10° difference in contractures and a 0.2 µm difference in sarcomere lengths at 80% power between experimental conditions. However, the small size of the forelimb muscles, especially following denervation, precluded the investigation of all biochemical parameters in each muscle. Therefore, we used smaller subgroup sizes for certain biochemical endpoints (Akt, Smad2/3, and Atrogin-1). In our revised Discussion, we acknowledge that our study may be underpowered to detect smaller effects in these parameters of protein dynamics.
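      The kind of a priori power analysis described above can be sketched with the standard normal-approximation formula for comparing two group means, $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 / \Delta^2$ per group. This is an illustration, not the authors' calculation: the standard deviation of the outcome is not given in this response, so any value passed as `sd` is hypothetical, and an exact t-test calculation gives a slightly larger n for small groups.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample comparison.

    delta: smallest difference in means worth detecting (e.g. degrees)
    sd: assumed common standard deviation, in the same units as delta
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)  # quantile corresponding to the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)
```

      For instance, detecting a difference equal to one standard deviation (`n_per_group(10, 10)`) requires 16 animals per group under this approximation; halving the assumed standard deviation reduces the requirement roughly fourfold.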

      Discussion (lines #461-468): First, the small size of our denervated muscles precluded the use of the same muscles for all analyses, instead requiring smaller subgroup sizes as well as different muscles for certain biochemical endpoints (Akt, Smad2/3, MuRF1, and Atrogin-1). We therefore acknowledge that our study may be underpowered to detect smaller effects in certain parameters of protein dynamics, specifically signaling proteins and ubiquitin ligases. We also acknowledge that the precision of our findings would be further enhanced with the use of the same muscle type across all of our morphological, physiological, and biochemical analyses.

      Reviewer #2 (Public Review):

      “The manuscript by Emmert et al. describes an original and straightforward study demonstrating the utility of targeted therapy in a neonatal brachial plexus injury (NBPI) mouse model. The authors sought to investigate whether pharmacologic inhibition of MSTN signaling using a soluble decoy receptor (ACVR2B-Fc) could preserve longitudinal muscle growth and prevent contractures after NBPI. More specifically, through in vivo experiments using wild-type female and male mice, the authors assessed the impact of inhibiting the MSTN signaling in basal and pathophysiological conditions, on developmental, morphological, and biomechanical parameters, and on several biochemical markers of protein synthesis, protein degradation, and their associated signaling pathways, in forelimb skeletal muscle.

      The authors provide multiple lines of compelling evidence that ACVR2B-Fc improves skeletal muscle biology and function in NBPI mice, provokes hypertrophy, rescues longitudinal growth, and impedes neuromuscular contractures in denervated muscles. Rather than improving the condition regardless of sex, the treatment appears selective for the muscles of female mice, revealing a sex-specific improvement and therefore a sex dimorphism. The experiments also attempt to provide a mechanistic explanation, although it ultimately remains unclear why and how this sex-specific effect arises.

      Overall, the study details a promising intervention in NBPI mice and begins to highlight a pathway that can be exploited for this goal. While the reviewer did enjoy the manuscript, and the conclusions of this paper are mostly well supported by data, there are certain deficiencies that cannot be overlooked.

      Strengths:

      A) This study includes a clear-cut demonstration leading to a coherent narrative of a potential intervention for children affected by NBPI, which is well supported by prior literature mentioning the effects of palliative mechanical solutions and investigating the effects of pharmacologic strategies for the prevention of muscle contractures.

      B) This study uses a pharmacologic chronic treatment, in vivo, on female and male neonatal mice to investigate the effects and relevant mechanisms of the MSTN signaling inhibition, using a soluble decoy receptor (ACVR2B-Fc), from the whole organism into the skeletal muscle and further into cellular signaling pathways.

      C) This study provides promising data about the effects of the MSTN signaling inhibition on developmental, morphological, and biomechanical parameters, as well as biochemical markers in the NBPI mice.

      D) This study underlines the importance of using female and male mice during experimental procedures, clearly showing that sex dimorphism can produce very different results.

      E) The manuscript is well written, well organized, and cogent.

      Weaknesses and Limitations:

      A) This study attempts to provide mechanistic information to support and explain the results observed. However, the analysis remains superficial and should go further into detail especially in investigating completely the different molecular pathways considered, and the non-canonical alternatives.

      B) The use of different muscles for biochemical analyses from those used for developmental, morphological, and biomechanical parameters limits the interpretation of the data, because the observed effects could instead reflect, for example, differences between muscles.

      C) The interpretation of the findings should be done carefully, knowing that it is an MSTN/Activin A signaling blockade and not an MSTN inhibition alone.

      D) The conclusion would be reinforced with data obtained at later time points (8 and/or 12 weeks).”

      Thank you very much for your comprehensive and insightful review. Your detailed comments and suggestions have not only allowed us to improve our current manuscript with greater clarity and additional data, but they also reinforce our plan to elucidate biochemical mechanisms more completely in future studies. In this revision, we have provided additional experiments to strengthen our analysis of known pathways downstream of MSTN signaling, addressed the use of different muscles as well as the four-week time point, and discussed the potential implications stemming from the broad specificity of the ligand trap. We certainly share your enthusiasm about dissecting the different molecular pathways and non-canonical alternatives. Indeed, we intend to interrogate these mechanistic underpinnings with the same rigor with which we obtained our physiological and translational findings, which cannot be completed within the scope of the current study. Follow-up studies will focus on exploring non-canonical alternatives and investigating long-term effects at skeletal maturity and beyond.

      Reviewer #3 (Public Review):

      “This timely manuscript describes the sex dimorphisms in neonatal development as they apply to muscle injury and denervation. More and more studies are identifying sex differences in skeletal muscle function and dysfunction. This is one more study to point out differences. Missing pieces in the field and in this study are the mechanistic links between skeletal muscle function/dysfunction and sex differences. This paper starts to point to a mechanism highlighting the non-canonical AKT pathway. This is a very well-written manuscript with a clear experimental plan and workflow. I have no major concerns.

      My biggest question is the molecular mechanism linking sex differences and skeletal muscle function and dysfunction. However, this is perhaps a follow-up study to the already complete study the authors present.”

      Thank you very much for your kind words and enthusiasm! We likewise find it important to improve our understanding of sex differences in muscle function/dysfunction, and are committed to unraveling the molecular mechanism(s) that link them in future studies.

    1. Author Response

      Reviewer #1 (Public Review):

      According to the space-time wiring hypothesis proposed by (Kim, Greene et al. 2014), the BC-off SAC circuit mimics the structure of a Reichardt detector; BCs closer to SAC soma have slower dynamics (they can be more sustained, have a delay in activation or slower rise time), while BCs further away are more transient. Later studies confirmed the connectivity and expanded the model on SACs (Ding, Smith et al. 2016, Greene, Kim et al. 2016). However, physiological studies that used somatic recordings to assess the BC properties at different dendritic distances were inconclusive (Stincic, Smith et al. 2016, Fransen and Borghuis 2017). Here, the authors used iGluSnFR, a glutamate sensor to measure the signals impinging on SAC dendrites. Their experimental findings align with the space-time wiring hypothesis, revealing sustained responses closer to SAC soma (mediated by prolonged release from type 7 BCs, and only slightly affected by amacrine cells), which according to their simulated SAC should produce a substantial increase in direction selectivity (DS).
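      The logic of the space-time wiring hypothesis summarized above can be illustrated with a toy model: sustained inputs near the soma and transient inputs farther out sum most effectively when a stimulus moves from soma to tip, because the sustained responses are still active when the distal ones arrive. This sketch is not the authors' ball-and-stick model. The input positions, time constants, stimulus speed and linear summation are all hypothetical, and each input is given the same peak amplitude (the assumption the reviewer attributes to the authors' model).

```python
import numpy as np


def release_kernel(t_ms, tau_rise_ms, tau_decay_ms):
    """Difference-of-exponentials release time course with a peak of 1.

    Zero before the stimulus reaches the input (t < 0); needs tau_decay > tau_rise.
    """
    tt = np.clip(t_ms, 0.0, None)  # avoid exponentials of negative times
    k = np.exp(-tt / tau_decay_ms) - np.exp(-tt / tau_rise_ms)
    # Analytic time and value of the kernel peak, used for normalisation
    t_peak = (tau_rise_ms * tau_decay_ms / (tau_decay_ms - tau_rise_ms)
              * np.log(tau_decay_ms / tau_rise_ms))
    peak = np.exp(-t_peak / tau_decay_ms) - np.exp(-t_peak / tau_rise_ms)
    return np.where(t_ms >= 0, k / peak, 0.0)


def peak_drive(direction, positions_um, tau_decay_ms, speed_um_per_ms=1.0,
               tau_rise_ms=2.0, dt_ms=0.1, duration_ms=500.0):
    """Peak of the linearly summed BC drive for a bar moving along the dendrite."""
    positions = np.asarray(positions_um, dtype=float)
    t = np.arange(0.0, duration_ms, dt_ms)
    if direction == "outward":  # soma-to-tip (centrifugal) motion
        arrival = positions / speed_um_per_ms
    elif direction == "inward":  # tip-to-soma (centripetal) motion
        arrival = (positions.max() - positions) / speed_um_per_ms
    else:
        raise ValueError("direction must be 'outward' or 'inward'")
    drive = sum(release_kernel(t - a, tau_rise_ms, td)
                for a, td in zip(arrival, tau_decay_ms))
    return float(drive.max())


def dsi(positions_um, tau_decay_ms, **kwargs):
    """Direction selectivity index; positive values favour outward motion."""
    out = peak_drive("outward", positions_um, tau_decay_ms, **kwargs)
    inward = peak_drive("inward", positions_um, tau_decay_ms, **kwargs)
    return (out - inward) / (out + inward)
```

      With hypothetical inputs at `[20, 40, 60, 80, 100, 120]` µm and decay constants `[150, 150, 150, 20, 20, 20]` ms (sustained proximal, transient distal), `dsi` is positive; giving every input the same kinetics makes it zero, and reversing the arrangement makes it negative.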

      I find the work to be clear and well presented. However, I do have some reservations with the findings:

      Main points:

      1) Very low number of cells examined in the key experiment presented in the first figure. The authors used a viral approach to express flex-iGluSnFR in SACs in Chat-Cre mice. Sometimes (apparently twice) the construct was expressed in individual SACs - this is a very underpowered experiment! The low number of successes precludes adequately judging the validity of the findings.

      We agree with the reviewer that measuring iGluSnFR signals from single starburst dendrites is a powerful approach to confirm the space-time wiring hypothesis. To bolster our data, we doubled our n number (updated Figure 1C and D, n = 66 ROIs; 20 dendrites and 4 retinas/FOVs). It should be noted that the results from these experiments are also validated on a larger scale across the starburst plexus (Figure 2).

      2) The model doesn't represent key known properties of BC-SACs and the interactions within SAC dendrites. First, the authors decided to construct a ball and stick model that doesn't consider the dendritic morphology of the starburst cell. A stimulus moving over a SAC is expected to engage multiple dendrites with complex spatiotemporal patterns that are expected to have a substantial effect on the voltages recorded on the investigated dendrite (Koren, Grove et al. 2017). For example, the dendrites in the orthogonal orientation will be activated at about the same time as the proximal dendrites; how such strong input will affect dendritic integration is unclear but should be taken into account in the model. Second, the authors assume a similar peak BC drive between proximal and distal inputs. However, a recent study found an enhanced glutamate release from proximal BCs, mediated by cholinergic SAC drive ((Hellmer, Hall et al. 2021); not cited). How different release amplitude would affect the conclusions of the model?

      It is well established that individual starburst dendritic sectors are relatively electrically isolated from each other (Miller & Bloomfield, 1983; Euler et al., 2002; Tukker et al., 2004, Poleg-Polsky et al., 2018) and thus we used a simple ball and stick to model direction selectivity in starburst dendrites.

      Related to the Reviewer’s point, in the paper we explicitly acknowledge that the simple ball and stick model will not capture important network interactions that are expected to impact direction selectivity (e.g. SAC-SAC inhibition). We suggest this as a future line of investigation.

      The idea of different synaptic weights across the starburst dendrite is an interesting one. If the proximal inputs are stronger relative to distal ones, as the Reviewer suggests, direction selectivity might be expected to be further enhanced. However, in a preliminary analysis, we did not find strong evidence of direction selectivity or sensitivity to MLA that would support the idea of cholinergic modulation.

      3) Another reason for including an accurate dendritic morphology is the difference in the number of BCs that target a cell. Because SAC dendrites cover the entire receptive field area, type 7 BCs, which occupy the proximal third of the dendrites (Ding, Smith et al. 2016, Greene, Kim et al. 2016), are expected to cover only 11% of the area covered by SAC dendrites (1/3 x 1/3 = 1/9) and correspondingly mediate just 11% of the BC drive. A nonbifurcating model presented here would dramatically overrepresent their contribution to SAC responses.

      We have estimated BC numbers directly from the connectomics data which takes into account starburst morphology (Ding et al., 2016). To capture the heterogeneity of BCs that might be encountered at the level of single dendrites, in the revised manuscript, we have averaged responses over many trials in which the precise BC numbers varied according to the probability density functions observed in the connectomics data set (Ding et al., 2016). The details of the model parameters are now provided in the Methods section.

      4) (Fransen and Borghuis 2017) found that off-SACs have a more pronounced distinction in the time to peak than on-SACs. I found it surprising that given the large body of work demonstrating the effectivity of the viral approach in expressing iGluSnFR in off BC (Borghuis, Marvin et al. 2013, Franke, Berens et al. 2017, Szatko, Korympidou et al. 2020, Gaynes, Budoff et al. 2021, Strauss, Korympidou et al. 2021), that the authors did not compare between on and off SAC populations.

      It is possible that the kinetic differences are more pronounced for inputs to OFF starbursts. However, we observed weaker iGluSnFR expression in the OFF starburst layer, and the S/N was below what was required for our analysis. Therefore, we focused on ON starburst cells.

      5) Recent work (Gaynes, Budoff et al. 2021) suggests that BCs' responses to motion and to static flashes have distinct dynamics. However, the current manuscript tests responses to flashed stationary stimuli experimentally, and then combines them in a simulation modeling a moving stimulus. This potential limitation of the study should at least be discussed.

      The Reviewer correctly points out that static and motion stimuli might have distinct dynamics (especially ‘emerging’ stimuli). We now describe this limitation of our study and discuss the findings of Gaynes et al. (2022).

      We have revised our model to use BC release rates truncated according to stimulus velocity and size, so that they more appropriately represent the duration of the stimulus.

      Reviewer #2 (Public Review):

      The authors present a nice series of imaging experiments confirming previous anatomical and electrophysiological evidence for the "space-time wiring" model for directionally selective responses in SAC dendrites. Fluorescence measurements with a genetically encoded glutamate indicator show that excitatory inputs to proximal SAC dendrites are more sustained than distal dendrites. Although the signals are shaped by surround inhibition, the fundamental differences persist with inhibition blocked, suggesting intrinsic differences in the synaptic release processes in different cone bipolar cell types.

      The authors examine iGluSnFR dynamics in individual SACs (Figure 1) and in a population of SACs (Figure 2). The latter is possible because distal inputs to all SACs occur deeper in the IPL and so can be imaged separately from the proximal inputs, and it permits the measurement of many more synapses in each experiment. The former approach is particularly powerful, however, because it allows careful mapping of the different types of inputs along the dendritic axis of individual SACs. This experiment was performed in only seven dendrites in two retinas, however; consequently, the confidence intervals for any spatial fitting would be quite broad. This experiment would be strengthened with additional data from more dendrites.

      We have now increased the n for Figure 1 in the revised manuscript (updated Figure 1C and D; n = 66 ROIs from 20 dendrites and 4 retinas/FOVs). Please see the response to Reviewer #1.

      It is very interesting that white noise stimuli do not pull out the kinetic differences, interesting enough to merit inclusion in the primary figures rather than as a supplement. These results seem valuable to our understanding of DS processing, but the implications remain unclear. Is it really the case that DS is eliminated, or even substantially degraded, when motion stimuli are presented atop some background (i.e., conditions in which the circuit is continuously stimulated)? Are the distinct kinetics brought about by abrupt, large changes in luminance? If so, wouldn't one expect much weaker DS in response to drifting sinusoidal gratings?

      The question of how direction is encoded in the natural scene is very interesting but beyond the scope of this study. We presented a preliminary white noise analysis to show that our recordings are consistent with other recent reports (e.g. Strauss et al., 2022) that cast doubt on the space-time wiring model, rather than to directly address this specific issue. It should also be noted that complementary inhibitory and/or other intrinsic dendritic mechanisms may ensure that dendrites remain DS in regimes in which BC mechanisms appear to be ineffective.

      In the introduction (p. 3A) the authors suggest that space clamp errors could distort EPSC kinetics, causing EPSCs arriving distally to appear more transient than those arriving more proximally. This seems contrary to what one would typically expect: cable theory would predict that more distal inputs ought to be filtered more, therefore appearing more prolonged, not more transient, than proximal inputs. It does not seem necessary to cast doubt on the previous results (Fransen and Borghuis, 2017) to motivate sufficiently the present experiments. One might simply point out that electrophysiological recordings do not provide precise information regarding the anatomical location of synaptic inputs.

      In the revised manuscript, we changed the text according to the Reviewer’s suggestion.

      Reviewer #3 (Public Review):

      In the study "Spatiotemporal properties of glutamate input support direction selectivity in the dendrites of retinal starburst amacrine cells", Srivastava, deRosenroll, and colleagues study the role of excitatory inputs in generating direction selectivity in the mouse retina. Computational and anatomical studies have suggested that the "space-time-wiring" model contributes to direction-selective responses in the mammalian retina. This model relies on temporally distinct excitatory inputs that are offset in space, thereby yielding stronger responses for motion in one versus the other direction. Conceptually, this is similar to the Reichardt detector of motion detection proposed many decades ago. So far, however, there is little functional evidence for the implementation of the space-time-wiring model. Here, Srivastava, deRosenroll and colleagues use local glutamate imaging in the ex-vivo mouse retina combined with biophysical modeling to test whether temporally distinct and spatially offset excitatory inputs might generate direction-selective responses in starburst amacrine cells (SACs). Consistent with the space-time-wiring model, they find that glutamatergic inputs at proximal SAC dendrites are more sustained than inputs at distal dendrites. This finding was consistent across different sizes of stationary, flashed stimuli. They further linked the sustained input component to the genetically identified type 7 bipolar cell and showed that the difference in temporal responses across proximal and distal inputs was independent of inhibition, but rather relied on excitatory interactions. By estimating vesicle release rates and building a simple biophysical model, the authors suggest that next to already established mechanisms like asymmetric inhibition, excitatory inputs with distinct kinetics contribute to direction-selective responses in SACs for slow and relatively large stimuli.

      In general, this study is well-written, the data is clearly presented and the conclusion that (i) the temporal kinetics of excitatory inputs varies along SAC dendrites and that (ii) this might then contribute to direction selectivity is supported by the data. The study addresses the important question of how excitation contributes to the generation of direction-selective responses. There have been several other studies published on this topic recently, and I believe that the results will be of great interest to the visual neuroscience community.

      However, the authors should address the following concerns:

      • They should demonstrate that differences in response kinetics between proximal and distal dendrites are unrelated to differences in signal-to-noise ratio.

      In response to the Reviewer’s comment, we have now added new plots to supplementary Figure S1 (A, B) that show that the response kinetics are not strongly related to signal strength.

      • To demonstrate consistency across recordings/mice, the authors should indicate data points from different recordings (e.g. Fig. 2C).

      In the updated Figure 2C-E, we have now added the average values for each recording to indicate the consistency/variability in the data.

      • The authors mention in the introduction that the space-time-wiring model is conceptually similar to other correlation-type motion detectors that have been experimentally verified in different species. It would be great to expand on the similarity and differences of the different mechanisms in the Discussion, especially focusing on Drosophila where experimental evidence at the synaptic level exists.

      It should be noted that the results of the influential Nature paper describing the space-time wiring of inputs to T4 DS neurons in the fly system were not reproduced by the same group. A new paper from Axel Borst’s group, however, shows that a distinct source of spatially offset excitation (glutamate-mediated disinhibition) may underlie the multiplicative operation. Nevertheless, to our knowledge, no studies have mapped the spatiotemporal properties of inputs across single T4/T5 DS neurons as we have done for starburst amacrine cells. In the revised manuscript we briefly summarize the fly literature in response to the Reviewer’s suggestion.

      • The authors use stationary spot stimuli of different sizes to characterize the response kinetics of excitatory inputs to SACs. I suggest the authors add an explanation for choosing only stationary stimuli for studying the role of excitatory inputs in direction selectivity/motion processing.

      Please see the above response to Reviewer #1.

      In addition, the authors use simulated moving edges to stimulate the model bipolar cells. They should provide details about the size of the stimulus and the rationale behind using this size, given their previous results.

      For simulation experiments, bipolar cell inputs were triggered by a 400 µm wide bar moving over a range of velocities (0.1 – 2 mm/s). We have now added more details in the Methods section and main text in the revised manuscript.
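
      The stimulus timing behind this velocity range can be checked with simple arithmetic: a bar of width w moving at velocity v covers any given point for w/v. The Python sketch below computes that dwell time for the stated 400 µm bar and, purely as an illustration, zeroes a release-rate trace once the bar has passed. The truncation rule and the function names are assumptions, not the authors' model code, which may treat the stimulus offset differently.

```python
import numpy as np

BAR_WIDTH_UM = 400.0  # bar width stated in the response


def dwell_time_s(velocity_mm_per_s, width_um=BAR_WIDTH_UM):
    """Time (s) the moving bar covers a single point: width / velocity."""
    velocity_um_per_s = velocity_mm_per_s * 1000.0
    return width_um / velocity_um_per_s


def truncate_release(rate, dt_s, velocity_mm_per_s, width_um=BAR_WIDTH_UM):
    """Hypothetical rule: zero the release-rate trace after the bar has passed."""
    n_keep = int(round(dwell_time_s(velocity_mm_per_s, width_um) / dt_s))
    out = np.array(rate, dtype=float)
    out[n_keep:] = 0.0
    return out


for v in (0.1, 0.5, 1.0, 2.0):
    print(f"{v:.1f} mm/s -> {dwell_time_s(v):.2f} s")
```

      At 0.1 mm/s the bar dwells over a point for 4 s, but only for 0.2 s at 2 mm/s, a 20-fold range that a release rate measured with flashed stimuli would not capture without some form of truncation.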

      • Using the biophysical model, the authors show that converting sustained bipolar cell inputs to transient ones reduces direction selectivity in SACs. I suggest the authors also do the opposite manipulation/flip the proximal and distal inputs or provide a rationale why they performed this specific manipulation.

      Thanks for the suggestion. We have now updated Figure 6B showing DSi vs velocity plots for several different bipolar cell input distributions – (i) sustained-transient, (ii) transient-sustained, (iii) all transient, and (iv) all sustained.
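
      Why swapping or equalizing input kinetics changes direction selectivity can be illustrated with a toy, linear version of the space-time wiring idea. This is not the authors' biophysical model. The two-input geometry, the difference-of-exponentials kernels, the time constants, the peak-of-summed-input readout and the DSi definition (centrifugal minus centripetal, divided by their sum) are all assumptions chosen for illustration.

```python
import numpy as np

DT = 0.001                   # s, time step
T = np.arange(0.0, 4.0, DT)  # s, simulation window

# Hypothetical kernel time constants (tau_rise, tau_decay) in seconds
SUSTAINED = (0.05, 1.0)
TRANSIENT = (0.01, 0.05)


def kernel(t, tau_rise, tau_decay):
    """Difference-of-exponentials response, zero before onset, peak scaled to 1."""
    tc = np.clip(t, 0.0, None)
    t_peak = np.log(tau_decay / tau_rise) * tau_rise * tau_decay / (tau_decay - tau_rise)
    peak = np.exp(-t_peak / tau_decay) - np.exp(-t_peak / tau_rise)
    return (np.exp(-tc / tau_decay) - np.exp(-tc / tau_rise)) / peak


def dsi(proximal, distal, separation_um=100.0, velocity_mm_s=1.0):
    """Toy DSi from the peak of the summed input; positive = centrifugal preference."""
    delay = separation_um / (velocity_mm_s * 1000.0)  # s between the two activations
    # Centrifugal (soma -> tip): the proximal input is driven first
    cf = np.max(kernel(T, *proximal) + kernel(T - delay, *distal))
    # Centripetal (tip -> soma): the distal input is driven first
    cp = np.max(kernel(T, *distal) + kernel(T - delay, *proximal))
    return (cf - cp) / (cf + cp)


configs = {
    "sustained-transient": (SUSTAINED, TRANSIENT),
    "transient-sustained": (TRANSIENT, SUSTAINED),
    "all transient": (TRANSIENT, TRANSIENT),
    "all sustained": (SUSTAINED, SUSTAINED),
}
for name, (prox, dist) in configs.items():
    values = [dsi(prox, dist, velocity_mm_s=v) for v in (0.1, 0.5, 1.0, 2.0)]
    print(name, np.round(values, 2))
```

      In this toy, the sign of DSi follows which input is sustained, equal kinetics give exactly zero, and the magnitude depends on velocity because the delay between the two activations sets how much the sustained response overlaps the transient one.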

      • In each figure, the authors should note whether traces show single trial responses or mean across how many trials. If the mean is presented (e.g. Suppl. Fig. 2a), the authors should include a measure of variability - either show single ROIs in addition and/or add an s.d. shading to the mean traces.

      In the revised manuscript, we have now indicated the mean and number of trials for each figure. We have also added S.E.M. values to the mean traces in the figures.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript by Kim et al., the authors use live-cell imaging of transcription in the Drosophila blastoderm to motivate quantitative models of gene regulation. Specifically, they focus on the role of repressors and use a 'thermodynamic' model as the conceptual framework for understanding the addition and placement of the repressor Runt, i.e. synthetic insertion of Runt repressor sites into the Bicoid-activated hunchback P2 enhancer. Coupled with kinetic modeling and live-cell imaging, this study is a sort of mathematical enhancer bashing experiment. The overarching theme is measuring the input/output relationship between an activator (bicoid), repressor (runt), and mRNA synthesis. Transcriptional repression is understudied in my opinion. One finding is that the inclusion of cooperativity between trans-acting factors is necessary for understanding transcriptional regulation. Most, if not all, of the tools used in this paper have been published elsewhere, but the real contribution is a deep, quantitative dissection of transcriptional regulation during development. As such, the only real questions for this referee are whether the modeling was done rigorously to produce some general biological conclusions. By and large, I think the answer is yes.

      We thank the reviewer for this thoughtful evaluation of our work. We agree with the reviewer’s assessment that transcriptional repression, especially the quantitative dissection of transcriptional repression, is understudied compared to transcriptional activation.

      Comments:

      Fig. 6 was the most striking figure for this referee, specifically that different placements of Runt molecules on the enhancer lead to distinct higher order interactions. I am wondering if the middle data column in Fig. 6 represents a real difference from the other two, and if so, it seems that the positioning - as opposed to simply the stoichiometry - is essential in cooperativity. This conclusion implies that transcriptional regulation is more precise than what some claim is just a mushy ball of factors close to a promoter. In other words, orientation may matter. Proximity may matter. Interactions in trans matter.

      We thank the reviewer for pointing out a feature of our data that we did not emphasize enough originally. Indeed, the construct in the middle column, which we termed [101], could be better recapitulated with the simplest model of zero free parameters than the other two constructs. As the reviewer pointed out, this raises an interesting question about the “grammar” of an enhancer: the placement and orientation of binding sites for transcription factors might matter yet we do not have a clear understanding of the logic. We have now incorporated a discussion of this topic in the Discussion section.

      There needs to be at least one prediction which is validated at the level of smFISH / mRNA in the embryo. Without detracting from the effort the authors have expended in looking directly at transcription, if the effects can't be felt by the blastoderm at the level of mRNA/cell, it becomes difficult to argue for the relevance to development. Also, I feel there is little chance that these measurements can be quantitatively replicated unless translated to the level of total protein or mRNA. Such a measurement (orthogonal quantitative confirmation of the repressor cooperativity result) would also assuage my concern about the time averaging as shown in Fig. S3.

      Our study focused on predicting the initial rate of transcription because it is the measurable quantity that most directly relates to the binding and action of the transcriptional activators and repressors used in this study. We argue that the action of transcription factors would be more accurately assessed by monitoring the rate of transcription, rather than the accumulated mRNA, which could be confounded by the dynamics of the whole transcription cycle—initiation, elongation and termination—as well as nuclear export, diffusion and degradation of transcripts. We are, of course, excited to eventually be able to predict a whole pattern of cytoplasmic mRNA over space and time from knowledge of the enhancer sequence. However, if we cannot predict the initial rate of RNA polymerase loading dictated by an enhancer, we argue that there is little hope in predicting such cytoplasmic patterns. We emphasized this point in the Discussion (Line XX-YY). Regardless, to assuage the reviewer’s concern, we have performed additional analyses to assess the effect of repression at the level of accumulated mRNA.

      First, we quantified the accumulated mRNA during nuclear cycle 14, which is the time window on which we focused in this study. To do so, we integrated the area under the curve of the MS2 time traces, which has already been shown by FISH to report on the total amount of mRNA produced (Garcia et al., Current Biology 23:2140, 2013; Lammers et al., PNAS 117:836, 2020). This integrated signal, which reports on accumulated mRNA, is now shown for all constructs in the presence and absence of Runt protein in the new Figure S17. This figure clearly shows that the consequences of repression are present in the blastoderm, not just at the level of transcriptional initiation, but also at the level of accumulated mRNA.
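
      The integration step is a plain area-under-the-curve calculation. A minimal sketch, assuming a background-subtracted MS2 trace sampled at known times in arbitrary fluorescence units; the function name and the trace are hypothetical, and the authors' pipeline may include further steps such as averaging over nuclei.

```python
import numpy as np


def accumulated_signal(time_min, ms2_signal):
    """Area under an MS2 trace by the trapezoidal rule (proxy for total mRNA produced)."""
    t = np.asarray(time_min, dtype=float)
    y = np.asarray(ms2_signal, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


# Hypothetical trace: linear rise for 10 min, then a plateau for 10 min
t = np.arange(0.0, 21.0, 1.0)
y = np.minimum(t, 10.0)
print(accumulated_signal(t, y))  # → 150.0
```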

      We then compared the accumulated mRNA profiles shown in Figure S17 to the initial rate of RNAP loading at each position of the embryo along the anterior-posterior axis for all constructs in the presence and absence of Runt protein. These new results are shown in a new figure, Figure S19. Interestingly, we saw a good correlation (Pearson correlation coefficient of 0.90) between these two metrics. Thus, we argue that our conclusion that higher-order cooperativity is necessary to account for the initial rate of RNA polymerase loading would still hold for predicting the accumulated mRNA.
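
      The comparison reduces to a Pearson correlation between two paired metrics, for example the initial loading rate and the integrated signal in each anterior-posterior bin. A minimal sketch with hypothetical numbers (not the study's data), written out explicitly and checked against NumPy's `np.corrcoef`:

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient of two paired 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))


# Hypothetical paired values per AP bin: initial rate vs. accumulated signal
initial_rate = [1.0, 2.1, 2.9, 4.2, 5.1]
accumulated = [10.0, 19.0, 33.0, 40.0, 52.0]
r = pearson_r(initial_rate, accumulated)
assert np.isclose(r, np.corrcoef(initial_rate, accumulated)[0, 1])
print(round(r, 3))
```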

      Reviewer #3 (Public Review):

      The authors have presented results from carefully planned and executed experiments that probe enhancer-driven expression patterns in varying cellular conditions (of the early Drosophila embryo) and test whether standard models of cis-regulatory encoding suffice to explain the data. They show that this is not the case, and propose a mechanistic aspect (higher order cooperativity) that ought to be explored more carefully in future studies. The presentation (especially the figures and schematics) is excellent, and the narrative is crisp and well organized. The work is significant because it challenges our current understanding of how enhancers encode the combinatorial action of multiple transcription factors through multiple binding sites. The work will motivate additional modeling of the presented data, and experimental follow-up studies to explore the proposed mechanisms of higher order cooperativity. The work is an excellent example of iterative experimentation and quantitative modeling in the context of cis-regulatory grammar. At the same time, the work as it stands currently raises some doubts regarding the statistical interpretation of results and modeling, as outlined below.

      We thank the reviewer for noting the significance of our work. We tried our best to address the concerns of the reviewer regarding the statistical interpretation of results and theoretical modeling throughout our responses below.

      The results presented in Figure 5 are used to claim that the data support (i) an unchanging K_R regardless of the position of the Runt site in the enhancer and (ii) an \omega_RP that decreases as the site goes further away from the promoter, as might be expected from a direct repression model. This claim is based on testing only the specific model that the authors hypothesize; no alternative model is compared. For instance, are the fits significantly worse if \omega_RP is kept constant and K_R is allowed to vary across the three sites? If different placements of the Runt site can result in puzzling differences in RNAP-promoter interaction, it seems entirely possible that the different site placements might result in different K_R, perhaps due to unmodeled interference from Bicoid binding. Due to these considerations, it is not clear if the data indeed argue for a fixed K_R and distance-dependent \omega_RP.

      We apologize for the lack of justification in assuming that Kr remains constant and wrp varies depending on the position of the Runt binding sites. Following the reviewer’s suggestion, we tested the alternative scenarios where we either fix or vary different combinations of wrp and Kr for our one-Runt binding site constructs. The result is now shown in a new figure, Figure S16. In short, as reported by the Akaike Information Criterion (AIC) in Figure S16F, the MCMC fit explains the data best in the scenario of fixed Kr and different wrp values for one-Runt binding site constructs. Furthermore, we also performed the MCMC inference in the case where we varied both Kr and wrp values across constructs. From this analysis, we obtained similar values of Kr while having different values of wrp across constructs as shown in Figure S16G. Overall, we believe that this evidence strongly supports our assumption of having consistent Kr values but different wrp values for the one-Runt binding site constructs.
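
      To make the roles of the two parameters concrete, here is a deliberately minimal thermodynamic sketch of direct repression with a single Runt site and the promoter, omitting Bicoid activation entirely. It is an illustration in the standard statistical-mechanics style under stated assumptions, not the model fitted in the study: Kr sets how readily Runt binds, while wrp scales the weight of the state in which both Runt and RNAP are bound (wrp < 1 means repression). The function name is hypothetical.

```python
def p_bound_one_site(runt, K_R, rnap, K_P, omega_RP):
    """RNAP occupancy with one Runt site (hypothetical minimal direct-repression model)."""
    r = runt / K_R  # Runt statistical weight
    p = rnap / K_P  # RNAP statistical weight
    # States: empty (1), Runt only (r), RNAP only (p), Runt + RNAP (r * p * omega_RP)
    bound = p + r * p * omega_RP
    return bound / (1.0 + r + p + r * p * omega_RP)


# Same K_R, two interaction strengths: repression is stronger when omega_RP is smaller
for omega in (1.0, 0.1):
    print(omega, [round(p_bound_one_site(c, 1.0, 0.1, 1.0, omega), 4) for c in (0.0, 1.0, 10.0)])
```

      With wrp = 1 the Runt concentration has no effect (occupancy stays at p/(1 + p)), whereas Kr shifts the Runt concentration at which a given wrp takes hold; fitting constructs with fixed versus free Kr and wrp amounts to constraining these two knobs differently.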

      Results presented in Figure 6 make the case that higher order cooperativity involving two DNA-bound molecules of Runt and the RNAP is sufficient to explain the data. The trained values of such cooperativity in the three tested enhancers appear orders of magnitude different. As a result, it is hard to assess the evidence (from model fits) in a statistical sense. Indeed, if all of the assumptions of the model are correct, then using the high-order cooperativity is better than not using it. To some extent, this sounds statistically uninteresting (one additional parameter, better fits). It is not the case that the new parameter explains the data perfectly, so some form of statistical assessment is essential.

      The inferred cooperativity values indeed differ by orders of magnitude. However, the cooperativity terms can also be written as “w = exp(-E/(kBT))”, where E is the interaction energy, kB is the Boltzmann constant, and T is the temperature. As a result, the magnitudes of the different cooperativities should be compared on a log scale. In brief, the interaction energies associated with wrr for the three two-Runt binding site constructs range between 0 and 1 kBT, and the higher-order cooperativity wrrp has an energy between -2 and 4 kBT. Interestingly, these energies are of the same order of magnitude as the interaction energies typically reported for bacterial transcription factors (e.g., Dodd et al., Genes and Development 18:344-54, 2004). It is important to note that our inferred interaction energies can be either positive or negative, suggesting that both cooperativity and anti-cooperativity can be at play depending on the architecture of the two Runt binding sites. We now report these interactions in terms of energies in Table S1 and elaborate on this in the Discussion section (Line XX-YY).
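
      The conversion between a cooperativity factor and an interaction energy follows directly from w = exp(-E/(kBT)), i.e. E = -kBT ln w. A short sketch in units of kBT (the function name is hypothetical) shows why factors spanning orders of magnitude correspond to energies of only a few kBT.

```python
import math


def energy_kbt(w):
    """Interaction energy in units of kBT: E = -ln(w) = ln(1/w)."""
    if w <= 0:
        raise ValueError("cooperativity factor must be positive")
    return math.log(1.0 / w)


for w in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"w = {w:>6}: E = {energy_kbt(w):+.2f} kBT")
```

      Under this convention, w > 1 (cooperativity) gives E < 0 and w < 1 (anti-cooperativity) gives E > 0; a factor of 100 in either direction corresponds to about 4.6 kBT.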

      Finally, following the reviewer’s suggestion on statistical assessment of whether addition of parameters indeed explains the data better, we adopted the Akaike Information Criterion (AIC) as a metric to compare different models used in Figure 6 and now show the results in a new panel, panel G. Briefly, AIC is calculated by assessing the model’s ability to explain the data while penalizing for having more parameters. The smaller the AIC value is, the better the model explains the data. As we have claimed, the AIC showed a dramatic decrease when adopting the higher-order cooperativity as shown in Figure 6G. Thus we argue that the addition of higher-order cooperativity, while not being able to completely explain the data, is indeed capable of increasing the agreement between experiments and theory across all our two-Runt site constructs.
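
      For reference, AIC = 2k - 2 ln L̂, where k is the number of free parameters and L̂ is the maximized likelihood. A minimal sketch, assuming independent Gaussian errors so that AIC can be computed from the residual sum of squares (RSS) up to an additive constant. The numbers are hypothetical, and the study's MCMC-based calculation may differ in detail.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion from a maximized log-likelihood."""
    return 2 * n_params - 2 * log_likelihood


def aic_gaussian(rss, n_points, n_params):
    """AIC for a least-squares fit with i.i.d. Gaussian errors (additive constant dropped)."""
    return n_points * math.log(rss / n_points) + 2 * n_params


# Hypothetical comparison on the same 50 data points:
# a pairwise-only model (5 parameters) vs. one extra higher-order term (6 parameters)
aic_pairwise = aic_gaussian(rss=10.0, n_points=50, n_params=5)
aic_higher = aic_gaussian(rss=6.0, n_points=50, n_params=6)
print(f"pairwise: {aic_pairwise:.1f}, higher-order: {aic_higher:.1f}")
print("preferred:", "higher-order" if aic_higher < aic_pairwise else "pairwise")
```

      Each extra parameter costs +2, so it is favoured only when it lowers n ln(RSS/n) by more than that.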

      Moreover, it is not the case that the model structure being tested is the only obvious biophysics-driven choice: since this is the first time that such higher order effects are being tested, one has to be careful about testing alternative model structures, e.g., repression models that go beyond direct repression and pairwise cooperativity that goes beyond the traditional approach of a single (pseudo)energy term.

      We agree with the reviewer that alternative models with different mechanisms of repression should be mentioned. We have clarified this point further in the Discussion (Line XX -YY). In summary, we tested both “competition” and “quenching” models of repression as proposed in Gray et al. (Genes and Development 8:1829, 1994). Interestingly, Figure S5 shows that the “competition” model gives a worse fit than the “direct repression” and “quenching” models for the one-Runt binding site cases. We further tested these alternative models for the two-Runt binding site constructs. The results are shown in Figures S7 (competition) and S8 (quenching). These figures also reveal that the “competition” model underperformed compared to the “direct repression” or “quenching” models. For the “quenching” model to fit the data, we also had to invoke higher-order cooperativity beyond pairwise cooperativity. Thus, we believe that the requirement of higher-order cooperativity holds regardless of the choice of the specific model. Of course, our models of repression are very likely an oversimplification of how repressors actually work. However, given that these simple models have been the prevalent proposed mechanisms of transcriptional repression for decades, we believe that the significance of our work lies in the fact that we challenged these models by turning them into precise mathematical statements (in the form of widely used thermodynamic models) and confronting them with quantitative data.
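
      Where a higher-order term enters a thermodynamic model can be seen by writing out the states for two Runt sites and the promoter. In this hypothetical sketch (omitting Bicoid, and not necessarily the study's exact parameterization), omega_RR is the pairwise Runt-Runt interaction, each site has its own Runt-RNAP factor, and omega_RRP multiplies only the state in which both Runt molecules and RNAP are bound, so it cannot be absorbed into the pairwise terms.

```python
def p_bound_two_sites(runt, K_R, rnap, K_P, omega_RP1, omega_RP2, omega_RR, omega_RRP=1.0):
    """RNAP occupancy with two Runt sites and an optional three-body term (hypothetical)."""
    r = runt / K_R
    p = rnap / K_P
    # States without RNAP: empty, Runt at site 1, Runt at site 2, Runt at both sites
    no_rnap = 1.0 + 2 * r + r**2 * omega_RR
    # States with RNAP: alone, with Runt at site 1, at site 2, or at both sites
    with_rnap = p * (1.0
                     + r * omega_RP1
                     + r * omega_RP2
                     + r**2 * omega_RR * omega_RP1 * omega_RP2 * omega_RRP)
    return with_rnap / (no_rnap + with_rnap)


base = dict(runt=2.0, K_R=1.0, rnap=0.1, K_P=1.0, omega_RP1=0.5, omega_RP2=0.5, omega_RR=2.0)
print(round(p_bound_two_sites(**base), 4), round(p_bound_two_sites(**base, omega_RRP=0.1), 4))
```

      Extending the same bookkeeping to a third site adds states with three Runt molecules bound and, where the data require it, a further factor of the kind termed wrrrp.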

      The general theme seen in Figure 6 is seen again in Figure 7, when a 3-site construct is tested: model complexities inferred from all of the previous analyses are insufficient at explaining the new data, and new parameters have to be trained to explain the results. The authors do not seem to claim that the higher order cooperativity terms (two parameters) explain the data, rather that such terms may be useful.

      We agree that our previous approach was confusing. Figure 7A indeed incorporated all inferred parameters from the previous rounds of inference (Kb, wbp, p, R, as well as Kr, wrp, wrr, and wrrp). However, it is clear that this set of parameters, even including the higher-order cooperativity from the two-Runt binding site cases, was not enough to explain the data from the three-Runt binding site case. Thus, we had to invoke another free parameter, which we termed wrrrp, to explain the data. We have revised Figure 7B so that it now shows the “best” MCMC fit, which explains the data quite well (instead of only showing the “improvement” of fits).

    1. Author Response

      Reviewer #1 (Public Review):

      Xu et al. show that mutants in three DNA replication proteins, Mcm2, Pole3, and Pole4, have defects in differentiation in a mouse embryonic stem cell (ESC) model. The Mcm2 mutant (called Mcm2-2A), which specifically blocks the interaction of Mcm2 with histones, has defects in multilineage differentiation and neural differentiation, despite having minimal effect on ESC proliferation or gene expression. Mcm2-2A fails to fully silence ESC genes and activate appropriate differentiation genes. Chromatin profiling analyses show that Mcm2 binds many promoters. During differentiation, the Mcm2-2A mutant retains H3K27me3 at differentiation gene promoters and shows reduced accessibility, consistent with the observed defects in gene expression.

      The findings that Mcm2-2A has minimal effect on proliferation and gene expression in ESCs, but impairs differentiation are interesting, particularly since this mutant seems to separate the histone binding roles of Mcm2 and its roles in DNA replication. Furthermore, the fact the histone binding function is only necessary when cells exit the pluripotent state is of interest. The studies were reasonably thorough and generally support the conclusions that Mcm2 is important for reshaping histone modifications during differentiation, although the details by which this occurs are not clear. Although the authors used two different strategies for identifying the direct binding sites of Mcm2 on chromatin, Mcm2 enrichment at individual loci was relatively weak, suggesting Mcm2 may localize somewhat diffusely. This somewhat weakens the conclusions about the direct vs indirect effects of Mcm2 on chromatin structure and gene expression.

      Overall, this paper reports an interesting set of findings that have a few caveats/limitations regarding how Mcm2 mediates these effects on chromatin during ESC differentiation.

      My biggest question is about the Mcm2 CUT&RUN data, which appears to have low signal-to-noise. The authors appear to be aware of this issue, as they also used an Mcm2-FLAG line for CUT&RUN studies, with similarly low signal to noise. To be clear, this may be due to the binding properties of Mcm2, which may bind chromatin relatively broadly, causing few highly enriched peaks to be observed (similar to cohesin complex in the absence of CTCF). However, it makes the Mcm2 binding data difficult to interpret. First, most Mcm2 peaks seem to be near promoters. Promoters often have a small amount of signal in negative control (IgG or irrelevant antibody) CUT&RUN experiments, presumably due to their MNase accessibility. It is not clear to what extent Mcm2 peaks exceed background because no negative control CUT&RUN was performed. The high correlation of FLAG and Mcm2 CUT&RUN libraries might still be evident if some of this signal is due to background at TSSs. Second, the authors call 13,742 peaks, but browser tracks of some example peaks at the Pax6 and Nanog promoters show minimal enrichment relative to surrounding regions (Fig. 5I, 5S1B). I have concerns that some of these peaks called statistically significant are not biologically meaningful.

      We thank the reviewer for his/her time to review this story and for his/her positive comments. We share the reviewer’s concern about the low signal-to-noise ratio of Mcm2 CUT&RUN. However, the Mcm2 CUT&RUN signals most likely reflect Mcm2 binding.

      Reviewer #2 (Public Review):

      It is established that different histone chaperones not only facilitate the assembly of DNA into nucleosomes following DNA replication and transcription but also are essential to stem cell maintenance and differentiation. Here the authors Xiaowei Xu et al. propose a novel role for Mcm2, a subunit of the Mcm2-7 origin licensing complex and DNA helicase, in stem cell differentiation, in addition to or in connection with its role in maintaining genomic integrity during DNA replication. This study is a continuation of the authors' previously published work implicating the Mcm2-Ctf4-Polα axis in parental histone H3-H4 transfer to lagging strands. The present study is elegantly executed with a systematic analysis of the role of Mcm2 in ES cell differentiation to the neuronal lineage.

      We thank the reviewer for his/her time to review the manuscript and for his/her positive comments.

      Major questions

      1) Mouse ES cells with a mutation at the histone-binding motif of Mcm2 (Mcm2-2A) grew normally but exhibited defects in differentiation. Also, the global changes in gene expression, chromatin accessibility and histone modifications linked to the Mcm2-2A mutation were less apparent in mouse ES cells than in NPCs. The authors suggest that the excess of Mcm2 in ES cells, as in DNA replication, safeguards chromatin accessibility and gene expression in mouse ES cells, allowing Mcm2-2A mutant ES cells to restore the symmetric distribution of parental histones before cell division. What mechanism underlies this difference, given that overabundant Mcm2 is present in both ES cells and NPCs?

      This is an excellent question, on which we can only speculate. As discussed above and below, our results indicate that Mcm2 functions with Asf1a to resolve bivalent chromatin domains during pluripotency exit. It is therefore highly likely that Mcm2’s role in differentiation is independent of its role in DNA replication. Accordingly, in the revised manuscript we downplayed this possibility and suggested that the differentiation defects in Mcm2-2A mutant cells may arise from the involvement of Mcm2 in resolving bivalent chromatin domains (p24).

      2) CAF-1, Asf1a, and Mcm2 participate in similar or redundant chromatin regulation during differentiation, including silencing of pluripotency genes and induction of lineage-specific genes. These processes were commonly dysregulated in both Mcm2-2A cells and Asf1a KO ES cells, albeit to varying degrees. How can the authors exclude the possibility that Mcm2 affects differentiation via Asf1, with which it forms a complex, as a potentially redundant mechanism in the deposition of newly synthesized or recycled histones?

      To address this question, we performed the following experiments. First, we overexpressed Asf1a in both WT and Mcm2-2A mutant ES cells and determined whether Asf1a overexpression suppresses the differentiation defects of Mcm2-2A mutant cells (Figure 2- figure supplement 2A). We observed that Asf1a overexpression did not rescue the differentiation defects of Mcm2-2A mutant cells, based on analysis of cell morphology (Figure 2- figure supplement 2B) as well as the expression of Oct4 and lineage-specific genes during differentiation (Figure 2- figure supplement 2C-E).

      Second, we knocked out Asf1a in both WT cells and Mcm2-2A mutant cells using CRISPR/Cas9 (Figure 2- figure supplement 2F and 2G) and compared the effects of Asf1a KO, Mcm2-2A and the Mcm2-2A Asf1a KO double mutation on differentiation. As detailed above, these results indicate that Mcm2’s function in the induction of lineage-specific genes is dependent on Asf1a. However, Mcm2 also has an independent role in regulating pluripotency genes, which may act through its unique roles in parental histone deposition and gene expression regulation. We discussed these points in the Results (p10-13) and Discussion (p22-23).

      It is known that CAF-1 and Mcm2 are involved in deposition of new H3-H4 and parental H3-H4, respectively. Further, there is little evidence that CAF-1 interacts with Mcm2 in the literature. Therefore, we did not analyze the relationship between CAF-1 and Mcm2 during differentiation. In the revised manuscript, we discussed these points to address the reviewer’s concern.

      Can the authors test potential redundancy between Mcm2 and other histone chaperones and modifiers? Can the authors rescue the NPC phenotype induced by the Mcm2-2A mutation? Can the authors rescue the Mcm2-2A phenotype by overexpression of another histone chaperone or modifier?

      As stated above, we overexpressed Asf1a, which is known to interact with Mcm2, and found that overexpression of Asf1a did not rescue the differentiation defects of Mcm2-2A mutant cells. On the other hand, overexpression of Mcm2 in Mcm2-2A cells did rescue the defects in differentiation (Figure 2E-G). As discussed above, analysis of the Asf1a KO Mcm2-2A double mutant, together with RNA-seq datasets of Asf1a KO and Mcm2-2A cells during differentiation, indicates that Mcm2 and Asf1a function in the same pathway for resolving bivalent chromatin domains. However, the Oct4 silencing defect of Mcm2-2A cells was not observed in Asf1a mutant cells. Together, these results indicate that the differentiation defects of Mcm2-2A cells are, at least in part, due to a reduced interaction with Asf1a. Furthermore, Mcm2 also has a unique role in promoting the silencing of pluripotency genes.

      3) The authors argue that Mcm2 may regulate the deposition of newly synthesized or recycled histones via (1) its ability to recycle parental H3.1 and H3.3, (2) direct binding to H3-H4, and/or (3) Pol II transcription. Which of these mechanisms may be more unique to Mcm2 compared with other histone chaperones and modifiers?

      This is a very interesting but challenging question to address, for the following reasons. First, while the Mcm2-2A mutant showed defects in binding to both H3.1 and H3.3, it is almost impossible to identify an Mcm2 mutant that binds H3.1 and H3.3 differently. Second, our recent results indicate that the defects in the induction of lineage-specific genes are likely due to a loss of the Asf1a interaction, whereas the defect in silencing of pluripotency genes such as Oct4 is unlikely to be due to a loss of interaction with Asf1a. We therefore suggest that the defects in silencing of pluripotency genes in Mcm2-2A mutant cells are likely due to Mcm2's role in parental histone transfer and/or gene transcription. In the revised manuscript, we substantially revised the Discussion to reflect the new results and to address the reviewer's concerns.

      4) The authors observed that in ES cells the majority of Mcm2 CUT&RUN peaks were enriched for H3K4me3 CUT&RUN signal and ATAC-seq peaks, and a small fraction of Mcm2 CUT&RUN peaks were located at bivalent chromatin domains (H3K4me3+ and H3K27me3+). In contrast, in wild-type NPCs all the Mcm2 peaks co-localized with H3K4me3 and ATAC-seq peaks (H3K4me3+, H3K27me3-). Citing this differential engagement of Mcm2 with bivalent chromatin domains in ES cells and NPCs, the authors argued that Mcm2 binding to chromatin is rewired during differentiation. What is the mechanism of this differential engagement of Mcm2 with bivalent chromatin domains?

      As stated above, the original discussion may have been misleading. In the revised manuscript, we substantially rewrote the Discussion based on the new results, which indicate that Mcm2 and Asf1a function similarly in the induction of lineage-specific genes marked by bivalent promoters during exit from pluripotency.

      5) The authors indicated that in mouse ES cells Mcm2 CUT&RUN peaks exhibited low densities at origins. DNA replication origins are licensed by MCM2-7 complexes, and most of them remain dormant. Dormant origins rescue stalled replication forks in S phase and ensure genome integrity. ES cells are reported to contain more dormant origins than progenitor cells such as NPCs, which may protect them from replication stress. Also, partial depletion of dormant origins does not affect ES cell self-renewal but impairs differentiation, including toward the neural lineage. Moreover, reduction of dormant origins in NPCs impairs their self-renewal due to accumulation of DNA damage and apoptosis. Can the authors exclude a role for reduced dormant origins, reflected in the reduced density of Mcm2 at origins, in the defective differentiation toward neuronal lineages?

      We thank the reviewer for this excellent suggestion. We now discuss the potential role of Mcm2 at dormant origins in the differentiation defects in the Discussion (p24). However, we would like to point out that, based on the new results, this is an unlikely mechanism. Supporting this idea, Mcm2-2A mutant yeast cells and mouse ES cells are known not to be sensitive to replication stress such as hydroxyurea (HU) (Foltman, Evrin et al. 2013, Huang, Stromme et al. 2015).

    1. Author Response

      Reviewer #1 (Public Review):

      This paper introduces a new statistical framework to study cellular lineages and traits. Several new measures are introduced to infer selection strength from individual lineages. The key observation is that one can simply relate cumulants of a fitness landscape to population growth, and all of this can be simply computed from one generating function, that can be inferred from data. This formalism is then applied to experimental cell lineage data.

      I think this is a very interesting and clever paper. However, in its current form the paper is very hard to read, with very few explanations beyond the mathematical observations and definitions, which in my opinion makes it almost unreadable for people outside the field. More intuitive explanations should be given for a broader audience on all aspects: definitions of fitness « landscape », selection strength(s), connections between cumulants and other properties (including skewness), etc. Many new definitions are given with names reminiscent of classical concepts in evolutionary theory, but the connection is not always obvious. It would be great to explain what they mean beyond the maths, possibly with very simple, intuitive examples. Some of this might be obvious to population geneticists, and in fact some explanations given in the Discussion are more illuminating, but they would be much more useful earlier. I give more specific comments below.

      We thank the reviewer for calling our attention to the lack of accessible explanations of the key terms and quantities in this framework. Following the suggestions in the comments below, we added Box 1, which provides intuitive, plain explanations of the terms fitness, fitness landscape, selection, selection strength, and cumulants. In each section, we explain the standard usage of these terms in evolutionary biology and clarify the similarities and differences in this framework. We also added a figure to Box 1 that schematically explains the relationships among the chronological and retrospective distributions, fitness landscapes, and selection strength. We believe that these explanations and the figure clarify the meanings and roles of these quantities.

      Major comments :

      1) The authors give names to several functions; for instance, before equation (1) they mention « fitness landscape », then describe « net fitness », which allows them to define « fitness cumulants ». Later on, a « selection » is defined. These terms might mean different things to different authors depending on the context, to the point that they are sometimes confusing. For instance, why is h a « landscape »? For me, a landscape is something like a potential, and I really do not see how this is connected to h. « Fitness cumulants » is particularly jargon-heavy. There are also two kinds of selection strengths, which is very confusing. I would recommend that the authors make a glossary of the terms, explain intuitively what they mean, and possibly connect them to standard definitions.

      We appreciate the suggestion of making a glossary of the terms. Following the suggestion, we added Box 1 to provide intuitive and plain explanations of the terms used in this framework.

      In Box 1, we explain why we called h(x) a fitness landscape, referring to its standard usage in evolutionary biology. In evolutionary biology, fitness landscapes (also called adaptive landscapes) are visual representations of relationships between reproductive abilities (fitness) and genotypes. The height of landscapes corresponds to fitness. Since constructing "genotype space" is usually difficult, fitness is often mapped on an allele frequency or phenotype (trait) space to depict a "landscape." Fitness landscapes introduced in our framework are analogous to those in evolutionary biology in that fitness differences are mapped on trait spaces. Although fitness landscapes in evolutionary biology are usually metaphorical or conceptual tools for understanding evolutionary processes, the landscapes in our framework are directly measurable from division count and trait dynamics on cellular lineages.

      We also explain "selection" and "selection strength" in Box 1. As pointed out, we define three kinds of selection strength measures. These three measures share the property of reporting the overall correlation between traits and fitness. However, they differ critically in the additional selection effects they represent: S_KL^(1) for growth rate gain, S_KL^(2) for the additional loss of growth rate under perturbations, and their difference S_KL^(2) - S_KL^(1) for the effect of selection on fitness variance. We restructured the sections in Results to clarify these meanings of the different selection strength measures.

      We removed the term "fitness cumulants", as it is not standard and might confuse readers, and now use the more precise phrase "cumulants of a fitness landscape (with respect to the chronological distribution)." We also added a general explanation of "cumulants" to Box 1 and clarified what the first-, second-, and third-order cumulants represent about a distribution.
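      As a rough illustration of what the first three cumulants capture, the sketch below treats a made-up sample of fitness values h (taken with chronological weights) as the chronological distribution. Because Eq. 1 gives e^h = e^(τΛ) Q_rs/Q_cl, the mean of e^h over the chronological distribution equals e^(τΛ), so τΛ is the cumulant generating function evaluated at 1. The code compares that exact value with the truncated series κ1 + κ2/2 + κ3/6; the sample values and function names are illustrative assumptions only.

```python
import math

def cumulants_123(h):
    """First three cumulants of an equally weighted (chronological) sample."""
    n = len(h)
    k1 = sum(h) / n                            # mean
    k2 = sum((v - k1) ** 2 for v in h) / n     # variance
    k3 = sum((v - k1) ** 3 for v in h) / n     # third central moment = third cumulant
    return k1, k2, k3

def log_mean_exp(h):
    """Exact cumulant generating function at s = 1: ln <e^h>."""
    m = max(h)  # subtract the maximum for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in h) / len(h))

# Hypothetical fitness values h for lineages sampled with chronological weights.
h = [0.9, 1.0, 1.0, 1.1, 1.4]
k1, k2, k3 = cumulants_123(h)
exact = log_mean_exp(h)          # equals tau * Lambda under Eq. 1
first = k1
second = k1 + k2 / 2
third = k1 + k2 / 2 + k3 / 6
skewness = k3 / k2 ** 1.5
print(f"exact={exact:.5f} 1st={first:.5f} 2nd={second:.5f} 3rd={third:.5f} skew={skewness:.2f}")
```

      In this right-skewed sample the positive κ3 closes most of the gap left by the second-order truncation; when fourth- and higher-order cumulants matter, the truncated series would deviate more noticeably.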

      2) Along the same lines, it would be good to give more intuitive explanations of the different functions introduced. For instance, I find (2) more intuitive than (1) for defining h. Some more intuition on what the authors call selection strengths would be very useful. In Table 1, selection strengths are related to the Kullback-Leibler divergence (which does not seem to be defined); it would be good to explain this better.

      In addition to Box 1, we included more intuitive explanations of fitness landscapes and selection strength where they first appear in the Theoretical background section. As pointed out, the link between the selection strength measures and the Kullback-Leibler divergence was described only in the Supplemental Information of the original manuscript. We now state this link explicitly where we first define the selection strength.
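      The link to the Kullback-Leibler divergence can be sketched numerically. Using only Eq. 1, Q_rs(x) = Q_cl(x) e^(h(x) - τΛ), so D_KL(Q_rs || Q_cl) = <h>_rs - τΛ and D_KL(Q_cl || Q_rs) = τΛ - <h>_cl, both nonnegative. The three-state trait, its chronological distribution and the landscape values below are hypothetical, and which direction corresponds to S_KL^(1) or S_KL^(2) in the revised manuscript is not stated in this response.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D_KL(p || q) for dicts over the same keys."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)

# Hypothetical chronological distribution of a discrete trait and its fitness landscape h(x).
q_cl = {"low": 0.5, "mid": 0.3, "high": 0.2}
h = {"low": 0.5, "mid": 1.0, "high": 2.0}

# tau * Lambda = ln <e^h>_cl, the normalization that makes Q_rs sum to one.
tau_lambda = math.log(sum(q_cl[x] * math.exp(h[x]) for x in q_cl))
q_rs = {x: q_cl[x] * math.exp(h[x] - tau_lambda) for x in q_cl}

mean_h_cl = sum(q_cl[x] * h[x] for x in q_cl)
mean_h_rs = sum(q_rs[x] * h[x] for x in q_rs)

kl_rs_cl = kl(q_rs, q_cl)  # D_KL(Q_rs || Q_cl)
kl_cl_rs = kl(q_cl, q_rs)  # D_KL(Q_cl || Q_rs)
print(f"tau*Lambda={tau_lambda:.4f}  <h>_cl={mean_h_cl:.4f}  <h>_rs={mean_h_rs:.4f}")
print(f"D_KL(rs||cl)={kl_rs_cl:.4f}  vs <h>_rs - tau*Lambda = {mean_h_rs - tau_lambda:.4f}")
print(f"D_KL(cl||rs)={kl_cl_rs:.4f}  vs tau*Lambda - <h>_cl = {tau_lambda - mean_h_cl:.4f}")
```

      The ordering <h>_cl ≤ τΛ ≤ <h>_rs is what lets both divergences be read as gaps between mean fitness and population growth.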

      Following this comment, we also changed the definition of the fitness landscape to h(x) ≔ τΛ + ln[Q_rs(x)/Q_cl(x)] (Eq. 1), using the chronological and retrospective distributions introduced in the preceding paragraph. This definition is mathematically equivalent to the previous one, but we believe it is more intuitive.
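      The definition can be checked on lineage records. This sketch assumes (the response does not state it) that a lineage with D divisions carries retrospective weight proportional to 2^D, and that the hypothetical records below, each a discretized trait value x and a division count D, are sampled with equal chronological weight. Under these assumptions h(x) = τΛ + ln[Q_rs(x)/Q_cl(x)] reduces to ln<2^D | X = x>_cl, and the code computes both forms.

```python
import math
from collections import defaultdict

# Hypothetical lineage records: (discretized trait value, division count D).
records = [("small", 2), ("small", 3), ("small", 2),
           ("large", 4), ("large", 3), ("large", 5), ("large", 4)]
n = len(records)

# tau * Lambda = ln <2^D>_cl: population growth over the observation window tau.
tau_lambda = math.log(sum(2 ** d for _, d in records) / n)

q_cl = defaultdict(float)       # Q_cl(x): chronological frequency of each trait value
weight_rs = defaultdict(float)  # unnormalized Q_rs(x): records reweighted by 2^D
for x, d in records:
    q_cl[x] += 1 / n
    weight_rs[x] += 2 ** d
z = sum(weight_rs.values())
q_rs = {x: w / z for x, w in weight_rs.items()}

def h_from_distributions(x):
    """Eq. 1: h(x) = tau*Lambda + ln[Q_rs(x) / Q_cl(x)]."""
    return tau_lambda + math.log(q_rs[x] / q_cl[x])

def h_from_conditional_mean(x):
    """Equivalent form under the 2^D weighting: h(x) = ln <2^D | X = x>_cl."""
    ds = [d for xi, d in records if xi == x]
    return math.log(sum(2 ** d for d in ds) / len(ds))

for x in sorted(q_cl):
    print(x, round(h_from_distributions(x), 4), round(h_from_conditional_mean(x), 4))
```

      The conditional-mean form reads directly as "how many descendants, on average, a lineage with trait value x produces", which may be the more intuitive reading of a landscape height.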

      3) It seems to me that the authors implicitly assume that, along a lineage, phenotypes are almost stationary (e.g. a constant division rate). However, one could imagine very different situations; for instance, division rates could depend on interactions with other cells in the growing population and thus change with time along a lineage. Division rates could also have strong random components over time. I wonder how these more complex cases would affect the results and the discussion.

      We thank the reviewer for pointing out our insufficient explanation of an essential feature of this framework. As we now explain in the "Examples of biological questions" section (L62-65) and Discussion (L492-493), this framework does not assume stationary phenotypes (traits) on cellular lineages. On the contrary, we developed this framework so that one can quantify fitness and selection strength even for non-stationary phenotypes (traits) due to factors such as non-constant environments and inherent stochasticity.

      In fact, if traits are stationary along cellular lineages, this framework becomes essentially identical to the individual-based evolutionary biology framework (see ref. 26, for example). Our framework treats a cell lineage as the unit of selection and any quantity measurable along cellular lineages as a lineage trait, whether stationary or non-stationary. Therefore, our framework can evaluate fitness landscapes and selection strength without explicitly taking the environmental conditions around cells into account. In other words, among the various factors that could influence division counts, h(x) and S[X] extract the correlation between the trait of interest and division counts. On the other hand, this design also imposes a limitation: the framework cannot say anything about the influence of factors such as non-quantified traits and potential variations in environmental conditions. We now explain these important points explicitly in the revised manuscript (L493-496).

      Likewise, stochasticity in division rate does affect division count distributions, and its influence appears as differences in the selection strength of division count, S[D]. As stated in the text, S[D] sets the upper bound for the selection strength of any lineage trait (L143-145). Therefore, S_rel[X] ≔ S[X]/S[D] reports the relative strength of the correlation between trait X and lineage fitness at a given level of S[D] in each condition.
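      Why S[D] bounds the selection strength of any lineage trait can be illustrated with a small example. In this sketch (assumptions, not the authors' code) each hypothetical lineage has chronological weight 1/N and retrospective weight proportional to 2^D. Because the reweighting depends on a lineage only through D, the divergence computed on D equals the divergence over whole lineages, and grouping lineages by any other trait can only lose information (the data-processing inequality). Both KL directions are computed, so the bound is checked without assuming which one the manuscript calls S[X].

```python
import math
from collections import defaultdict

def kl_both_directions(lineages, key):
    """Return (D_KL(Q_cl||Q_rs), D_KL(Q_rs||Q_cl)) for the trait selected by `key`.

    Each lineage has chronological weight 1/N and retrospective weight proportional to 2**D.
    """
    n = len(lineages)
    q_cl, w_rs = defaultdict(float), defaultdict(float)
    for lin in lineages:
        q_cl[key(lin)] += 1 / n
        w_rs[key(lin)] += 2 ** lin["D"]
    z = sum(w_rs.values())
    q_rs = {k: w / z for k, w in w_rs.items()}
    cl_rs = sum(q_cl[k] * math.log(q_cl[k] / q_rs[k]) for k in q_cl)
    rs_cl = sum(q_rs[k] * math.log(q_rs[k] / q_cl[k]) for k in q_rs)
    return cl_rs, rs_cl

# Hypothetical lineages: division count D and a binary lineage trait X.
lineages = [
    {"D": 0, "X": "slow"}, {"D": 1, "X": "slow"}, {"D": 1, "X": "slow"},
    {"D": 1, "X": "fast"}, {"D": 2, "X": "fast"}, {"D": 3, "X": "fast"},
]
s_d = kl_both_directions(lineages, key=lambda lin: lin["D"])
s_x = kl_both_directions(lineages, key=lambda lin: lin["X"])
s_rel = [sx / sd for sx, sd in zip(s_x, s_d)]  # S_rel[X] = S[X] / S[D], per direction
print("S[D] =", s_d)
print("S[X] =", s_x)
print("S_rel[X] =", s_rel)
```

      Because S_rel[X] always lies between 0 and 1, it can be compared across conditions whose overall division-count heterogeneity S[D] differs.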

      To clarify the influence of stochasticity in division rate, we present a cell population model in which cells divide stochastically according to generation time (interdivision time) distributions in Appendix 2 (we moved this section from the Supplemental Information with modifications). We can confirm from this model that the shapes of generation time distributions influence the selection strength S[D]. Importantly, one can understand from this model that stochasticity in generation times constantly introduces selection to cell populations and modulates the growth rate and selection strength even in the long-term limit. We now clarify this important point in the Discussion (L519-526).
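      The effect of generation-time stochasticity on long-term growth can be previewed with the classical Euler-Lotka relation, which for binary fission with uncorrelated generation times of density f reads 2∫e^(-Λτ) f(τ) dτ = 1. This is a textbook result rather than the Appendix 2 model itself. Assuming gamma-distributed generation times with shape k and scale θ, it gives Λ = (2^(1/k) - 1)/θ. Holding the mean generation time kθ at 1, the sketch shows that more variable generation times (smaller k) yield faster population growth, always above ln 2 divided by the mean generation time.

```python
import math

def population_growth_rate(k, theta):
    """Malthusian rate for gamma(k, theta) generation times via Euler-Lotka.

    The Laplace transform of the gamma density is (1 + Lambda*theta)**(-k), so
    2 * (1 + Lambda*theta)**(-k) = 1 gives Lambda = (2**(1/k) - 1) / theta.
    """
    return (2 ** (1 / k) - 1) / theta

mean_generation_time = 1.0
shapes = [1, 2, 4, 16, 64]  # larger k means less variable generation times (CV = 1/sqrt(k))
rates = [population_growth_rate(k, mean_generation_time / k) for k in shapes]
for k, lam in zip(shapes, rates):
    # Residual of the Euler-Lotka equation; should be ~0.
    residual = 2 * (1 + lam * mean_generation_time / k) ** (-k) - 1
    print(f"k={k:3d}  CV={1 / math.sqrt(k):.2f}  Lambda={lam:.4f}  residual={residual:.1e}")
print(f"ln2 / mean generation time = {math.log(2) / mean_generation_time:.4f}")
```

      The exponential case (k = 1) grows at rate 1, about 44% faster than the deterministic limit ln 2, even though every cell has the same expected generation time.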

      4) « Therefore, in contrast to a common assumption that selection necessarily decreases fitness variance, here we show that under certain conditions selection can increase fitness variance among cellular ». This is a super interesting statement, but there are so few explanations and so little intuition here that what actually happens remains obscure to me.

      When a decrease in fitness variance by selection is mentioned in evolutionary biology, an upper bound on fitness and inheritance of fitness across generations of individuals are usually assumed. In such circumstances, selection drives the fitness distribution toward the maximum value, and selection eventually causes fitness variance to decrease. However, even in this process, a decrease is not guaranteed at every step; whether selection reduces fitness variance at each step depends on the fitness distribution at that time.

      In our argument, we compared fitness variances between the chronological and retrospective distributions. We showed both theoretically and experimentally that there are cases where the variances of the retrospective distributions (distributions after selection) are larger than those of the chronological distributions (distributions before selection). The direction of the variance change depends on the shape of the chronological distribution, primarily on its skewness (positive skew increases the variance and negative skew decreases it). The direction of the variance change can also be probed by the difference between the two selection strength measures, S_KL^(2) - S_KL^(1). Notably, retrospective fitness variances can be larger than chronological fitness variances even in the long-term limit, as shown by the cell population model in Appendix 2.
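      A two-point example makes the role of skewness concrete. Assuming, as in Eq. 1, that the retrospective distribution is Q_cl e^h normalized by e^(τΛ), the sketch compares chronological and retrospective variances of a hypothetical fitness h taking the values 0 and 3: once with a rare high-fitness state (positive skew) and once with a rare low-fitness state (negative skew). It also reports the difference between the two KL directions, which we assume (the response does not spell this out) corresponds to S_KL^(2) - S_KL^(1).

```python
import math

def chrono_vs_retro(values, probs):
    """Fitness variance before (chronological) and after (retrospective) selection."""
    tau_lambda = math.log(sum(p * math.exp(v) for v, p in zip(values, probs)))
    retro = [p * math.exp(v - tau_lambda) for v, p in zip(values, probs)]

    def mean(ps):
        return sum(p * v for v, p in zip(values, ps))

    def var(ps):
        m = mean(ps)
        return sum(p * (v - m) ** 2 for v, p in zip(values, ps))

    kl_rs_cl = mean(retro) - tau_lambda  # D_KL(Q_rs || Q_cl)
    kl_cl_rs = tau_lambda - mean(probs)  # D_KL(Q_cl || Q_rs)
    return var(probs), var(retro), kl_rs_cl - kl_cl_rs

values = [0.0, 3.0]
pos = chrono_vs_retro(values, [0.9, 0.1])  # rare fit state: positive skew
neg = chrono_vs_retro(values, [0.1, 0.9])  # rare unfit state: negative skew
for label, (v_cl, v_rs, diff) in [("positive skew", pos), ("negative skew", neg)]:
    print(f"{label}: Var_cl={v_cl:.3f}  Var_rs={v_rs:.3f}  KL difference={diff:+.3f}")
```

      In the first case selection shifts weight onto the rare state and spreads the distribution; in the second it concentrates weight on the already common state. The chronological variance is the same in both cases, so the direction of change comes from the shape of the distribution, not from selection alone.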

      We now explain what kind of situations are usually premised when reduction of fitness variance is mentioned and clarify that, in our framework, we compare the fitness variances between chronological and retrospective distributions (L542-548). We also explain that a selection effect on fitness variance generally depends on fitness distribution and that a larger fitness variance in retrospective distribution is possible even in the long-term limit (L548-557).

      Reviewer #2 (Public Review):

      The paper addresses a fundamental question: how do phenotypic variations among lineages relate to the growth rate of a population? A mathematical framework is presented which focuses on lineage traits, i.e. the value of a quantitative trait averaged over a cell lineage, thus defining a fitness landscape h(x). Several measures of selection strength are introduced, whose relationships are clarified through the introduction of the cumulant generating function of h(x). These relationships are illustrated in analytical mathematical models and examined in the context of experimental data. It is found that cumulants of order higher than three are negligible when cells are in early exponential phase, but not when they are regrowing from stationary phase.

      The framework is elegant and its independence from mechanistic models appealing. The statistical approach is broadly applicable to lineage data, which are becoming increasingly available, and can for instance be used to identify the conditions under which specific traits are subject to selection.

      We thank the reviewer for the positive evaluation. We reply to the specific comments below.

      Reviewer #3 (Public Review):

      In this work the authors have constructed a useful mathematical framework to delineate contributions leading to differences in lineages of populations of cells. In principle, the framework is widely applicable to exponentially growing populations. An attractive feature is that the framework is not tailored to particular growth models or environmental conditions. I expect it will be valuable for systems where contributions from phenotypic heterogeneity overwhelm contributions from intrinsic stochasticity in cellular dynamics.

      I am generally very positive about this work. Nevertheless, a few specific concerns:

      1) Here, lineages are considered fitter if they have more division events. But this consideration neglects inherent stochasticity in division events. Even in a completely homogeneous population, the number of division events differs between lineages due to intrinsic stochasticity, but applying the methods discussed in this manuscript may lead to falsely assigning different fitness levels to different lineages. The reason why these lineages ought to be assigned the same fitness level (despite having different numbers of division events) is that future generations of these cells will have identical statistics, in contrast with those of cells that are phenotypically different. Extending the idea to heterogeneous populations, the actual difference in fitness levels may be significantly different from what is obtained from the mathematical framework presented here, depending on the level of inherent stochasticity.

      We thank the reviewer for raising a point that was insufficiently explained in the original manuscript. Intrinsic stochasticity in interdivision time (generation time) is, in fact, critical for selection. For example, if a cell divides with a generation time shorter than average due to stochasticity, this cell is likely to have more descendants in the future population, on average, than other cells born at the same time, even if the descendants follow identical statistics. Therefore, the properties of intrinsic stochasticity, including the shapes of generation time distributions and transgenerational correlations, significantly affect the overall selection strength S_KL^(1)[D] (and also S_KL^(2)[D]). We now explain this important point in the Results section, referring to the analytical model in Appendix 2 (L327-334), and also in the Discussion (L519-524).

      Importantly, even when cell division processes seem purely stochastic, different states of some traits might underlie the variation in generation times. In such cases, evaluating h(x) and S_rel[X] can still reveal the correlations between trait values and fitness. In particular, the relative selection strength S_rel[X] ≔ S_KL^(1)[X]/S_KL^(1)[D] extracts the correlation of the trait values at a given level of division count heterogeneity in each condition. We now clarify this important aspect of the framework in the Discussion (L524-526).

      When a cell population is composed of heterogeneous subpopulations, each following a distinct statistical rule, our framework evaluates the combined effects of the heterogeneous rules and the inherent stochasticity of each subpopulation. Untangling these two contributions is generally challenging unless appropriate markers for distinguishing the subpopulations are available. However, when the subpopulations follow significantly distinct statistics, the division count distribution should become skewed or multimodal, and the difference between the two selection strength measures, S_KL^(2)[D] - S_KL^(1)[D], can suggest the existence of such subpopulations. Therefore, detailed analyses using all the selection strength measures and the fitness landscapes can provide insights into the internal structure of cell populations and the selection acting on them.

      We now explain the effect of inherent stochasticity in generation times (L327-334 and L519-524) and discuss how the existence of subpopulations can be probed with the selection strength measures (L508-512). Please also refer to our reply to comment 3 of Reviewer #1.

      2) In one of the sections the authors mention having performed analytical calculations for a cell population in which cells divide with gamma-distributed, uncorrelated interdivision times. It is unclear whether 1) cells within specific sub-populations divide with the same division time, and the distribution of division times is due to the diverse distribution of sub-populations; or 2) there are no such sub-populations and all cells stochastically choose their division time from the same distribution irrespective of their past lineage. If the latter, then I do not see the need for a lineage-based mathematical formulation when the problem can be dealt with in much simpler traditional ways that do not keep track of lineages.

      We dealt with situation 2) in this model. As noted by the reviewer, the chronological and retrospective mean fitness and the population growth rate can be calculated with a simpler individual-based age-structured population model (see ref. 10, for example). However, applying this framework to this model clarifies the utility of the cumulant generating function, the meaning of the differences between these fitness measures, and the effect of the statistical properties of intrinsic stochasticity on long-term growth rate and selection. Therefore, we kept this model in Appendix 2 (moved from the Supplemental Information), with additional clarification of our motivation for the analysis and the implications of the results.
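      For uncorrelated gamma-distributed generation times, the three fitness measures can be written in closed form. The sketch relies on two standard results that we assume rather than take from Appendix 2: the Euler-Lotka relation 2∫e^(-Λτ) f(τ) dτ = 1, and the fact that the retrospective (ancestral) lineage samples generation times from the tilted density 2e^(-Λτ) f(τ). With shape k and scale θ, a typical chronological lineage grows at ln 2/(kθ), the population at Λ = (2^(1/k) - 1)/θ, and the retrospective lineage at ln 2 · 2^(1/k)/(kθ), because the tilted density is again gamma with scale θ·2^(-1/k). Fitness is counted as ln 2 per division.

```python
import math

def fitness_rates(k, theta):
    """Long-term chronological, population, and retrospective growth rates for
    uncorrelated gamma(k, theta) generation times (ln 2 of fitness per division)."""
    chronological = math.log(2) / (k * theta)   # ln2 / mean generation time
    population = (2 ** (1 / k) - 1) / theta     # Euler-Lotka solution
    tilted_mean = k * theta / 2 ** (1 / k)      # mean of gamma(k, theta * 2**(-1/k))
    retrospective = math.log(2) / tilted_mean
    return chronological, population, retrospective

for k in [1, 4, 16]:
    cl, pop, rs = fitness_rates(k, theta=1.0 / k)  # mean generation time fixed at 1
    print(f"k={k:2d}  chronological={cl:.4f}  population={pop:.4f}  retrospective={rs:.4f}")
```

      The ordering chronological ≤ population ≤ retrospective mirrors the nonnegativity of the two KL-based selection strengths, and both gaps shrink as k grows and division becomes less stochastic.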

      3) The analytical calculations provided seem to be exact only for trajectories of almost infinite duration (or in practice, duration much greater than typical interdivision time). For example, if the observation time is of the order of division time, this would create significant artifacts / artificial bias in the weights of lineages depending on whether the cell was able to divide within the observation time or not. Thus, the results claiming that contributions of higher order cumulants become significant in the regrowth from a late stationary phase are questionable, especially since authors note that 90% of cells showed no divisions within the observation time.

      We thank the reviewer for this insightful comment. It is true that the duration of observation influences the results. In the regrowth experiments with E. coli, we aimed to compare two cell populations regrowing from different stages of stationary phase; it is therefore appropriate to use the same time window for both conditions. Even though a significant fraction of cell lineages remains undivided, the regrowing cells already divide several times within this window. The results are therefore valid for comparing selection levels on this time scale. However, clarifying selection on longer time scales would require a more detailed characterization of the lag time distributions under both conditions.

      We now clarify the range of validity of the results and the limitations on prediction for the long-term selection without knowing the details of the lag time distributions in Discussion (L536-539).

    1. Author Response

      Reviewer #2 (Public Review):

      There is emerging evidence that connexin43 hemichannels localized to mitochondria can influence their function. Here the authors demonstrated using an osteocyte cell model that connexin43 is localized to mitochondria and that this is enhanced in response to oxidative stress. Several lines of evidence were presented showing that mitochondrial connexin43 forms functional hemichannels and that connexin43 is required for optimal mitochondrial respiration and ATP generation. These aspects were major strengths of the study.

      The authors also show that connexin43 is recruited to mitochondria in response to oxidant stress, as a cell protective mechanism. This was primarily done using hydrogen peroxide to generate oxidant stress; primary osteocytes from Csf-1+/- mice, which are prone to Nox4 induced oxidant stress, also show enhanced mitochondrial connexin43 when compared with wild type osteocytes.

      Several approaches were used to demonstrate that connexin43 interacts with the ATP synthase subunit ATP5J2, suggesting a direct role for connexin43 in the control of ATP synthesis by mediating mitochondrial ion homeostasis. Several experiments were done using a series of pHluorin fusion protein constructs as a proton sensor; these experiments hint at a potential role for connexin43 in regulating H+ permeability to support ATP production. However, the effects of inhibiting connexin43 on pH were modest, suggesting that additional roles for mitochondrial connexin43 in ATP generation should be considered.

      Thank you for your positive and thoughtful comments. We agree that mitochondrial Cx43 may have additional roles. For example, the stability of ATP synthase may change after mtCx43 deficiency. This and other possible roles of mtCx43 should be investigated in the future.

      Reviewer #3 (Public Review):

      This manuscript should be of broad interest not only to readers in the field of gap junction (GJ)-mediated cell-to-cell communication but also to scientists and clinicians working on mitochondrial function and metabolism. The data elucidate a new function of Cx43 in regulating energy (ATP) generation by mitochondria, e.g. under oxidative stress.

      The canonical function of gap junctions is direct cell-to-cell communication: they form channels that traverse the plasma membranes and electrically and chemically connect the cytoplasms of adjacent cells. These channels are assembled from connexin proteins, such as connexin 43 (Cx43). More recently, however, non-canonical cellular locations and functions of Cx43 have been discovered, e.g. mitochondrial Cx43 (mtCx43). Still, very little is known about where the Cx43 transported into mitochondria is derived from, how it is transported into mitochondria, where it is located in mitochondria, in which form it is present there (polypeptides, hemi-channels (HCs), or complete GJ channels), and what the function of mtCx43 is. The authors addressed the last question. The authors provide convincing evidence that mtCx43 modulates mitochondrial homeostasis and function in bone osteocytes under oxidative stress. Together, their study suggests that mtCx43 hemi-channels regulate mitochondrial ATP generation by mediating K+, H+, and ATP transfer across the mitochondrial inner membrane through direct interaction with mitochondrial ATP synthase (ATP5J2), leading to enhanced protection of osteocytes against oxidative insult. These findings provide important information on a role of Cx43 functioning directly in mitochondria rather than at its canonical location in the plasma membrane. While most of the functional assays presented in Figures 2-8 appear solid, the mitochondrial localization of Cx43, its translocation into mitochondria under oxidative stress, and its configuration as hemi-channels (Figure 1) are less convincing. I have five general comments that should be addressed:

      1) This study was performed in MLO-Y4 osteocyte cells. Is the H2O2 induced increase of mitochondrial Cx43 MLO-Y4 cell type or osteocyte specific, or is Cx43 playing a more general role in mitochondrial function, e.g. under oxidative stress? Osteoblasts such as MC3T3-E1 and MG63, and many other cell types endogenously express Cx43, and oxidative stress is a general physiological stressor, not only for osteocytes and bone cells. Attending to this question would address the generality of the findings for mitochondrial function.

      We thank the reviewer for raising these valid points; observing the phenotype in other cell types, such as osteoblasts, would be of great relevance and interest. To address this, we conducted new experiments in MC3T3-E1 cells (Figure 1-figure supplement 2). After 2 hrs of H2O2 treatment, Cx43 accumulated on mitochondria, marked by MitoTracker. Statistical analysis also showed a significant increase in colocalization between Cx43 and MitoTracker (Figure 1-figure supplement 2B). The colocalization coefficient of the Ctrl group was higher in MC3T3-E1 cells than in MLO-Y4 cells, indicating a different response level in other cell lines; osteoblasts seemed to be more sensitive to redox interference. Overall, these results suggest that under oxidative stress mtCx43 may display a similar phenotype across multiple cell lines, although the degree of sensitivity may differ.

      2) The images of MLO-Y4 cells (Figure 1A) and the primary osteocytes isolated from Csf-1+/- and control mice (Figure 8) do not show visible gap junctions. I guess this is due to the fact that slides were stained with the Cx43(E2) antibody. I feel, staining of these cells in addition with the Cx43(CT) antibody would be helpful to get a better understanding on the distribution of Cx43 in gap junctions and undocked/un-oligomerized Cx43 in these cells.

      Thank you for the suggestion. To better understand the distribution of Cx43 in GJ or HC form, we performed additional experiments in MLO-Y4 cells using the Cx43(CT) antibody; the data are shown below. With Cx43(CT) staining, we observed more signal in the cells and on the plasma membrane. After H2O2 treatment, we observed increased, stronger signal localized on mitochondria compared with the untreated control group. The stronger signal at the plasma membrane indicates gap junctions stained by the Cx43(CT) antibody.

      3) The images of cells presented in Figure 1A are quite fuzzy. No mitochondria are visible, and the Cx43 staining is hazy and does not localize to any subcellular structures. Also, it is not clear whether the higher-resolution image presented in Figure 1C actually represents a mitochondrion. A good DIC image, or co-staining with another mitochondrial marker such as MitoTracker (as shown in Figure 4-S1), would make the localization and translocation of Cx43 into mitochondria upon oxidative stress more convincing. This is especially important as the translocation, although statistically significant, increases only by about 10% or less (Figure 1B). Such a small difference (also represented in the Western analyses presented in Figure 1D) could easily be artefactual, depending on how the correlation coefficient was generated. Of note in this respect is that control cells in Figure 1A appear larger (compare the size of the nuclei) and are more spread out than the H2O2-treated cells. Better, clearer images would make the mitochondrial localization/translocation more convincing.

      The reviewer made great points. To improve image clarity, we redid the staining and imaging and determined the colocalization of SDHA and MitoTracker Deep Red. The results (shown below) suggested that under normal conditions without H2O2 treatment, SDHA and MitoTracker merged perfectly, whereas after 2 hrs of H2O2 treatment, mitochondria became fragmented and the SDHA signal exhibited a more punctate pattern than MitoTracker. Overall, we feel that MitoTracker represents the distribution of mitochondria better. SDHA is a subunit of mitochondrial complex II, and the images presented in Figure 1C were captured from isolated mitochondria under a confocal microscope with SDHA and Cx43(CT) co-staining. Considering the specificity of SDHA (see images below), we believe the Cx43 signal we captured demonstrates mitochondrial localization/translocation. After using MitoTracker as the mitochondrial marker and higher-magnification images, the correlation coefficient increased from 0.35 to 0.47, a 32% increment with statistical significance. As for nuclear size, some cells are indeed smaller, which may be affected by varying local cell density. The nuclear sizes in the new images in Figure 1A are much more consistent.
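      Because the reviewer's concern hinges on how the coefficient is computed, a minimal sketch of one common choice, Pearson's correlation between two channels' pixel intensities, may help. The response does not say which coefficient was used, so Pearson's coefficient, the function name and the toy intensities are all assumptions. The optional mask restricts the calculation to pixels inside a cell; the toy example shows that including dark background pixels can change the value substantially, which is one reason differences in cell spreading between conditions matter.

```python
import math

def pearson_colocalization(ch1, ch2, mask=None):
    """Pearson's correlation between two channels' pixel intensities.

    ch1, ch2: flattened intensities of the same field (e.g. Cx43 and MitoTracker).
    mask: optional sequence of booleans selecting pixels inside a region of interest.
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must have the same number of pixels")
    if mask is not None:
        pairs = [(a, b) for a, b, keep in zip(ch1, ch2, mask) if keep]
    else:
        pairs = list(zip(ch1, ch2))
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel has constant intensity in the selected pixels")
    return cov / math.sqrt(var_a * var_b)

# Toy 8-pixel field: the last three pixels are dark background in both channels.
cx43 = [5.0, 9.0, 2.0, 7.0, 4.0, 0.0, 0.0, 0.0]
mito = [4.0, 3.0, 6.0, 8.0, 5.0, 0.0, 0.0, 0.0]
inside_cell = [True] * 5 + [False] * 3
print(pearson_colocalization(cx43, mito))               # background included
print(pearson_colocalization(cx43, mito, inside_cell))  # cell pixels only
```

      Here shared dark background alone pushes the coefficient positive even though the in-cell intensities are anticorrelated; thresholding or masking choices should therefore be reported alongside the coefficient.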

      4) How pure are the mitochondria that were probed for Cx43 by Western shown in Figure 1D? The preparation method described is relatively simple, collecting the 10,000xg supernatant (here 9,000xg supernatant) as the mitochondrial fraction. Is it possible that the Cx43 signal, at least in part, is derived from other, contaminating membranes, such as PM, Golgi, or ER? Testing the mitochondrial preparation by Western with marker proteins specific for these compartments would strengthen the authors' results.

      The reviewer made a great suggestion. To address this, we performed a western blot to test mitochondrial purity. This centrifugation-based method is indeed simple, and, as expected, there was some contamination with ER (marked by PDI) and Golgi (marked by STX6). To further assess the purity of the mitochondrial fraction, we used fluorescent dyes for mitochondria (MitoTracker Deep Red), ER (ER-Tracker Blue-White), and nuclei (Hoechst). The organelle-specific dyes indicated that most of the fraction consisted of mitochondria, with some contamination by ER fragments and minimal nuclear contamination. Combining our western blot and immunofluorescence data, we conclude that our Cx43 signal is primarily derived from mitochondria.

      5) The authors rely on previous studies to postulate that Cx43 in mitochondria forms hemichannels in their system, is localized in the inner membrane, and is oriented with the Cx43 C-termini facing the inter-membrane space (as shown schematically in Figure 8C). The authors use lucifer yellow (LY) dye transfer and carbenoxolone, but neither is a hemichannel-specific probe: LY is also transferred through, and carbenoxolone also blocks, GJ channels. Experiments using hemichannel-specific probes would be more convincing. This is important, as the information cited is based on only two references (Boengler et al., 2009; Miro-Casas et al., 2009), and it is still highly unclear how a membrane protein that is co-translationally inserted into the ER membrane and then traffics through the Golgi to be inserted into the plasma membrane is actually imported into mitochondria, and in which state (monomeric, hexameric). Why the Cx43(CT)-specific antibody traverses the outer mitochondrial membrane and reaches the Cx43 C-terminus while the Cx43(E2)-specific antibody does not is neither described nor clear. Were these mitochondria permeabilized with Triton X-100 as described in M&M?

      We edited the Methods section. We did not use Triton X-100 to permeabilize mitochondria. PMP appeared to preserve mitochondrial inner membrane integrity, allowing us to assess the localization of the Cx43(CT) antibody on mitochondria. We show these new immunofluorescence images in Figure 5- figure supplement 2. PMP, used as a plasma membrane permeabilizer, has a sixfold higher affinity for the mitochondrial outer membrane (MOM) than for the mitochondrial inner membrane (MIM). Meanwhile, no Cx43(E2) Ab signal was detected in mitochondria, suggesting that the extracellular loop of Cx43 faces the matrix and cannot be accessed by the Cx43(E2) antibody.

      The translocation of Cx43 to mitochondria was reported to involve the chaperone Hsp90-dependent TOM complex pathway (Rodriguez-Sinovas et al., 2006). Whether mtCx43 forms gap junctions in mitochondria after translocation is unclear. Lucifer yellow is widely used both for hemichannel-mediated dye uptake and for gap junction-mediated dye transfer. In our case, given the channel orientation, mtCx43 is expected to form hemichannels, and the Cx43(CT) Ab could be used as a specific Cx43 hemichannel blocker, as reported in cardiomyocytes (Lillo et al., 2019).

    1. Author Response:

      Reviewer #1 (Public Review):

      Here, Servello et al explore the role of temperature and the temperature-sensing neuron AFD in promoting protection against peroxide damage. Unlike many other environmental threats, peroxide toxicity is expected to be temperature-dependent, since its chemical reactivity should be enhanced by higher temperatures. The authors convincingly and rigorously show that transient exposure to 25C, a condition of mild heat stress in C. elegans, activates animals' defenses against peroxides but potentially not other agents. Interestingly, this response requires the temperature-sensing AFD neurons, though whether temperature-dependent AFD activity is itself involved in this regulation is not explored. Further, the authors find that temperature regulates AFD's expression of the insulin ins-39 and provide evidence supporting the idea that repression of ins-39 at 25C contributes to enhanced peroxide defense. The authors use transcriptomic approaches to explore gene expression changes in animals in which AFD neurons are ablated, providing evidence that the FoxO-family transcription factor DAF-16 potentiates AFD signaling. However, because AFD ablation triggers effects broader than transient 25C exposure, the significance of these findings for temperature-dependent peroxide defense is somewhat unclear. Additionally, the possibility that DAF-16 (as well as another protective factor, SKN-1) functions in parallel to temperature stress is consistent with many of the results shown but is not as thoroughly considered. Together, these studies identify a fascinating example of pre-emptive threat response triggered by the detection of a potentiator of that threat, a phenomenon they term "enhancer sensing." While some predictions of the specificity of this phenomenon remain untested, the paper provides intriguing insight into the potential mechanisms by which it may occur.

      Major issues:

      The dependence of the enhancer-sensing phenomenon on AFD leads the authors to conclude that the 25C stimulus is sensed by AFD itself, but this needs to be directly tested. To do this, they could ask whether tax-4 function is required in AFD, or use mutants in which AFD's thermosensory function is compromised.

      We thank the reviewer for suggesting these experiments. As requested, we determined whether previously identified mechanisms for temperature perception by the AFD neurons were required for the temperature-dependent regulation of peroxide resistance using gcy-18 gcy-8 gcy-23 triple mutants and the respective single mutants. The findings from the new experiments lead us to conclude that temperature perception by AFD via the GCY-8, GCY-18, and GCY-23 receptor guanylate cyclases, which are exclusively expressed in the AFD neurons, contributes to the temperature-dependent regulation of peroxide resistance in C. elegans. These experiments are detailed in the following new paragraph in the results section:

      “Last, we determined whether previously identified mechanisms for temperature perception by the AFD neurons were required for the temperature-dependent regulation of peroxide resistance. The AFD neurons sense temperature using receptor guanylate cyclases, which catalyze cGMP production, leading to the opening of TAX-4 channels (Goodman and Sengupta, 2019). Three receptor guanylate cyclases are expressed exclusively in AFD neurons: GCY-8, GCY-18, and GCY-23 (Inada et al., 2006; Yu et al., 1997) and are thought to act as temperature sensors (Takeishi et al., 2016). Triple mutants lacking gcy-8, gcy-18, and gcy-23 function are behaviorally atactic on thermal gradients and fail to display changes in intracellular calcium or thermoreceptor current in the AFD neurons in response to temperature changes (Inada et al., 2006; Ramot et al., 2008; Takeishi et al., 2016; Wang et al., 2013; Wasserman et al., 2011). We found that when grown and assayed at 20°C, gcy-23(oy150) gcy-8(oy44) gcy-18(nj38) triple null mutants survived 43% longer in the presence of tBuOOH than wild-type controls (Figure 3J). In contrast, at 25°C, the gcy-23 gcy-8 gcy-18 triple mutants showed a 12% decrease in peroxide resistance relative to wild-type controls (Figure 3K). Therefore, the three AFD-specific receptor guanylate cyclases influenced the temperature dependence of peroxide resistance, lowering peroxide resistance at 20°C and slightly increasing it at 25°C. At 20°C, the gcy-8(oy44), gcy-18(nj38), and gcy-23(oy150) single mutants increased peroxide resistance by 10%, 51%, and 21%, respectively, relative to wild-type controls (Figure 3L). Therefore, each of the three AFD-specific receptor guanylate cyclases regulates peroxide resistance. We conclude that temperature perception by AFD via GCY-8, GCY-18, and GCY-23 enables C. elegans to lower their peroxide resistance at the lower cultivation temperature.”

      The enhancer-sensing model is fascinating, but as it stands it is somewhat oversold. The authors could tone down the writing, indicating that this model is suggested rather than shown. Alternatively, they could more carefully test some of its predictions - for example by exploring the response to other threats (e.g. some of the toxicants described in Fig. S5) at 20C and 25C in WT and AFD-ablated animals.

      We edited the manuscript and expanded the manuscript’s discussion to address these concerns as well as similar concerns from reviewer #3. In the paper we show that the regulation of the induction of H2O2 defenses in C. elegans is coupled to the perception of temperature (an inherent enhancer of the reactivity of H2O2). To understand the significance of this finding in an evolutionary context, and to explain why such a regulatory system would evolve, we introduced in the discussion a new conceptual framework, “enhancer sensing,” and devoted a section of the discussion to demonstrating that the phenomenon that we observed could not be adequately explained by existing frameworks used to understand the evolutionary origins of the regulatory systems for defense responses.

      We now realize that we did not explain sufficiently clearly the criteria for establishing that a phenomenon represents enhancer sensing, leading to incorrect predictions by reviewers 1 and 3 about (a) whether what we observed in C. elegans is an instance of enhancer sensing (or more proof is needed) and (b) what the enhancer sensing model for the coupling of temperature perception to H2O2 defense would predict about how temperature and the AFD neurons would affect resilience to other chemicals. We regret failing to adequately explain the model’s scope and predictions and believe that we have now explicitly addressed the scope of what constitutes enhancer sensing and the predictions of the model. In particular, we previously did not spell out (a) the distinction between the enhancer sensing strategy and the mechanistic implementation of that strategy; and, importantly, (b) we did not discuss what the enhancer sensing strategy coupling temperature perception to H2O2 defense in C. elegans predicted (and did not predict) about whether a similar strategy would be expected to be used by C. elegans to deal with other temperature-dependent threats. We now address these issues in two new paragraphs in the discussion that read:

      “We show here that C. elegans uses an enhancer sensing strategy that couples H2O2 defense to the perception of high temperature. We expect this strategy’s output (the level of H2O2 defense) to provide the nematodes with an evolutionarily optimal strategy across ecologically relevant inputs (cultivation temperatures) (Kussell and Leibler, 2005; Maynard Smith, 1982; Wolf et al., 2005). This strategy is implemented at the organismic level through the division of labor between the AFD neurons, which sense and broadcast temperature information, and the intestine, which responds to that information by providing H2O2 defense (Figure 9D). Ascertaining that C. elegans relies on this enhancer sensing strategy does not depend on the temperature information broadcast by AFD exclusively regulating defense responses to temperature-dependent threats, because the regulation of defenses towards temperature-insensitive threats could affect defenses towards temperature-dependent threats; for example, suppressing defenses towards a temperature-insensitive threat would be beneficial if those defenses interfered with H2O2 defense or depleted energy resources contributing to H2O2 defense.

      As with any sensing strategy, enhancer sensing strategies are more likely to evolve when sensing is informative and responding is beneficial. In their natural habitat, C. elegans encounter many environmental chemicals that, like H2O2, are inherently more reactive at higher temperatures. It will be interesting to determine the extent to which C. elegans uses enhancer sensing strategies coupling temperature perception to the induction of defenses towards those chemicals, and whether those strategies rely on temperature perception and broadcasting by the AFD neurons. We expect that sensing strategies regulating defense towards those chemicals would be more likely to evolve when those chemicals are common, reactive, and cause consequential damage.”

      We note that our ability to predict survival in the presence of other toxicants, such as those that trigger specific AFD-dependent gene-expression responses that do not differ between 20C and 25C (as proposed by the reviewer), is limited not only by our lack of knowledge about the specific mechanisms that protect worms from those toxicants, but also by our lack of knowledge about whether defense towards hydrogen peroxide interferes (or synergizes) with defense towards each of those toxicants, and vice versa. We therefore think that those experiments would be better addressed in future studies.

      The role of ins-39 remains somewhat speculative. Fig 4F shows that ins-39 mutants have a reduced induction of peroxide defense, but it seems that this could be the result of a ceiling effect. The authors' model predicts that overexpression of ins-39, particularly at 25C, should sensitize animals to peroxide damage, a prediction that should be tested directly. Further, the authors seem to assume that AFD is the relevant site of ins-39 function, but this needs to be better supported.

      As requested by all three reviewers, we determined whether ins-39 gene expression in AFD was sufficient to lower peroxide resistance by restoring ins-39(+) gene expression only in the AFD neurons using the AFD-specific gcy-8 promoter. As predicted by the reviewer, these worms were more sensitive to peroxide than wild-type worms. The findings from this experiment lead us to conclude that expression of ins-39 in the AFD neurons was sufficient to regulate the nematode’s peroxide resistance. The new section reads:

      “Next, we determined whether the INS-39 signal from AFD regulated the nematode’s peroxide resistance. The tm6467 null mutation in ins-39 deletes 520 bases, removing almost all the ins-39 coding sequence (Figure 5A), and inserts at that location 142 bases identical to an intervening sequence located between ins-39 and its adjacent gene. In nematodes grown and assayed at 20°C, ins-39(tm6467) increased peroxide resistance by 26% relative to wild-type controls (Figure 5F). To determine whether ins-39 gene expression in AFD was sufficient to lower peroxide resistance, we restored ins-39(+) expression only in the AFD neurons using the AFD-specific gcy-8 promoter (Inada et al., 2006; Yu et al., 1997) in ins-39(tm6467) mutants. Expression of ins-39(+) only in AFD eliminated the increase in peroxide resistance of ins-39(tm6467) mutants (Figure 5F). Notably, the peroxide resistance of the two independent transgenic lines was 28% and 30% lower than that of wild-type controls, likely due to overexpression of the gene beyond wild-type levels. We conclude that the gene dose-dependent expression of ins-39 in the AFD neurons regulated the nematode’s peroxide resistance.”

      The temperature-shift experiments in figure 5G (formerly 4F) indicated that the effects of growth at 25C and of the ins-39 mutation on peroxide resistance at 20C were non-additive. We interpreted this epistatic interaction to be due to action in a common pathway. Alternatively, growth at 25C, while increasing subsequent peroxide resistance at 20C, could also limit how much further that resistance can increase when combined with another intervention, even if the two interventions acted via parallel mechanisms; this would be a ceiling effect, as proposed by the reviewer. We favor the interpretation that the mechanisms act sequentially because we found that ins-39 gene expression within AFD was lower at 25C than at 20C, leading us to propose the sequential model in figure 5H (formerly 4G).

      Most of the daf-16 and skn-1 experiments are carried out in AFD-ablated animals, making the relevance of these findings for the 25C-dependent induction of peroxide defense somewhat unclear. As the authors show, AFD ablation causes much more extensive changes than transient 25C exposure, clearly seen in the slope of the line in 3C. Further, unlike 25C exposure, AFD ablation is a chronic and non-physiological state. It would be useful for the authors to be cautious in their interpretation of these findings and to be clearer about how strongly they can connect them to the "enhancer sensing" phenomenon. Along these lines, the potentiation idea could be toned down a bit. Much of the data is consistent with parallel function for daf-16 (and skn-1) - for example, Fig 5C indicates additive effects of daf-16 and 25C exposure; 6C shows that AFD ablation still has a clear effect on peroxide sensitivity in the absence of both daf-16 and skn-1; and Fig S8a shows that much of the transcriptional response to AFD ablation (along PC1) is intact in daf-16 animals.

      We have made several adjustments in the text to address these concerns. As the reviewer noted, the experiments with skn-1 were performed only in AFD ablated worms. We have renamed the section heading to “SKN-1/NRF and DAF-16/FOXO collaborate to increase the nematodes’ peroxide resistance in response to AFD ablation” to make that clear.

      In contrast, the peroxide resistance experiments with daf-16 were also done in worms grown at 25C and then shifted to 20C during the peroxide resistance assay. The connection of daf-16 with the temperature-dependent regulation of peroxide resistance was established in temperature-shift experiments in daf-16 single mutants (Figure 6C, formerly 5C) and in transgenic worms rescuing the daf-16 mutant only in the intestine (Figure 6F). In the revised text we make it clearer that the effect of the daf-16 mutation is bigger when the nematodes are shifted from 25C to 20C: “The daf-16(mu86) null mutation decreased peroxide resistance in nematodes grown at 25°C and assayed at 20°C by 35%, a greater extent than the 21% reduction in peroxide resistance induced by that mutation in nematodes grown and assayed at 20°C (Figure 6C).”

      As the reviewer noted, daf-16 and skn-1 have a role in peroxide resistance when the AFD neurons are not ablated (albeit a smaller one than when those neurons are ablated). We have made several changes and additions to the text to make that explicit. Most notably, the revised last paragraph of the SKN-1 section now reads: “We propose that when nematodes are cultured at 20°C, the AFD neurons promote signaling by the DAF-2/insulin/IGF1 receptor in target tissues, which subsequently lowers the nematode’s peroxide resistance by repressing transcriptional activation by SKN-1/NRF and DAF-16/FOXO. However, this repression is not complete, because both daf-16(mu86) and skn-1(RNAi) lowered peroxide resistance at 20°C when the AFD neurons were present. It is also likely that DAF-16 and SKN-1 are not the only factors that contribute to peroxide resistance in AFD-ablated nematodes at 20°C, because AFD ablation increased peroxide resistance in daf-16(mu86); skn-1(RNAi) nematodes, albeit to a lesser extent than in daf-16(+) or skn-1(+) backgrounds.”

      The potentiation idea was specific to the effects of DAF-16 on gene expression. As the reviewer noted, much of the transcriptional response to AFD ablation is intact (albeit reduced in magnitude) in AFD-ablated daf-16 mutants, leading to a shift in the PC1 score for the mutant. At the level of the expression of individual genes, we quantified those effects in Figure 8G (formerly 7D). When we did the RNAseq experiments we had expected that lack of daf-16 would eliminate either all the changes in gene expression induced by AFD ablation or eliminate those changes for a subset of genes. Instead, what we found was much more subtle, and unexpected: the size of the gene expression change induced by AFD ablation was reduced by the daf-16 mutation, and that reduction was systematic. Specifically, we found that the bigger the change in gene expression induced by AFD ablation, the bigger the effect of daf-16 in the AFD ablated animals (that is, potentiation), leading to a change in the slope in the regression line in Figure 8G. We revised the paper to ensure we only used the word potentiation in this context (gene expression), even though formally DAF-16 also potentiated the effects of AFD ablation (and temperature shift from 25C to 20C) on peroxide resistance.

      Reviewer #3 (Public Review):

      This paper offers novel mechanistic insights into how pre-exposure to warm temperature increases the resistance of C. elegans to peroxides, which are more toxic at warmer temperature. The temperature range tested in this study lies within the animal's living conditions and is much lower than that of heat shock. Therefore, this study expands our understanding of how past thermosensory experience shapes physiological fitness under chemical stress. The paper is technically sound with most experiments or analyses carried out rigorously, and therefore the conclusions are solid. However, it challenges our current understanding of the role of the C. elegans thermosensory system in coping with stress. The traditional view is that the AFD thermosensory neuron is activated upon sensing temperature rise, and that temperature sensation through AFD positively regulates systemic heat shock response and promotes longevity in C. elegans. Thus, it is quite unexpected that AFD ablation activates DAF-16 and improves peroxide resistance. It also appears counterintuitive that genes upregulated at 25 degrees overlap extensively with those upregulated by AFD ablation at 20 degrees. I feel that it is premature to coin the term "enhancer sensing" for such a phenomenon, as their work does not rule out the possibility that AFD ablation increases resistance to other stresses that are independent of temperature regarding their toxicity or magnitude of hazard. Additional work is necessary to clarify these issues.

      1. Whether the role of AFD in inhibiting peroxide resistance is related to AFD activity needs further clarification. AFD activity depends on the animal's thermosensory experience. As animals in this study are maintained at 20 degrees unless indicated specifically, the AFD displays activities starting around 17 degrees and peaks around 20 degrees. Under such condition, the AFD displays little or no activity to thermal stimuli around 15 degrees. It will be important to test whether cultivation of animals at 20 degrees improves peroxide resistance at 15 degrees, compared to 15 degrees-cultivation/15 degrees peroxide testing. The authors should also test whether AFD ablation further improves survival under peroxides at 15 degrees for animals grown at 20 degrees, whose AFD should show little or no activities at 15 degrees.

      The reviewer raises an interesting point about the relation between the mechanisms that determine AFD activity in response to temperature and those that enable AFD to regulate peroxide resistance. In the revised manuscript we tested whether known mechanisms enabling AFD to sense changes in temperature acutely (receptor guanylate cyclases GCY-8, GCY-18, and GCY-23) played a role in the temperature dependence of peroxide resistance. We found that they did, as detailed in our response to reviewer #1’s point 1.

      As noted by reviewer #2 in their point 1, and in our reply to that comment (and in a new discussion paragraph in the revised manuscript), the relationship between the known mechanisms that acutely regulate the activity of AFD in response to temperature and the mechanisms by which constant cultivation temperature regulates gene expression in AFD (and therefore the expression of peroxide resistance regulating signals like INS-39) is not well understood. Therefore, it is difficult to predict which temperatures will cause induction of peroxide defenses via AFD-dependent mechanisms, or via other mechanisms. While we agree with the reviewer that it will be interesting to characterize the extent to which other cultivation temperatures besides 25C lead to increased peroxide resistance at lower temperatures (including the proposed shifts from 20C to 15C), we think that those questions will be better addressed in future studies.

      2. The importance of the thermosensory function of AFD should be verified. In the current study, the tax-4 mutation was used to infer AFD activity, but tax-4 is expressed in sensory neurons other than AFD. In addition to AFD, AWC can sense temperature and it also expresses tax-4. Therefore, influence on AFD from other tax-4-expressing neurons cannot be excluded. On the other hand, ablation of AFD removes all AFD functions, including those that are constitutive and temperature-independent. Therefore, the authors should test the gcy-18 gcy-8 gcy-23 triple mutant, in which the AFD neurons are fully differentiated but completely insensitive to thermal stimuli. These three thermosensor genes are exclusively expressed in AFD. Compared to the tax-4 mutant that is broadly defective in multiple sensory modalities, this triple gcy mutant shows defects specifically in thermosensation. They should see whether results obtained from the AFD ablated animals could be reproduced by experiments using the gcy-18 gcy-8 gcy-23 triple mutant. The authors are also recommended to investigate ins-39 expression in AFD and profile gene expression patterns in the gcy-18 gcy-8 gcy-23 triple mutant.

      We thank the reviewer for this suggestion. We have performed the requested experiments, as detailed in our response to reviewer #1’s point 1. Briefly, we found that gcy-18 gcy-8 gcy-23 triple mutants increased peroxide resistance at 20C but not at 25C, and that the respective gcy single mutants affected peroxide resistance at 20C. In light of these findings, we concluded that temperature perception by AFD via GCY-8, GCY-18, and GCY-23 enables C. elegans to lower their peroxide defenses at the lower cultivation temperature.

      3. The literature suggests that AFD promotes longevity likely in part through daf-16 (Chen et al., 2016) or independent of daf-16 (Lee & Kenyon, 2009). Whatever it is, various studies show that activation of AFD and daf-16 promote a normal lifespan at higher temperature, and AFD ablation shortens lifespan at either 20 or 25 degrees. Therefore, the finding that DAF-16-upregulated genes overlap extensively with those upregulated by AFD ablation is quite unexpected (Figure 5B). The authors should perform further gene ontology (GO) analysis to identify subsets of genes co-regulated by DAF-16 and AFD ablation, whether these genes are reported to be involved in longevity regulation, immunity, stress response, etc.

      We thank the reviewer for this interesting comment about the complex mechanisms by which AFD regulates longevity. We note that AFD also has additional temperature-dependent roles in lifespan regulation, as Murphy et al. 2003 found that RNAi of gcy-18 increased lifespan in wild-type worms at 20C but not at 25C. Therefore, AFD-specific interventions can also be lifespan extending at 20C.

      We performed WormCat analysis, which is similar to gene ontology analysis, in Figure 8-figure supplement 2 (formerly Figure S8G), and described it in the results section: “we found that the extent to which AFD ablation affected the average expression of sets of genes with related functions (Higgins et al., 2022; Holdorf et al., 2020) was systematically lower in daf-16(mu86) mutants than in daf-16(+) nematodes (R² = 86%, slope = 0.67, P < 0.0001, Figure 8—figure supplement 2).” Visual inspection of the plot and the very high coefficient of determination of 86% indicate that the size of the effect of AFD ablation on gene expression was systematically smaller when the contribution of DAF-16 to gene expression was removed.
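      To illustrate how a slope and coefficient of determination of this kind summarize a systematic reduction, the following is a minimal sketch that assumes an ordinary least-squares fit of the AFD-ablation effect in daf-16 mutants (y) against the corresponding effect in daf-16(+) nematodes (x), with one point per gene set. The regression method actually used for the figure is not specified in this response, and the function name fit_effect_regression is hypothetical; the sketch only shows why a slope below 1 together with a high R² indicates that effects shrink proportionally rather than disappearing for a subset of gene sets.

```python
from typing import Sequence, Tuple


def fit_effect_regression(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = intercept + slope * x (assumed method).

    x: effect of AFD ablation per gene set in daf-16(+) animals (e.g. log2 change).
    y: effect of AFD ablation for the same gene sets in daf-16 mutants.
    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all y values are equal")
    return slope, intercept, 1.0 - ss_res / ss_tot
```

      As a made-up example, if every effect in the mutant were exactly half of its daf-16(+) counterpart (x = [1, 2, 3, 4], y = [0.5, 1.0, 1.5, 2.0]), the fit gives a slope of 0.5, an intercept of 0, and an R² of 1; scatter around that proportional relationship lowers R² without necessarily changing the slope much.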

      In the revised manuscript we also moved the three panels quantifying the expression of DAF-16 targets and daf-16-regulated genes from the supplement to the main figure. One of those panels (Figure 8F) shows that genes upregulated by daf-16(+) in daf-2 mutants were disproportionally affected by lack of daf-16 in AFD-ablated worms, as we described in the results section: “In addition, in AFD ablated nematodes, lack of daf-16 lowered the expression of genes upregulated in a daf-16-dependent manner in daf-2(-) mutants (Murphy et al., 2003) to a greater degree than in unablated nematodes (Figure 8F).”

      4. I feel that "enhancer sensing" is an overstatement, or at least a premature term that is not sufficiently supported without further investigations. The authors should explore whether AFD ablation or pre-exposure to warm temperature specifically enhances resistance to a stressor the toxicity of which is increased at higher temperature, but does not affect the resistance to other temperature-insensitive threats.

      We edited the manuscript and expanded the manuscript’s discussion to address these concerns as well as similar concerns from reviewer #1. For clarity, we repeat much of our response to reviewer #1’s point 2 here, with the last paragraph of this response specific to this reviewer’s comment.

      In the paper we show that in C. elegans the regulation of the induction of H2O2 defenses is coupled to the perception of temperature (an inherent enhancer of the reactivity of H2O2). To understand the significance of this finding in an evolutionary context, and to explain why such a regulatory system would evolve, we introduced in the discussion a new conceptual framework, “enhancer sensing,” and devoted a section of the discussion to demonstrating that the phenomenon that we observed could not be adequately explained by existing frameworks used to understand the evolutionary origins of the regulatory systems for defense responses.

      We now realize that we did not explain sufficiently clearly the criteria for establishing that a phenomenon represents enhancer sensing, leading to incorrect predictions by reviewers 1 and 3 about (a) whether what we observed in C. elegans is an instance of enhancer sensing (or more proof is needed) and (b) what the enhancer sensing model for the coupling of temperature perception to H2O2 defense would predict about how temperature and the AFD neurons would affect resilience to other chemicals. We regret failing to adequately explain the model’s scope and predictions and believe that we have now explicitly addressed the scope of what constitutes enhancer sensing and the predictions of the model. In particular, we previously did not spell out (a) the distinction between the enhancer sensing strategy and the mechanistic implementation of that strategy; and, importantly, (b) we did not discuss what the enhancer sensing strategy coupling temperature perception to H2O2 defense in C. elegans predicted (and did not predict) about whether a similar strategy would be expected to be used by C. elegans to deal with other temperature-dependent threats. We now address these issues in two new paragraphs in the discussion that read:

      “We show here that C. elegans uses an enhancer sensing strategy that couples H2O2 defense to the perception of high temperature. We expect this strategy’s output (the level of H2O2 defense) to provide the nematodes with an evolutionarily optimal strategy across ecologically relevant inputs (cultivation temperatures) (Kussell and Leibler, 2005; Maynard Smith, 1982; Wolf et al., 2005). This strategy is implemented at the organismic level through the division of labor between the AFD neurons, which sense and broadcast temperature information, and the intestine, which responds to that information by providing H2O2 defense (Figure 9D). Ascertaining that C. elegans relies on this enhancer sensing strategy does not depend on the temperature information broadcast by AFD exclusively regulating defense responses to temperature-dependent threats, because the regulation of defense towards temperature-insensitive threats could affect defenses towards temperature-dependent threats; for example, suppressing defenses towards a temperature-insensitive threat would be beneficial if those defenses interfered with H2O2 defense or depleted energy resources contributing to H2O2 defense.

      As with any sensing strategy, enhancer sensing strategies are more likely to evolve when sensing is informative and responding is beneficial. In their natural habitat, C. elegans encounter many environmental chemicals that, like H2O2, are inherently more reactive at higher temperatures. It will be interesting to determine the extent to which C. elegans uses enhancer sensing strategies coupling temperature perception to the induction of defenses towards those chemicals, and whether those strategies rely on temperature perception and broadcasting by the AFD neurons. We expect that sensing strategies regulating defense towards those chemicals would be more likely to evolve when those chemicals are common, reactive, and cause consequential damage.”

      We note, in the first of the new Discussion paragraphs, that the existence of an enhancer sensing strategy is not contingent on the AFD neurons (which implement the temperature sensing and temperature-information broadcasting functions regulating peroxide defenses) refraining from regulating defense responses to temperature-insensitive threats. For example, an animal facing high concentrations of environmental peroxides may benefit from suppressing defense against a temperature-insensitive threat when that defense is detrimental to hydrogen peroxide defense. This could occur because there is an energetic trade-off when mounting multiple defense responses, or because specific defenses towards temperature-insensitive threats interfere with peroxide defense. As we noted in our response to reviewer #1's point 2, our ability to predict survival to threats other than H2O2 (including temperature-independent threats) is limited not only by our lack of knowledge about the specific mechanisms that protect worms from those threats, but also by our inability to predict whether defenses towards different threats operate independently of, constructively with, or destructively with those that provide hydrogen peroxide defense. We therefore think these questions would be better addressed in future studies.

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript investigates the gene regulatory mechanisms that are involved in the development and evolution of motor neurons, utilizing cross-species comparison of RNA-sequencing and ATAC-sequencing data from little skate, chick and mouse. The authors suggest that both conserved and divergent mechanisms contribute to motor neuron specification in each species. They also claim that more complex regulatory mechanisms have evolved in tetrapods to accommodate sophisticated motor behaviors. While this is strongly suggested by the authors' ATAC-seq data, some additional validation would be required to thoroughly support this claim.

      Strengths of the manuscript:

      1) The manuscript provides a valuable resource to the field by generating an assembly of the little skate genome, containing precise gene annotations that can now be utilized to perform gene expression and epigenetic analyses. The authors take advantage of this novel resource to identify novel gene expression programs and regulatory modules in little skate motor neurons.

      2) Cross-species RNA-seq and ATAC-seq data comparisons are combined in a powerful approach to identify novel mechanisms that control motor neuron development and evolution.

      Weaknesses:

      1) It is surprising that the analysis of RNA-seq datasets between mouse, chick, and little skate only identified 5 genes that are common between the 3 species, especially given the authors' previous work identifying highly conserved molecular programs between little skate and mouse motor neurons, including core transcription factors (Isl1, Hb9, Lhx3), Hox genes and cholinergic transmission genes. This raises some questions about the robustness of the sequencing data and whether the genes identified represent the full transcriptome of these motor neurons.

      To address reviewer #1's questions, we generated RNA sequencing data from mouse forelimb MNs and re-analyzed the RNA-seq data using only homologous MN populations across species (Figure 3). With this analysis, 1038 genes, including many known MN marker genes, are commonly expressed in MNs of the different species. In the Results section, we have added the following:

      “The evolution of genetic programs in MNs was investigated unbiasedly by comparing highly expressed genes in pec-MNs (percentile expression > 70) of little skate with the ones from MNs of mouse and chick, two well-studied tetrapod species. In order to compare gene expression with homologous cell types from each species, we performed RNA sequencing on forelimb MNs of mouse embryos at embryonic day 13.5 (e13.5) and wing level MNs of chick embryos at Hamburger-Hamilton (HH) stage 26–27…”

      We have also compared our re-analysis with previous results in Figure 2–figure supplement 1, shown above. Most of the previously identified fin MN genes (21/24) are highly expressed in pec-MNs (percentile > 70), consistent with the previous in situ experiments. In the Results, we have added the following:

      “Although the total number of DEGs differs from the previous data (592 vs. 135 pec-MN DEGs), possibly because of different statistical analyses and reference genomes, the previous RNA-seq data based on de novo assembly and annotation using zebrafish were mostly recapitulated in our DEG analysis based on our new skate genome (21 out of 24 previous fin MN marker genes have expression levels ranked above the 70th percentile in pec-MNs; Figure 2‒figure supplement 1).”

      2) Based on an analysis of binding motifs in their ATAC-seq data, the authors suggest that the greater number of putative binding sites in mouse MNs allows for a higher complexity of regulation and specialization of putative motor pools. This could certainly be true in theory but needs to be further validated. The authors show FoxP1 as an example, which seems to be more heavily regulated in the mouse, but there is no evidence that the FoxP1 expression profile differs between mouse and skate. Fig. 5 suggests that FoxP1 might be differentially regulated by Snai1 in mouse and skate, but the expression of Snai1 in MNs in either species is not shown.

      We have added further discussion and data about differential expression of Foxp1 in mouse and little skate in Figure 5–figure supplement 16 and have discussed as follows:

      “Foxp1, the major limb/fin MN determinant, appears to be differentially regulated in tetrapods and little skate. Although Foxp1 is expressed in and required for the specification of all limb MNs in tetrapods, Foxp1 is downregulated in Pea3-positive MN pools during maturation in mice (Catela et al., 2016; Dasen et al., 2008). In addition, preganglionic motor column (PGC) MNs in the thoracic spinal cord of mouse and chick express Foxp1 at half the level of limb MNs. Although PGC neurons have not yet been identified in little skate, we tested the expression level of Foxp1 using a previously characterized tetrapod PGC marker, pSmad. We observed that Foxp1 is not expressed in MNs that express pSmad (Figure 5‒figure supplement 3). Since there is currently no known marker for PGC MNs in little skate, this conclusion should be taken with caution.”

      As for Snai1, in the revision we performed a motif enrichment analysis with an unbiased gene list, in which Snai1 did not appear. However, RNA in situ hybridization (Figure 5–figure supplement 3) showed that Snai1 is expressed in MNs of both mouse and little skate but not of chick, consistent with a previous report (Cheung et al., 2005). To examine the function of Snai1 in the regulation of Foxp1 expression, we ectopically expressed Snai1 in the chick spinal cord by in ovo electroporation. We did not detect any change in Foxp1 expression; instead, we observed an increase in the number of neurons and abnormal MN exit from the spinal cord, reminiscent of a previous observation (Zander et al., 2014). Although we did not detect changes in Foxp1 expression, we cannot rule out the possibility that Snai1 regulates Foxp1 in mouse and little skate, which may require a gene knockout experiment to test. Because Snai1 binding sites were not enriched in the new gene sets analyzed in the revision, we have not discussed Snai1 further in the text.

      3) In their discussion section the authors state that they found both conserved and divergent molecular markers across multiple species but they do not validate the expression of novel markers in either category beyond RNA-seq, for example by in situ or antibody staining.

      We have added RNA in situ hybridization results in Figure 3C and Figure 3–figure supplements 1 and 2. Most of the genes were expressed in tissues in accordance with the sequencing results (6 out of 9 common MN genes; 4 out of 6 mouse-specific genes; 5 out of 7 skate-specific genes). Specifically, Uchl1, Slc5a7, Alcam, and Serinc1 are expressed in MNs of all three species; Coch, Ppp1rc, Ctxn1, and Clmp are expressed in MNs of mouse but not of the other species; and Eya1, Etv5, Dnmbp, and Spint1 are expressed in MNs of skate but not of the other species. In the Results section, we have summarized these results as follows:

      “These results were validated by performing RNA in situ hybridization in tissue sections on a subset of species-specific genes …”

    1. Author Response

      Reviewer #1 (Public Review):

      Switching between epithelial and mesenchymal states is an important stage in cancer growth and metastasis, but it is difficult to study because cells in this transition are rare. In this study, Xu et al. investigate changes in the splicing regulator environment and in specific splice events by comparing colon cancer cell populations that have both epithelial and mesenchymal properties (and so are potentially in transition) with their epithelial counterparts. Using these potentially transitioning cells should reveal new insights into the causative changes occurring during EMT, a key life-threatening step in colon cancer progression and in other cancers.

      The authors were trying to establish if changes in the splicing environment occurred between epithelial and quasi-mesenchymal cells and to what extent this is important for colon cancer in establishing gene expression programs and cell behavior related to metastasis. The take home message is that these more "plastic" mesenchymal cells are expressing the mesenchymal transcription factor ZEB1 and reducing expression of the epithelial splicing factor ESRP1 (as well as some other RBPs). The FACS analysis showing that over-expression of ESRP1 alone can switch cell population ratios is very clear and indicates that reduction of this RBP plays a key role in making cells more metastatic. The lentiviral overexpression of CD44s and NUMB2/4 had very dramatic effects on increasing metastatic cellular properties. The clinical stratification analysis of splice isoforms and ZEB1/ESRP1 expression was very informative for understanding what is happening in actual tumors. The methods used and results from these studies are likely to have an impact on understanding the gene expression changes that take place during EMT.

      Strengths: The authors have used cell lines that model switching between epithelial and quasi-mesenchymal states, based on expression of the markers EpCAM (epithelial cell adhesion molecule, expressed in epithelial cells) and CD44. The study uses shRNA-mediated knockdown and lentiviral overexpression of ESRP1 and splice isoforms, and monitors endogenous mRNA splice isoforms by RNA-seq and qRT-PCR, protein isoforms by western blot, cell surface expression of EpCAM and CD44 by FACS, metastatic potential using a mouse model, and patient gene expression data from TCGA.

      Weaknesses: Some of the data here might be novel for colon cancer, but the roles of these RNA binding proteins and ESRP1 target exons are better known in other cancers. Both CD44 and NUMB are already known ESRP1 targets in cells undergoing plasticity (e.g. PMID: 30692202). RBM47 is already known to be downregulated in EMT and quaking to be upregulated (PMID: 28680090; PMID: 27044866). There is also an extensive literature on ESRP1 expression in cancer and EMT. This should be better discussed.

      Out of the 3 references mentioned, 2 are already discussed in the submitted manuscript, while the third (Rokavec et al.) has now been added to the Discussion. As specified above, we never claimed to be the first to report on these RBPs and downstream AS targets. Unfortunately, it is not clear how the reviewer wants us to improve on these aspects (“should be better discussed” is rather vague) but we have now tried to extend the discussion relative to these issues in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to establish canine tissue-specific organoids for propagation, storage and potential use in biomedical and translational medicine.

      Strengths - The project is ambitious in aim, seeking to raise 6 tissue-specific, stem cell-derived organoid lines.

      Weaknesses -

      1) While the manuscript refers to stem cell lines, no evidence of progressive organoid morphogenesis from undifferentiated single stem cells or stem cell clusters has been shown. This omission makes it difficult to distinguish true organoids from surviving pieces of parental tissue, which the authors actually include within their cultures. The authors infer that high-order tissue complexity can be generated within short-term 3D cultures. For example, their kidney organoids contained glomeruli, renal tubules and a Bowman's capsule. These remarkable findings contrast with a previous study by Chen et al. (2019), which showed that kidney organoids had restricted morphogenic capacity, forming only simple epithelial dome-like structures. Although the Chen study was cited, the major differences in findings were not discussed. In the current study, no compelling evidence is provided for the integrated assembly of the glomerular microvascular capillary network, the glomerular epithelial capsule and complex tubular epithelial collecting ducts during organoid growth.

      Thank you, clarification was made regarding the differences between Chen et al. 2019 and our organoids in our revision of the manuscript (Lines 445-447). The sentence regarding glomeruli, renal tubules, and Bowman's capsules was modified to specifically state the morphological resemblance our organoids have to these structures. We further clarified in the text that we are not stating these structures are complete (glomerular microvascular capillary) (Line 236), as our data are too preliminary to support this statement. However, in future publications we are excited to complete more in-depth characterization and investigation, along with functional assessment. To aid in characterization, three immunohistochemistry (IHC) antibodies were added to Figure 2.

      2) The potential of the organoids for freezing, storage and re-culture is unclear from the data presented.

      We did not present data on re-culturing organoids in this manuscript. However, in additional work in preparation, we have already thawed and regrown multiple lines from this manuscript, which leads us to believe that this is possible for all lines cultured here. Long-term expression changes after thawing warrant further investigation in future studies.

      3) Organoid capacity for regenerative growth in xenograft models has not been tested.

      We did not investigate this in the current manuscript. To our understanding, manuscripts describing a new organoid model typically do not use xenografts to confirm regenerative growth capacity. The use of these organoids in xenograft models is an exciting avenue to explore in the future.

      4) Figure 4 lacks appropriate positive and negative tissue controls.

      Please see Figure 5-figure supplement 1 for all negative control images. The tissues of origin in Figure 4 (now Figure 5) were used as positive controls for the antibody.

      5) Gene expression differences between tissues and organoids are inadequately explained.

      Our apologies for the lack of clarity in our original manuscript. Differences in gene expression between tissues and organoids are now compared in the revised Results section for each organ. To better describe the differences seen between tissues and organoids, we added information to the Discussion on the cell types present in and missing from our samples (Lines 315-321).

      6) Methodological detail is sparse. It is not clear how tissue biopsies are obtained, what size they are and how they are processed for organoid preparation.

      Our apologies. Information regarding biopsies was added in Experimental Procedures specifically in the Tissue collection section (Lines 509-510, 519-535). Additional details on protocols for organoid preparation and culturing were added to the Experimental Procedures and are cited in Gabriel et al. 2022 (Lines 535-548).

      7) The manuscript as a whole is poorly focussed and difficult to follow. The introduction is repetitive with only weak relevance to the main experiments.

      We appreciate the reviewer’s concern. To better focus this manuscript, we re-ordered the introduction to be more linear and improve the focus regarding the main experiments. We hope our revisions will satisfy the reviewer.

      Appraisal - The lack of morphogenesis and xenograft data undermines confidence that the authors have achieved their aims. The above concerns are also likely to hamper utility of the methods for the scientific community.

      We appreciate the listed concerns. These novel organoid models are not limited to applications pertaining to xenografts. Our aim was to develop novel organoid lines that we believe can be of use to a variety of fields including pharmacology, virology, and basic research. The testing of these organoids in xenograft models is outside the scope of the current manuscript.

      Reviewer #2 (Public Review):

      Zydryski et al. develop a comprehensive toolbox of organ-specific canine organoids. Building on previous work on kidney, urinary bladder, and liver organoids, they now report on lung, endometrium, and pancreatic organoids; all six organoid lines are derived from two canines. The authors attempt to benchmark these organoids via histological, transcriptomic, and immunofluorescence characterization to their cognate organs. These efforts are a welcome development for the organoid field, broaden the scope of use to studies with canine models, and seek to establish robust standards. The organ specific RNAseq dataset is also likely to be useful to other researchers working with the canine model.

      A key methodological advance would appear to be that the authors culture these organ-specific organoids using a common cell culture media. This is not the typical protocol in the organoid field; however, the authors do not provide enough information in the manuscript to evaluate if this is a good choice. Furthermore, it is likely that the authors were successful because they included additional tissue components in the co-culture for the organoids which might have provided the necessary tissue specific cues, but the methodological details to reproduce this and the technical evaluation of this approach are missing.

      This is an excellent point, and we have added details to the Methods section of the revised manuscript to better explain the embedding process (Lines 519-532). Your hypothesis about tissue-specific cues is very intriguing and something we should explore in the future. Previous publications have isolated ECM from the native tissue (Giobbe et al. 2019); a similar mechanism may be at work here, as you suggest.

      The authors also directly compare the transcriptional responses of the organoids with the organs, but this is a challenging enterprise given that the organoid models do not incorporate resident immune cells and typically are composed only of epithelial cells. This lack of an 'apples to apples' comparison might explain why in many cases the organoids and organs are highly divergent; however, it could also be that the common cell culture media did not lead to specific maturation of cell types.

      We agree: this manuscript aimed to derive epithelial organoids, and we acknowledge that not all cell types present in the tissues are represented. The comparison was meant to identify similarities (epithelial cells) and the current limitations of the organoid model. We have added text to the Discussion, specifically the "Insights into organ-specific genes" section, to further clarify this point (Lines 315-321).

      Reviewer #3 (Public Review):

      Zydryski et al. describe the generation and characterization of organoids from multiple adult canine tissues. While canine-derived organoids could potentially be advantageous over murine and human organoids, the novelty of their generation and characterization is limited, as organoid systems are now being rapidly genetically edited using CRISPR technologies and modeled within immunocompetent environments. Certain points limit my enthusiasm.

      First, the authors do not justify the use of serum (FBS) in their media, or why they include the same growth and differentiation factors across all tissue types.

      We added a sentence to the Discussion (Canine organoids as biomedical models) to further clarify the reasoning behind including the same growth factors for all tissue types:

      “The use of the same media composition lends itself to future applications of co-culture or use in assembloid models where multiple organoid lines are combined and continued growth in a shared media is required”

      As this medium is based on canine intestinal organoid medium, FBS was included in case potential applications require co-culture with intestinal organoids.

      Second, while bulk RNA sequencing data shows similarity per certain genes to the corresponding tissue, there is a lack of detailed characterization of what passage these organoids were harvested and how they change over time. Do they become more stem like and are they genetically stable?

      The passage numbers at which samples were harvested are listed in Supplemental file 1. Genetic stability is an excellent point regarding organoids. We have not yet examined it in these canine organoids; however, previous publications have shown that organoids are genomically stable over time with respect to chromosome number and base-pair changes, and we have added these citations to the Introduction (Line 48). The current manuscript focuses on derivation and initial characterization; future work will address the re-growth, genetic stability, and functional assessment of the canine organoid lines.

      Third, it would be important to demonstrate that these organoids can be genetically manipulated or be exposed to drugs and how they might be beneficial over murine and human organoids.

      Genetic editing of twelve organoid lines is outside the scope of this paper, and we plan to include it in future publications. We believe the organoids can be useful for veterinary medicine and as an important model of human disease, as canines typically represent humans better than mice do (Lines 28-34, 77-95).

      Fourth, the organoid complexity is not clear and cannot be ascertained from bulk RNA sequencing- for example, do kidney organoids recapitulate canonical markers at the protein level of proximal tubules, distal convoluted tubules, etc. Are different lung cells represented (AT1/AT2/club) and what is the composition of these cells? Why are these cells selected for?

      Thank you. We agree that bulk RNA sequencing has limitations for heterogeneous cell populations. It was meant to give a first indication of whether the organoid lines resemble their tissue of origin; adding single-cell RNA sequencing in the future is worth investigating.

      Fifth, as the authors note, canine organoids from other tissues have been developed before using similar methods. For these reasons, my enthusiasm is diminished, and unfortunately many of the experiments necessary for further consideration appear to be outside the scope of the study.

      Three of these organoid lines have previously been published in canines. However, this manuscript includes the growth and characterization of three novel organoid lines, whereas a manuscript typically focuses on one. Furthermore, unique to this study is the multi-organ comparison of expression across both tissues and organoids from the same animal, with a related individual serving as the biological replicate. In the human and murine fields, the organoid medium must be adjusted to each organ from which the stem cells are isolated. We show that our medium composition, which is similar to one that previously supported hepatic and intestinal canine organoids, can now support organoids from six different tissues, bringing a novel approach to the field. To our knowledge, this is the most comprehensive cross-tissue comparison of canine organoids. Additionally, we are not aware of any literature comparing six different organoid lines from the same individual, with a related biological replicate, in any other species.

    1. Author Response

      Reviewer #1 (Public Review):

      This study examines whether the D2 receptor antagonist amisulpride and the mu-opioid receptor antagonist naltrexone bias model-based vs model-free behavior in a well-established two-step task of behavioral control. The authors find that amisulpride enhances model-based choices, which is further supported by computational modeling of the data, revealing an increase in the relative contribution of model-based control of behavior. Naltrexone, on the other hand, had no reliable effect on model-based behavior.

      Overall, this is a very nice study with many strengths, including the task and data analysis. A particular strength of the design is the combination of a between-subject drug administration protocol with two within-subject (baseline vs. drug) sessions. This reduces between-subject variability in baseline model-based vs model-free behavior and enhances the power to detect drug effects.

      The introduction could do a better job articulating the rationale for testing the effect of these two specific drugs. Currently, the rationale is that both transmitter systems targeted by these drugs are involved in drug addiction, which is characterized by an imbalance in model-based vs. habitual control of behavior. This appears somewhat indirect.

      Blood draws were used to determine serum levels for amisulpride and naltrexone but these data are not included as covariates in the analysis.

      We thank the reviewer for their positive assessment of our study and for the constructive comments to improve it. We acknowledge that the Introduction did not motivate the main research goal of the manuscript clearly enough. We have now extended this section and provided further insight into the reasoning behind the study design. Beyond the involvement of dopamine- and opioid-promoting drugs in addiction, there is abundant experimental evidence that manipulating receptors of either system has comparable effects on model-free processes such as reinforcement and habit formation. Based on this overlap, one may predict that both neurotransmitter systems promote habit formation in a similar fashion, and that blocking their respective receptors will improve the ability to behave in a model-based manner. However, as we now elaborate in the manuscript, disrupting model-free processes might not be enough to promote model-based behaviour, as such behaviour relies heavily on cognitive control. It is therefore especially interesting to compare an opioid antagonist, which does not enhance cognitive function, with a D2 antagonist at a dosage that has been shown to increase cognitive control as well as the willingness to exert cognitive effort.

      This is expressed in the following paragraphs of the Introduction (p.2 §3 and p.3 §1):

      “Opiates, psychostimulants, and most other drugs of abuse increase the release of dopamine along the mesolimbic pathway (Chiara, 1999; Koob & Bloom, 1988), a circuit that plays a central role in reinforcement learning (Schultz, Dayan, & Montague, 1997). On top of this, the reinforcing properties of addictive drugs also depend on their ability to activate the μ opioid receptors (Becker, Grecksch, & Kraus, 2002; Benjamin, Grant, & Pohorecky, 1993; Le Merrer, Becker, Befort, & Kieffer, 2009). This suggests that both the dopamine and the opioid systems might be particularly relevant in model-free reinforcement learning processes that drive the formation of habitual behaviour. Studies in rodents show that activating receptors of both systems across the striatum increases cue-triggered wanting of rewards (Peciña & Berridge, 2013; Soares-Cunha et al., 2016). Conversely, inhibition of both D1-type and D2-type dopamine receptors (referred to as D1 and D2 from here on) as well as opioid receptors reduces motivation to obtain or consume rewards (Laurent, Leung, Maidment, & Balleine, 2012; Peciña, 2008; Soares-Cunha et al., 2016). These data raise the hypothesis that the drift towards habitual control is enabled by dopamine and opioid receptors via a common neural pathway. Recent work in humans provides some evidence in this direction, whereby systemic administration of opioid and D2 dopamine receptor antagonists causes a comparable reduction of cue responsivity and reward impulsivity (Weber et al., 2016) and decreases the effort to obtain immediate primary rewards (Korb et al., 2020). This suggests that when allocating control between the model-based and model-free systems, dopamine or opioid receptor antagonists might comparably disrupt model-free behavioural strategies and increase model-based behaviour. Yet no study in humans has directly investigated this.

      Furthermore, disrupting habit formation might not in itself lead to increased model-based control, without either increasing the perceived value of applying cognitive control or making it easier to do so.”

      We also mention the implications of this direct comparison of the two compounds in the Discussion (p.8 §1):

      “Our findings provide initial evidence for a divergent involvement of the dopamine and opioid neurotransmitter systems in the shift between habitual and goal-directed behaviour. The lack of effects of naltrexone on the model-based/model-free trade-off also provides some support for the notion that simply disrupting neurobiological systems that subserve habitual behaviour might not be enough to increase goal-directed behaviour in this task. An increase in the model-based/model-free weight following amisulpride administration suggests that dopamine plays a decisive role in flexibly applying cognitive control to facilitate model-based behaviour and highlights the specific functional contribution of the D2 receptor subtype.”

      Reviewer #3 (Public Review):

      I think this is an interesting study on an important topic. I agree that there is not enough research to understand how the dopaminergic system interfaces with goal-directed planning, and I like the focus on specific types of dopamine receptors. It is interesting that they seem to find a specific effect on just the dopamine antagonist. I also appreciate the clarity with which the authors describe this field of research and their results. However, I also feel that there are several concerns with this paper, both in terms of framing and in terms of the experimental design and analysis. For completeness, I must note that I am not a dopamine expert.

      I felt that the introduction of the paper did not sufficiently motivate the focus on the comparison between neurotransmitter systems and, for the dopaminergic system, the distinction between D1 and D2 receptors. Why is the mapping between stability/flexibility and D1/D2 receptors important? How does this relate to model-based control? Why do the authors predict that model-based control would increase when D2 receptors are blocked? If the hypothesis is about contrasting the contribution of D1 and D2 receptors to goal-directed control, why did the authors not use antagonists directly targeting these two systems?

      In addition, the predictions that are more explicit, for example, that blocking D2 receptors increases MB control by stabilizing goal-relevant information, are fairly specific. However, the current version of the two-step task is not amenable to testing such a specific hypothesis, because it doesn't allow us to measure the specific components of planning (e.g., maintaining goals, the representation of the structure, prospective reasoning). Moreover, MB control in this version of the two-step task is marked by flexibility, because it requires the agent to be sensitive to switching starting states.

      The predictions for the opioid system are also lacking. Why are the authors targeting this system? Why are they comparing the effects of the D2 antagonist with the opioid antagonist? Why do the authors predict that amisulpride should have a stronger effect than naltrexone? In my opinion, these predictions were not sufficiently laid out, which made it difficult to appreciate the authors' motivation to run the study.

      We thank the reviewer for their critical reading of the manuscript and for clearly pointing out the weaknesses in our argumentation. In particular, we appreciate the reviewer’s comment on the lack of clarity in describing why comparing the effects of dopamine and opioid antagonists on MB/MF behaviour might be particularly interesting and why we focused on D2 rather than D1 receptors. We have now extended the Introduction to clarify our rationale for comparing these two compounds (p.2-3). In short, apart from the fact that both systems are implicated in addiction, there is abundant experimental evidence from human and non-human animal studies that the two systems are involved in processes related to forming habitual responses to primary and secondary rewards. This suggests that blocking receptors of either system might comparably affect the MB/MF trade-off by impairing model-free processes. We therefore compared opioid and dopamine antagonists.

      As we note, using D1 antagonists would likely be detrimental to processes related to cognitive control and would therefore be more likely to decrease model-based performance. We therefore chose to compare opioid antagonists to D2 receptor antagonists. Another important reason for comparing the effects of opioid and D2 dopamine antagonists is that it is not clear whether blocking model-free processes is in itself enough to promote model-based behaviour without also boosting processes related to cognitive control. Given the recent evidence that D2 antagonists increase cognitive effort (Westbrook et al., 2020) and the proposed role of prefrontal D2 receptors in destabilising prefrontal representations (according to the dual-state theory of prefrontal dopamine function proposed by Durstewitz & Seamans, 2008), we reasoned that D2 receptor blockade might also boost the ability (or willingness) to keep the mapping between spaceships and planets online while making choices.

      We incorporated these arguments in the revised Introduction (p.2-3):

      “Opiates, psychostimulants, and most other drugs of abuse increase the release of dopamine along the mesolimbic pathway (Chiara, 1999; Koob & Bloom, 1988), a circuit that plays a central role in reinforcement learning (Schultz et al., 1997). On top of this, the reinforcing properties of addictive drugs also depend on their ability to activate the μ opioid receptors (Becker et al., 2002; Benjamin et al., 1993; Le Merrer et al., 2009). This suggests that both the dopamine and the opioid systems might be particularly relevant in model-free reinforcement learning processes that drive the formation of habitual behaviour. Studies in rodents show that activating receptors of both systems across the striatum increases cue-triggered wanting of rewards (Peciña & Berridge, 2013; Soares-Cunha et al., 2016). Conversely, inhibition of both D1-type and D2-type dopamine receptors (referred to as D1 and D2 from here on) as well as opioid receptors reduces motivation to obtain or consume rewards (Laurent et al., 2012; Peciña, 2008; Soares-Cunha et al., 2016). These data raise the hypothesis that the drift towards habitual control is enabled by dopamine and opioid receptors via a common neural pathway. Recent work in humans provides some evidence in this direction: systemic administration of opioid and D2 dopamine receptor antagonists causes a comparable reduction of cue responsivity and reward impulsivity (Weber et al., 2016) and decreases the effort to obtain immediate primary rewards (Korb et al., 2020). This suggests that, when control is allocated between the model-based and model-free systems, dopamine or opioid receptor antagonists might comparably disrupt model-free behavioural strategies and increase model-based behaviour. Yet, no study in humans has directly investigated this.
Furthermore, disrupting habit formation might not in itself lead to increased model-based control, without either increasing the perceived value of applying cognitive control or making it easier to do so. Crucially, the two neurochemical systems differ in how they relate to cognitive control, which is pivotal for model-based behaviour. Across a wide range of studies using various dosing schemes, opioid receptor antagonists did not have an effect on tasks that require cognitive control, such as working memory (Del Campo, McMurray, Besser, & Grossman, 1992; File & Silverstone, 1981; Volavka, Dornbush, Mallya, & Cho, 1979), sustained attention (Zacny, Coalson, Lichtor, Yajnik, & Thapar, 1994), or mathematical problem-solving (Del Campo et al., 1992) (see van Steenbergen, Eikemo, & Leknes, 2019, for a review). Dopaminergic circuits, on the other hand, play a central role in higher cognitive functions and goal-directed behaviour (Brozoski, Brown, Rosvold, & Goldman, 1979). In particular, D1 dopamine receptors in the prefrontal cortex enable maintenance of goal-relevant information and working memory (Goldman-Rakic, 1997; Sawaguchi & Goldman-Rakic, 1991; van Schouwenburg, Aarts, & Cools, 2010; Williams & Goldman-Rakic, 1995), while D2 dopamine receptor activity disrupts prefrontal representations (Durstewitz & Seamans, 2008). In support of this, decreased working memory performance was observed after blocking prefrontal D1, but not prefrontal D2, receptors (Arnsten, 2011; Sawaguchi & Goldman-Rakic, 1991; Seamans & Yang, 2004). In humans, systemic administration of D2 antagonists increased the ability to maintain and manipulate working memory representations (Dodds et al., 2009; Frank & O’Reilly, 2006) and increased the value of applying cognitive effort (Westbrook et al., 2020).
These data suggest that blocking D2 receptors, in contrast to blocking opioid receptors, could further facilitate model-based behaviour by enabling or encouraging flexible use of cognitive control.”

      Another important point that the reviewer stresses is that the two-step task we use does not allow us to draw conclusions about the mechanisms through which amisulpride increases model-based behaviour. We based our hypothesis that D2 blockade might promote model-based behaviour (on top of disrupting habit formation) on previous work showing that D2 blockade increases cognitive effort and the ability to manipulate working memory representations. Nevertheless, we completely agree that our setup does not give definite answers about which of these cognitive processes mediated the increase in model-based weights. In the Discussion, we interpret our findings within the dual-state hypothesis framework and within the framework of striatal control of adaptive behaviour (p.8 §3-4), centring our argument on the dopaminergic circuits that subserve each of these mechanisms.

      We agree with the reviewer that the task requires a high degree of flexible planning and that the dual-state theory might not be enough to account for our effects. We mention this in the Discussion (p. 8 §3):

      “The effects of D2 antagonism on model-based/model-free behaviour in our study can be interpreted within this [dual-state] framework to result from increased ability to maintain prefrontal representation of the mapping between the spaceships and the planets online. However, this is difficult to reconcile with the fact that model-based behaviour in dynamic learning paradigms, such as the one used here, also requires flexible updating of action values.”

      We also elaborate on the general limitations of drawing inference about the underlying cognitive/computational mechanisms in the Discussion (p. 14 §2):

      “Importantly, it should also be acknowledged that the behavioural setup in our study does not allow us to draw definite conclusions about the mechanisms that mediate amisulpride’s effects on model-based or model-free behaviour. For example, it is not clear whether amisulpride increases the perceived benefit of applying cognitive control, or whether it increases the participant’s ability to do so through various possible complementary processes, such as goal maintenance or planning abilities. Future studies should further elucidate the mechanistic contributions of dopamine receptors to the distinct coding and utilisation of task relevant representations (Langdon, Sharpe, Schoenbaum, & Niv, 2018; Stalnaker et al., 2019).”

      Related to this, I felt that the introduction was a bit too quiet on the genetic markers. Their discussion in the results was a bit surprising, and it wasn't quite clear why the authors decided to investigate these interaction effects.

      We appreciate this comment, as we were quite uncertain ourselves about how much weight to give to those data. Previous research had indeed shown profound variability in MB/MF behaviour across genotypes related to baseline dopamine function. The main purpose of the genetic analysis was to control for potential baseline differences and to explore drug × genotype interactions. However, including the serum data as a covariate in the analyses, as suggested by the other reviewers, made most of the genetic effects disappear, even when using less conservative priors that likely understate the variance of the posterior distributions of group effects. We have therefore opted to keep coverage of the genetic data to a minimum, but we still report the results and make the data available online for future studies.

      I found some of the core results confusing. Most importantly, why does amisulpride make people less likely to stay after a reward when the first-stage state is the same? When first-stage states repeat, both an MB agent and an MF agent will be more likely to stay after a reward. To me, this kind of behavior doesn't seem particularly model-based. Why does this behavior occur under amisulpride? I was surprised that the authors did not really address it.

      We agree that these results have been somewhat difficult to reconcile. However, adding amisulpride serum levels to our analyses now allows us to understand them better. Model-based behaviour appears to be increased in both serum groups, but only the high serum group additionally showed increased exploration. We also note that increased exploration was related to a reduced effect of previous points in trials with the same first-stage state, whereas the interaction term (effect of previous points in different- vs. same-state trials) was more strongly associated with the model-based weight. This is described in the Results and Discussion sections of the manuscript.

      The following text is included in the Results (p.6):

      “We first observed that the more model-based choices the participants made, the more money they earned (r = 0.65, 95% CI [0.53, 0.76]). This serves as a validity check of the task, which was designed to make cognitive control pay off (literally) [45]. We then looked at how the model parameters relate to the random slopes from the behavioural analysis of staying behaviour and found that the participant-level (random effect) slope for the effect of previous points on staying behaviour in different vs. same first state trials was most strongly related to ω (d = 0.493, P < 10e-3) and negatively related to the inverse temperature parameter η (d = -0.328, P < 10e-3), and the slope for trials with same first states was mostly related to η (d = 0.822, P < 10e-3), and less so to ω (d = 0.235, P < 10e-3).”

      The following text is included in the Discussion (p.8 §2):

      “Interestingly, amisulpride also increased choice stochasticity, parametrised by the softmax inverse temperature parameter. In a paradigm with two choice options, it cannot be definitively determined whether this indicates higher decision noise or increased exploration of alternative choices. We can, however, speculate that increased decision noise would lead to overall detrimental effects on learning in trials with both same and different consecutive first-stage states, which we do not observe in our data. The effect on the choice stochasticity parameter was only present in participants with a higher effective dose [75], suggesting that the effect was more likely to be post-synaptic. Similarly, in the same high effective-dose group, we found some evidence that amisulpride reduces response stickiness, indicating increased switching between actions. This is well in line with a prominent model of the cortico-striatal circuitry implicating post-synaptic D2 receptors in exploration/exploitation [65] and is supported by empirical data. In animal studies, activation of D2 receptors was shown to lead to choice perseverance and more deterministic behaviour, whereas D2 receptor inhibition increases the probability of performing competing actions and increases randomness in action selection [76]. In humans, a recent neurochemical imaging study showed that D2 receptor availability in the striatum correlated with choice uncertainty parameters across both reinforcement learning and active inference computational modelling frameworks [77]. Increased choice uncertainty was also observed in social and non-social learning tasks in a study using 800 mg of sulpiride, a dose that is known to exert post-synaptic effects [54, 78]. We note, however, that the evidence for the difference in exploration between the low and high serum groups was not robust (p=0.066).
Furthermore, it has been suggested that increased striatal dopamine is also related to a tendency for stochastic, undirected exploration [79, 80], arising from overall uncertainty across available options [79] or from an increased opportunity cost of choosing the wrong option [68, 71]. This suggests that the same biological signature that leads to increased cognitive effort expenditure also promotes choice exploration. In line with this, both prior studies that investigated the effect of increasing dopamine availability with L-DOPA on model-based/model-free behaviour observed increased choice exploration as well as increased model-based behaviour (although in one it was only present in individuals with a higher working memory capacity) [55, 58].”
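      For readers less familiar with the parameters discussed here, the sketch below shows how a model-based/model-free weight (ω), a softmax inverse temperature and a response-stickiness term are commonly combined into first-stage choice probabilities in hybrid two-step models. It assumes the widely used mixture Q_net = ω·Q_MB + (1 − ω)·Q_MF; it is not the exact model fitted in this study, and the function and parameter names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def hybrid_choice_probs(
    q_mb: Sequence[float],
    q_mf: Sequence[float],
    omega: float,
    inv_temp: float,
    stickiness: float = 0.0,
    prev_choice: Optional[int] = None,
) -> List[float]:
    """Softmax probabilities over first-stage actions (hypothetical helper)."""
    if len(q_mb) != len(q_mf):
        raise ValueError("q_mb and q_mf must have the same length")
    # Weighted mixture: omega = 1 is purely model-based, omega = 0 purely model-free.
    q_net = [omega * mb + (1.0 - omega) * mf for mb, mf in zip(q_mb, q_mf)]
    # The inverse temperature scales value differences; stickiness adds a
    # bonus for repeating the previous first-stage action.
    logits = [
        inv_temp * q + (stickiness if a == prev_choice else 0.0)
        for a, q in enumerate(q_net)
    ]
    # Subtract the maximum before exponentiating, for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# With an inverse temperature of 0, values are ignored and choices are uniform.
print(hybrid_choice_probs([1.0, 0.0], [0.0, 0.0], omega=1.0, inv_temp=0.0))
# [0.5, 0.5]
```

      A lower inverse temperature pulls the probabilities towards uniform, which is what "increased choice stochasticity" means in this parametrisation. A lower stickiness value reduces the bonus for repeating the previous action, so switching becomes relatively more likely.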

      With regards to the design, it is unfortunate that the order of drug administration is not counterbalanced. As far as I understand, model-based control is always measured without a drug in the first session, and then with the drug (or placebo) in the second. The change between sessions is then tested for all three conditions. Of course, it is possible that the increase in model-based control in the amisulpride condition is only driven by the drug. However, given the lack of counterbalancing, it's also possible that amisulpride increases model-based control only after the experience with the task. That is, if the authors had counterbalanced the drug effect, they may have found that amisulpride had a different effect if it was administered in the first session. That would have changed their interpretation quite a bit! As it stands, they are unable to verify their (admittedly simpler) hypothesis that there is only a main effect.

      We thank the reviewer for this comment. Indeed, a full within-subject design would have been statistically more powerful and would have enabled us to exclude the possibility that amisulpride’s effect on model-based behaviour is indirect. We have now included the following paragraph in the discussion that aims to highlight the limitation of not counterbalancing the drug administration (p.10):

      “One of the strengths of our design is the baseline measure, together with the fact that all participants were introduced to the task without any drug administration, which avoids potential effects of the treatment on task training. Although this design allowed us to reduce between-subject variability, we cannot completely exclude order effects. It is possible, though unlikely, that the treatment effects we observe arise indirectly, through effects of the two drugs either on skill transfer from the previous session or on the parts of the experiment that preceded the task. For instance, participants under amisulpride could be less tired from other tasks and therefore more willing to exert effort in the task presented here. Speaking against this is the observation that we found no differences in mood between amisulpride and placebo, regardless of low or high serum levels.”

    1. Author Response

      Reviewer #1 (Public Review):

      This study presents a series of experiments that investigate maternal control over egg size in honey bees (Apis mellifera). Honey bees are social insects in which a single reproductive female (the queen) lays all the eggs in the colony. The first set of experiments presented here explore how queens change their egg size in response to changes in colony size. Specifically, they show that queens have relatively larger eggs in smaller colonies, and that egg size changes when queens are transplanted into colonies of a different size (i.e. confirming that egg size is a plastic trait in honey bee queens). The second set of experiments investigates candidate genes involved in egg size determination. Specifically, it shows that Rho1 plays a role in determining egg size in honey bee queens.

      In principle, we agree with this summary, although we find the experimental demonstration that perceived colony size affects egg size (first set of experiments) and the overall proteomic comparison of ovaries that produce small and large eggs (second set of experiments that indicate the upregulation of metabolism, protein transport, cytoskeleton organization, and a few other processes in large egg-producing ovaries) also important.

      A strength of the study is that it combines both manipulative field (apiary) experiments and molecular studies, and therefore attempts to consider broadly the mechanisms of plasticity in egg size. The link between these two types of dataset in the manuscript, however, is not strong. While the two parts are related, the molecular experiments do not follow from the conclusions of the field experiments but rather run in parallel (both using the same initial treatments of queens from large v small colonies).

      We would welcome suggestions on how to further strengthen the integration between the field experiments and our molecular studies. We sought to explore the molecular basis of the observed plasticity in reproductive behavior and thus focused on samples from the first set of experiments for our proteome comparisons, realizing that every additional field experiment could have entailed a similar molecular follow-up. We attempted to bring the molecular studies and field experiments back together with the RNAi-mediated knock-down of Rho1 in queens that produce eggs in differently sized colonies under realistic apicultural conditions. There may be better, additional opportunities for a closer integration of molecular and field experiments, but we could not conceive of them.

      Another strength of the study is the focus on social cues for egg size control in a social insect. Particularly interesting is data showing that queens suddenly exposed to the cues of a larger colony (even where egg-laying opportunities did not actually increase) will decrease their egg size, in the same way as queens genuinely transplanted to larger colonies. That honey bee queens can control their egg sizes in response to cues in the colony is not unexpected, given that queens are known to vary egg size based on the cell type they are laying into (queen, drone or worker cell). Nevertheless, it is interesting to show that worker egg sizes over time are also mediated by social cues.

      We thank the reviewer for this positive assessment and want to highlight that this experiment not only controls for egg laying opportunities, but also for potentially greater resource availability in larger colonies. These results are therefore important for the key argument that egg size is actively regulated by honey bee queens.

      A weakness of the study is that the consequence of egg size on egg development and survival in honey bees is not made clear. The assumption is that larger egg size compensates for smaller colonies in some way. Do smaller eggs (i.e. those laid in large colonies) fare worse in smaller colonies than they do in large colonies? Showing that the variation in egg size is biologically relevant to fitness is an important piece of the puzzle.

      We agree that the consequences of egg size variation are important to address beyond our previously published data set and the benefits demonstrated in other contexts by other authors. However, to comprehensively resolve the consequences requires considerable additional experiments that exceed the scope of our current study, which is primarily focused on the causes of the queens’ reproductive plasticity.

      Also, the relationship between egg number and egg size in honey bees remains rather murky. Does egg size depend at least in part on daily egg laying rate (which is sure to be greater in larger colonies)? The study makes an effort to explore this by preventing queens from laying for two weeks and then comparing their egg size when they resume to those that did not have a pause in laying. Although egg size did not vary between the groups in this case, it is unclear whether the same effect would be seen if queens had simply been restricted from laying at such high rates (e.g. if available empty brood cells had been reduced rather than removed entirely).

      We agree that the relation between egg number and egg size is complicated. We have added more data showing that egg-laying rates can be higher in larger colonies than in smaller colonies. We now also report that egg size is negatively correlated with egg number, although not in all instances, which partially supports (and partially contradicts) our previous findings (Amiri et al. 2020). We have modified the discussion of our results to account for the additional results and to point out the limitations of the experiment with caged queens. It is important to realize, though, that the queens were caged on comb and not restricted in the typical small queen cages that are used for queen transport. It is not clear whether this treatment resulted in a downregulation of reproductive effort and/or the resorption of eggs.

      Overall this study makes new contributions to our understanding of maternal control over egg size in honey bees. It provides stepping stones for further investigation of the molecular basis for egg size plasticity in insects.

      We agree that we could not resolve everything in this study and that more investigations are needed.

      Reviewer #2 (Public Review):

      This paper builds on recent work showing that honeybee queens can change the size of the eggs they lay over the course of their life. Here the authors identified an environmental condition that reversibly causes queens to change their egg sizes: namely, being in a relatively small or large colony context. Recently published work demonstrated the existence of this egg size plasticity, but it was completely unknown what signaled to the queen. In a series of simple and elegant experiments they confirmed the existence of this egg size plasticity, and narrowed down the set of environmental inputs to the queen that could be responsible for signaling the change in the environment. They also began the work of identifying genes and proteins that might be involved in controlling egg size. They did a comparative proteomic analysis between small-egg-laying ovaries and large-egg-laying ovaries, and then selected one candidate gene (Rho1). They showed that it is expressed during oogenesis, and that when it is knocked down, eggs get smaller.

      This is a good summary, although we think it is fair to add that the expression of Rho1 is specific to the egg growth stage, and that we found an almost perfect correlation between Rho1 mRNA levels and egg size in two separate experiments (in addition to the difference between large and small egg-producing ovaries at the protein level).

      The experiments on honeybee colonies are well-designed, and they provide fairly strong evidence that the queens are reversibly changing egg size and that it is (at least some component of) their perception of colony size that is the signal. One minor but unavoidable weakness is that experiments on honeybees are necessarily done with small sample sizes. The authors were clear about this, however, and it was very effective that they showed all individual data points. Alongside the previous work on which this paper builds, I found their core results to be rather convincing and important.

      We thank the reviewer for this positive evaluation.

      I found the parts of the paper on oogenesis to be useful, but overall less informative in answering the questions that the authors set out for those sections. On balance, I think the best way to interpret the oogenesis results is as "suggestive and exploratory". For instance, the experiment aimed at understanding the relationship between egg-laying rate and egg size does not include a direct measurement of egg-laying rate, but instead puts queens in a place with no suitable oviposition sites. The proteomic analysis was fine, but since they were using whole ovaries, with tissue pooled across all stages of oogenesis including mature oocytes, I would be cautious in interpreting the results to mean that they had identified proteins involved in making larger eggs. These proteins might just as easily be the proteins that are put into larger eggs. In fact, for the one candidate gene that is examined, its transcripts seem as though they are predominantly in the oocyte cell itself rather than in the supporting cells that actually control the egg size (although it is hard to tell from the micrographs without a label for cell interfaces).

      We have added data on the number of eggs produced in the first experiment, which actually show a negative correlation between egg size and egg number. In addition, we have cautioned our wording about the conclusions that can be drawn from the oviposition restriction experiment. Concerning the expression and role of Rho1, we apologize for the lack of a cell membrane marker. However, we share the reviewer’s interpretation that the mRNA is located in the oocyte. While we also agree that egg loading from the nurse cells is important, transport of vitellogenin from the follicle cells may also be quite significant for egg size (Wu et al. 2021 – doi:10.3389/fcell.2020.593613 and Fleig 1995 - doi:10.1016/0020-7322(95)98841-Z), a process that could be controlled by Rho1 in the documented location. We have added to the discussion to clarify this point.

      On that note, with the caveat that the sample sizes are quite small, I agree that there is some evidence that Rho1 is involved in honeybee oogenesis. If this was the only gene they knocked down, and given that it results in a small size change with such a small sample size, it strikes me as a bit of a stretch to say that these results are evidence that Rho1 plays an important role in egg size determination. It is essential to know if this is a generic result of inhibiting cytoskeletal function or a specific function of Rho1. That is beyond the scope of this study, but until those experiments are done, it is hard to know how to interpret these results. For context, in Drosophila, there are lots and lots of genes such that if you knock them down, you get a smaller or differently shaped egg, including genes involved in planar polarity, cytoskeleton, basement membrane, protrusion/motility, septate junctions, intercellular signaling and their signal transduction components, muscle functions, insect hormones, vitellogenesis, etc. This is helpful, perhaps, for thinking about how to interpret the knockdown of just one gene.

      We thank the reviewer for this perspective and have consequently cautioned our wording. The role of Rho1 in regulating the cytoskeletal function has been established in other organisms, but we do not have the tools to study the corresponding pathways and establish causality in honey bees. We have added to the discussion to alert the reader to the point that additional experiments are necessary.

      Overall, I found the results to be technically sound, and there are several clever manipulations on honeybee colonies that will doubtless be repeated and elaborated in the future to great effect. The core result, that queens can change the size of their eggs quickly and reversibly in response to some perceived signal, was honestly pretty astonishing to me, and it reveals that there are non-nutritive plastic mechanisms in insect oogenesis that we had no idea existed. I look forward to follow-up studies with interest.

      We thank the reviewer for the overall evaluation and encouragement to continue our research.

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, the authors performed single-cell RNA sequencing (scRNA-seq) analysis on bone marrow CD34+ cells from young and old healthy donors to understand the age-dependent cellular and molecular alterations during human hematopoiesis. Using a logistic regression classifier trained on young healthy donors, they identified cell-type composition changes in old donors, including an expansion of hematopoietic stem cells (HSCs) and a reduction of committed lymphoid and myeloid lineages. They also identified cell-type-specific molecular alterations between young and old donors and age-associated changes in differentiation trajectories and gene regulatory networks (GRNs). Furthermore, by comparing the single-cell atlas of normal hematopoiesis with that of myelodysplastic syndrome (MDS), they characterized cellular and molecular perturbations affecting normal hematopoiesis in MDS.

      The present manuscript provides a valuable single-cell transcriptomic resource to understand normal hematopoiesis in humans and the age-dependent cellular and molecular alterations. However, their main claims are not well supported by the data presented. All results were based on computational predictions, not experimentally validated.

      Major points:

      1) The authors constructed a regularized logistic regression trained on young donors with manually annotated cell types and predicted cell type labels of cells from old and MDS samples. As the manual annotation of cell types was implicitly assumed as ground truth in this manuscript, I'm wondering whether the predicted cell types in old and MDS samples are consistent with the manual annotation. They should apply the same strategy used in young samples for manual annotation to old and MDS samples, and evaluate how accurate their classifier is.

      We performed manual annotation for each MDS sample independently, and for the integrated dataset of the 3 healthy elderly donors. To do so, we performed unsupervised clustering with Seurat and annotated the clusters using the same set of canonical marker genes that we used for the young data. We then analyzed the correspondence between the annotated clusters and the GLMnet predictions. Results are shown in Figure 1a. The biggest disagreements between methods occur between adjacent identities, such as HSC and LMPP, GMP and GMP with a more prominent granulocyte profile, or MEP, early and late erythroid. When we explore these disagreements along the erythroid branch, we see that they occur particularly close to the borders between subpopulations (Figure 1b). This is consistent with the continuous nature of differentiation and the difficulty of establishing boundaries between cell compartments. However, mislabeling between different hematopoietic lineages is rare.

      In addition, unsupervised clustering was not always able to directly separate the data into the expected subpopulations. We can see different clusters containing the same cell types (e.g. LMPP1, LMPP2), as well as individual clusters containing cells with different identities (e.g. pDC and monocyte progenitors). This is usually due to sources of variability in the data other than cell identity. Additional supervised fine-tuning by local subclustering and merging would be needed to correct for this. By contrast, we believe that our GLMnet-based method focuses on gene expression related to identity, resulting in a classification that is better suited to our purpose.

      Figure 1. Comparison between GLMnet predictions and manually annotated clusters. A) Heatmaps showing percentages of cells in manually annotated clusters (columns) that have been assigned to each of the cell identities predicted by our GLMnet classification method (rows). The analysis was performed independently for the elderly integrated dataset and for every MDS sample. B) UMAP plots showing disagreements in classification between adjacent cell compartments in the erythroid branch. Cells from one erythroid cluster per patient are colored by the identity assigned by the GLMnet classifier. Cells in gray are not in the highlighted cluster, nor labeled as MEP, erythroid early or erythroid late by our classifier.
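      As an illustration of the column-normalised comparison summarised in Figure 1A, the following minimal Python sketch computes, for each manually annotated cluster, the percentage of its cells that the classifier assigned to each identity, so that each cluster's percentages sum to 100. It is a hypothetical helper (`cluster_vs_prediction_percentages` is not from the authors' code) and assumes the manual cluster labels and the GLMnet predictions are available as two equal-length per-cell lists.

```python
from collections import Counter, defaultdict


def cluster_vs_prediction_percentages(clusters, predictions):
    """For each manual cluster, return {predicted identity: % of that cluster's cells}.

    clusters[i] and predictions[i] refer to the same cell (hypothetical input format).
    """
    if len(clusters) != len(predictions):
        raise ValueError("clusters and predictions must have the same length")
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, predictions):
        counts[cluster][label] += 1
    table = {}
    for cluster, label_counts in counts.items():
        total = sum(label_counts.values())
        # Normalise within the cluster (a heatmap column), not across the whole dataset.
        table[cluster] = {label: 100.0 * n / total for label, n in label_counts.items()}
    return table


if __name__ == "__main__":
    clusters = ["LMPP1", "LMPP1", "LMPP1", "LMPP2"]
    predictions = ["LMPP", "LMPP", "HSC", "LMPP"]
    print(cluster_vs_prediction_percentages(clusters, predictions))
```

      Normalising within each cluster makes disagreements between adjacent identities (for example, a cluster split between HSC and LMPP) visible regardless of cluster size.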

      2) The cell-type composition changes in Figures 1 and 4 were descriptively presented without providing the statistical significance of the changes. In addition, the age-dependent cell-type composition changes should be validated by flow cytometry.

      We thank the reviewer for the comment. The significance of the changes is included in Supplementary File 3. In addition, we included the percentages of several cell types that we validated by flow cytometry, namely HSCs, GMPs and MEPs, in young and elderly healthy individuals in the manuscript, as Figure 1-figure supplement 3. Consistent with our bioinformatic analyses, flow cytometry data demonstrated a significant increase in the percentage of HSCs, as well as an increasing trend in MEPs and a slight decrease in the percentage of GMPs in elderly individuals, corroborating our previous results.

      3) In Figure 2, the authors used two different pseudo-time inference methods, STREAM and Palantir. It is not clear why they used two different methods for trajectory inference. Do they provide the same differentiation trajectories? How robust are the results of trajectory inference algorithms? It seems to be inconsistent that the pseudotime inferred by STREAM was not used for downstream analysis and the new pseudotime was recalculated by using Palantir.

      We thank the reviewer for the comment. We used two different methods to perform similar analyses because each provides specific outputs that, together, support a more robust and comprehensive analysis. STREAM allows us to unravel the differentiation trajectories in a single-cell dataset with an unsupervised approach. In addition, the visualization provided by STREAM (Figure 2C and 2D) makes the results easy for the reader to interpret. Palantir, on the other hand, provides a more robust analysis for dissecting how gene expression dynamics interact and change along differentiation trajectories. For this reason, we used this second method to investigate how specific genes were altered in the monocytic compartment.

      As this is a resource article, showcasing different methods can be valuable because it provides examples of how each tool can be used to obtain specific results, which can help readers decide which tool best suits their specific case.

      To confirm that the pseudotime results are similar, we performed a correlation analysis of the pseudotime values obtained with each method. We observed a correlation coefficient of 0.78 (p.val < 2.2e-16), confirming the similarity between the two tools.

      Figure 2. Correlation analysis of pseudotime values obtained with STREAM and Palantir.
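      The pseudotime comparison can be sketched as follows, assuming a Pearson coefficient (the response does not state which correlation was used) and that the STREAM and Palantir pseudotimes are listed for the same cells in the same order. The helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences, e.g. per-cell pseudotimes."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    stream_pt = [0.1, 0.5, 0.2, 0.9]     # toy values, not study data
    palantir_pt = [0.2, 0.4, 0.1, 0.8]
    print(round(pearson_r(stream_pt, palantir_pt), 3))
```

      Because Pearson's r is unchanged by a positive linear rescaling, the two tools' different pseudotime scales do not matter; a rank-based Spearman coefficient would be the natural alternative if the relationship were monotone but non-linear.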

      4) In Figure 2D, some HSCs seem to be committed to the erythroid lineage. The authors should carefully examine whether these HSCs are genuinely HSCs, not early erythroid progenitors.

      We thank the reviewer for the comment. We have performed a detailed analysis of the classification of HSCs (see Figure 3). Our analyses reveal that none of the cells classified as HSCs express early erythroid progenitor markers. We have also used STREAM to show the expression of these markers along the obtained trajectory, and observed that erythroid markers are expressed in the erythroid trajectory but not in the HSC compartment (Figure 4).

      Figure 3. Expression of marker genes in the HSC compartment. Dot plot depicting the normalized scaled expression of canonical marker genes in HSCs from the 5 young and 3 elderly healthy donors. Marker genes are colored by the cell population they characterize. Dot color represents expression levels, and dot size represents the percentage of cells that express a gene.

      Figure 4. Expression of erythroid markers in STREAM trajectories. Expression of GATA1 and HBB (erythroid markers) in the predicted differentiation trajectories.

      5) It is not clear how the authors draw a conclusion from Figure 3D that the number of common targets between transcription factors is reduced. Some quantifications should be provided.

      We thank the reviewer for the comment. We have updated the manuscript to better reflect our findings and to emphasize that the predicted regulatory network of HSCs in elderly donors appears as an independent network compared with that of young donors (Page 6, line 36).

      “Overall, we observed that the predicted regulatory network of elderly HSCs (Figure 3d) appeared as an independent network compared to the young GRN. This finding could result in the loss of co-regulatory mechanisms in the elderly donors.”

      6) The constructed GRNs and related descriptions were based solely on the SCENIC analysis. By providing the results of an orthogonal prediction method for GRNs, the authors should evaluate how robust and consistent their predictions are.

      We thank the reviewer for the comment regarding the method used to build gene regulatory networks. As a resource article, our manuscript describes a complete workflow for performing different aspects of single-cell analysis, ranging from automated classification to trajectory inference and GRN prediction. All the selected algorithms have already been benchmarked against other tools that perform similar analyses; SCENIC, in particular, has been benchmarked against other algorithms (11) and by others (12).

      We agree with the reviewer that such orthogonal predictions could strengthen our findings; however, we believe they would fit better if our article were intended for the Research Article category rather than Tools and Resources.

      7) The observed age-dependent cellular and molecular alterations in human hematopoiesis are interesting, but I'm wondering whether the observed alterations are driven by inflammatory microenvironment or intrinsic properties of a subpopulation of HSCs affected by clonal hematopoiesis (CH). To address this, the authors can perform genotyping of transcriptomes (GoT) on old healthy donors with CH. By comparing the transcriptomes of cells with and without CH mutations, we can evaluate the effects of CH on age-associated molecular alterations.

      We thank the reviewer for the comment. Unfortunately, performing GoT (genotyping of transcriptomes) on the healthy donors requires modifying the standard 10x Genomics workflow to amplify the targeted locus and transcript of interest. This would require collecting new samples, optimizing the method and performing a new analysis from scratch (from sequencing through analysis). We believe this is beyond the scope of the manuscript. Moreover, we do not have enough material to create new single-cell libraries; doing so would require adding new donors and, as a result, a completely new analysis to perform the integration.

      Reviewer #3 (Public Review):

      The authors have performed a transcriptional analysis of young/aged hematopoietic stem/progenitor cells which were obtained from normal individuals and those with MDS.

      The authors generated an important and valuable dataset that will be of considerable benefit to the field. However, the data appear to be over-interpreted at times (for example, GSEA analysis does not have "functionality", as the authors claim). On the other hand, a comparison between normal-aged HSC and HSC from MDS patients appears to be under-explored in trying to understand how this disease (which is more common in the elderly) disrupts HSC function.

      A more extensive cross-referencing of other normal HSPC/MDS HSCP datasets from aged humans would have been helpful to highlight the usefulness of the analytical tools that the authors have generated.

      Major points

      1) The authors detail methodology for identification of cell types from single-cell data - GLMnet. This portion of the text needs to be clarified as it is not immediately clear what it is or how it's being used. It also needs to be explained by what metric the classifier "performed better among progenitor cell types" and why this apparent advantage was sufficient to use it for the subsequent analysis. This is critical since interpretation of the data that follows depends on the validation of GLMnet as a reliable tool.

      We thank the reviewer for the comment. We have updated the corresponding section to better describe how GLMnet is used and why we chose it as our cell type annotation method instead of other available tools such as Seurat; this choice is based on the results of the benchmark described in Figure 1-figure supplement 1. We also describe the main differences between our method and Seurat (see answer to Reviewer 1, Question #4).

      2) The finding of an increased number of erythroid progenitors and decreased number of myeloid cells in aged HSPCs is surprising since aging is known to be associated with anemia and myeloid bias. Given that the initial validation of GLMnet is insufficiently described, this result raises concerns about the method. Along the same lines, the authors report that their tool detects a reduced frequency of monocyte progenitors. How does this finding correlate with the published data on aging humans? Is monocytopenia a feature of normal aging?

      We thank the reviewer for this comment, as changes in the output of HSCs as a consequence of aging are of high interest. According to the literature, there is clear evidence of the loss of lymphoid progeny with age (13,14), which agrees with our results. However, in the case of the myeloid compartment, the effects of aging are not as clear. Studies in mice have indeed observed that the loss of lymphoid cells is accompanied by increased myeloid output, starting at the level of GMPs (Rossi et al. 2005; Florian et al. 2012; Min et al. 2006). But studies in human individuals have not found changes in the numbers of these myeloid progenitors (Kuranda et al. 2011; Pang et al. 2011). In addition, in the mentioned studies, myeloid production was measured exclusively by its white blood cell fraction. More recent studies have focused on the other myeloid compartments: megakaryocyte and erythroid cells. Results point towards an increase of platelet-biased HSCs with age (Sanjuan-Pla et al. 2013; Grover et al. 2016) and a possible expansion of megakaryocytic and erythroid progenitor populations (Yamamoto et al. 2018; Poscablo et al. 2021; Rundberg Nilsson et al. 2016), which may represent a compensatory mechanism for the ineffective differentiation towards this lineage in elderly individuals. This is in line with the accumulation of MEPs we see in our data. Finally, and in accordance with the reduced frequency of monocyte progenitors observed, it has been shown that the monocyte count declines gradually with increasing age (15).

      Regarding the concerns about our classification method raised by the reviewer, we have performed additional validations that we describe in answers to reviewer 1 comment #4 and reviewer 2 comment #1. To further confirm that the changes in cellular proportions we found are real, we applied two additional classification methods: Seurat transfer and Celltypist (16) to the elderly donors dataset. We obtained a similar expansion in MEPs, together with reduction of monocytic progenitors with the three methods (Figure 5).

      Figure 5. Classification of HSPCs from elderly donors. Barplot showing the proportion of each cell subpopulation per elderly donor, resulting from three classification methods: GLMnet-based classifier, Seurat transfer and Celltypist. For all three methods, cells with prediction scores < 0.5 were labeled as “not assigned”.
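      The labeling rule in the Figure 5 legend (cells scoring below 0.5 become “not assigned”) and the resulting per-donor proportions can be sketched as below. The function names are hypothetical, and each cell is assumed to carry one predicted label plus one prediction score already on a 0 to 1 scale; how each of the three tools derives that score is not specified here.

```python
from collections import Counter


def label_with_threshold(labels, scores, threshold=0.5):
    """Keep a predicted label only if its score is >= threshold; otherwise 'not assigned'."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    return [lab if s >= threshold else "not assigned" for lab, s in zip(labels, scores)]


def proportions(labels):
    """Fraction of cells per label (one bar in a per-donor stacked barplot)."""
    counts = Counter(labels)
    total = len(labels)
    return {lab: n / total for lab, n in counts.items()}


if __name__ == "__main__":
    donor_labels = label_with_threshold(["MEP", "HSC", "MEP", "GMP"], [0.9, 0.4, 0.5, 0.7])
    print(proportions(donor_labels))
```

      Keeping “not assigned” as its own category, rather than dropping those cells, makes it visible when one method is less confident than the others for a given donor.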

      3) The use of terminology requires more clarity in order to better understand what kind of comparison has been performed, i.e. whether global transcriptional profiles are being compared, or those of specific subset populations. Also, the young/aged comparisons are often unclear, i.e. it's not evident whether the authors are referring to genes upregulated in aged HSC and downregulated in young HSC or vice versa. A more consistent data description would make the paper much easier to read.

      We thank the reviewer for this comment. We have updated the manuscript to provide more clarity in the description of the different comparisons made in our analyses. Most changes are located in the Transcriptional profiling of human young and elderly hematopoietic progenitor systems sub-section within the Results.

      4) The link between aging and MDS is not explored but could be an informative use of the data that the authors have generated. For example, anemia is a feature of both aging and MDS whereas neutropenia and thrombocytopenia only occur in MDS. Are there any specific pathways governing myeloid/platelet development that are only affected in MDS?

      Thank you for raising this comment. We believe that distinguishing events that take place during healthy aging from those associated with MDS will help us understand this disease, as it is so closely related to age. This is why, when analyzing MDS, we considered young and elderly donors as two separate sets of healthy controls, the elderly donors being the most suitable set for comparison with MDS samples.

      With regard to the comment on myeloid and platelet development, the GSEA analysis gives potentially useful information. MYC targets and oxidative phosphorylation are significantly enriched in the MEP compartment from MDS patients when compared to elderly donors, indicating that these progenitors may recover a more active profile with the disease. Hypoxia-related genes, on the other hand, are more active in HSCs and MEPs from healthy elderly donors than in MDS. Hypoxia is known to be implicated in megakaryocyte and erythroid differentiation (17).

      5) MDS is a very heterogeneous disorder and while the authors did specify that they were using samples from MDS with multilineage dysplasia, more clinical details (blood counts, cytogenetics, mutational status) are needed to be able to interpret the data.

      We thank the reviewer for the comment. All the clinical details for each MDS patient are included in Supplementary File 5.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Sims et al. evaluate how system-level brain functional connectivity is associated with cognitive abilities in a sample of older adults aged over 85 years. Because the study sample of 146 normal older adults has lived into advanced years of age, the novelty here is the opportunity to validate brain-behavioral associations in aging with reduced concern about the potential influence of undetected incipient neuropsychological pathology. The participants provided resting-state functional magnetic resonance imaging (rs-fMRI) data as well as behavioral neuropsychological test assessments of various cognitive abilities. Exploratory factor analysis was applied to the behavioral cognitive assessments to arrive at summary measures of participant ability in five cognitive domains: processing speed, executive functioning, episodic memory, working memory, and language. rs-fMRI data were submitted to a graph-theoretic approach that derived underlying functional nodes in brain activity, the membership of these nodes in brain network systems, and indices characterizing the organizational properties of these brain networks. The study classifies the various brain networks into a sensory/motor system of networks and an association system of networks, with further sub-systems in the latter that include the frontoparietal network (FPN), the default-mode network (DMN), the cingulo-opercular network (CON), and the dorsal (DA) and ventral (VA) attention networks. Amongst other graph metrics, the study focused on the extent to which networks in these brain systems were segregated (i.e., separable network communities as opposed to a more singular large community network). Evaluation of the brain network segregation indices and cognitive performance metrics showed that, in general, higher network functional segregation corresponds with higher cognitive performance ability. In particular, this association was seen between the general association system and overall cognition, and between the FPN and both overall cognition and processing speed.
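      The review does not give the segregation formula used in the study. One widely used formulation compares mean within-network and mean between-network connectivity, (mean within - mean between) / mean within, usually computed on Fisher z-transformed correlations. The sketch below implements that formulation under those assumptions; `system_segregation` is a hypothetical name, and the study's exact definition may differ.

```python
import math


def system_segregation(fc, labels, fisher_z=True):
    """(mean within-network FC - mean between-network FC) / mean within-network FC.

    fc: symmetric node-by-node correlation matrix (list of lists); diagonal is ignored.
    labels: network assignment for each node (e.g. "FPN", "DMN").
    """
    n = len(labels)
    within, between = [], []
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: each edge counted once
            v = math.atanh(fc[i][j]) if fisher_z else fc[i][j]
            (within if labels[i] == labels[j] else between).append(v)
    if not within or not between:
        raise ValueError("need both within- and between-network edges")
    w = sum(within) / len(within)
    b = sum(between) / len(between)
    return (w - b) / w


if __name__ == "__main__":
    fc = [[1.0, 0.6, 0.1, 0.1],
          [0.6, 1.0, 0.1, 0.1],
          [0.1, 0.1, 1.0, 0.6],
          [0.1, 0.1, 0.6, 1.0]]
    print(round(system_segregation(fc, ["A", "A", "B", "B"]), 3))
```

      Values near 0 mean within- and between-network connections are similarly strong (low segregation); values approaching 1 mean networks are connected mainly internally.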

      The results worthy of highlighting include the documentation of oldest-old individuals with detectable brain neural network segregation at the level of the association system and its FPN sub-system and the association of this brain functional state notably with general cognition and processing speed and less so with the other specific cognitive domains (such as memory). This finding suggests that (a) apparently better cognitive aging might stem from a specific level of neural network functional segregation, and (b) this linkage applies more specifically to the FPN and processing speed. These specific findings inform the broader conceptual perspective of how human brain aging that is normative vs. that which is pathological might be distinguished.

      We appreciate this comment and we have added these points to the conclusion more explicitly.

      To show the above result, this study defined functional networks that were driven more by the sample data as opposed to a pre-existing generic template. This approach involves a watershed algorithm to obtain functional connectivity boundary maps in which the boundary brain image voxels separate functionally related voxels from unrelated voxels by virtue of their functional covariance as measured in the immediate data. This is also a notable objective and data-driven approach towards defining functional brain regions-of-interest (ROIs), nodes, and networks that are age-appropriate and configured for a given dataset as opposed to using network definitions based on other datasets used as a generic template.

      The sample size of 146 for this age group is generally sufficient.

      For the analyses considering the significance of the effect of the brain network metrics on the cognitive variables, the use of hierarchical regression to evaluate whether the additional variables (in the full model) significantly change the model fit relative to the reduced model with covariates only (data collection site, cortical thickness), while a possible approach, might be problematic, particularly when the full model uses many more regressors than the reduced model. In general, adding more variables to regression models reduces the residual variance. As such, it is possible that adding more regressors in a full model and comparing that to a reduced model with far fewer regressors would yield significant changes in the R^2 fit index, even if the added regressors are not meaningfully modulating the dependent variable. This may not be an issue for the finding on the FPN segregation effect on overall cognition, but it may be important in interpreting the finding on the association system metrics on overall cognition.
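      The concern can be made concrete with two standard quantities for nested OLS models: the partial F statistic for the R^2 change, whose numerator divides the gain by the number of added predictors, and adjusted R^2, which penalises extra predictors. The sketch below is illustrative only; the function names are hypothetical and the example numbers (n = 146, a reduced model with 2 covariates, a full model with 10 predictors) are not taken from the study.

```python
def r2_change_f(r2_reduced, r2_full, k_reduced, k_full, n):
    """Partial F statistic for the R^2 change between nested OLS models.

    k_* count predictors (excluding the intercept); n is the sample size.
    Degrees of freedom: (k_full - k_reduced, n - k_full - 1).
    """
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    if df1 <= 0 or df2 <= 0:
        raise ValueError("full model must add predictors and leave residual df")
    return ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)


def adjusted_r2(r2, k, n):
    """R^2 penalised for the number of predictors k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


if __name__ == "__main__":
    n = 146
    print(round(r2_change_f(0.10, 0.15, 2, 10, n), 3))  # same gain spread over 8 predictors
    print(round(r2_change_f(0.10, 0.15, 2, 3, n), 3))   # same gain from 1 predictor
    print(round(adjusted_r2(0.10, 2, n), 4), round(adjusted_r2(0.15, 10, n), 4))
```

      With these hypothetical inputs, a gain from R^2 = 0.10 to 0.15 gives F ≈ 0.99 on (8, 135) degrees of freedom and adjusted R^2 actually falls slightly (about 0.0874 to 0.0870), whereas the same gain from a single added predictor gives F ≈ 8.35 on (1, 142).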

      Critically, we should note that the correlation effect sizes (justified by the 0.23 value based on the reported power analyses) were all rather small. The largest key brain-behavior correlation was 0.273 (between DMN segregation and Processing Speed). In the broader perspective, such effect sizes generally suggest that the contribution of this factor is minimal, and the results should be interpreted in this context.
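      The 0.23 threshold can be related to the sample size with the usual Fisher z approximation for a correlation test. The sketch assumes a two-sided α = 0.05 and 80% power, which the review does not state; under those assumptions n = 146 gives a minimum detectable r of about 0.23, consistent with the quoted value, although the study's actual power-analysis settings are not confirmed here. `min_detectable_r` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def min_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest correlation detectable by a two-sided test (Fisher z approximation)."""
    z = NormalDist()                     # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # about 0.84 for 80% power
    return math.tanh((z_alpha + z_beta) / math.sqrt(n - 3))


if __name__ == "__main__":
    print(round(min_detectable_r(146), 3))
    print(round(0.273 ** 2, 3))  # variance shared at the largest reported r
```

      For scale, the largest reported correlation, r = 0.273, corresponds to r² ≈ 0.075, i.e. roughly 7.5% of shared variance.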

      The recent, highly publicized paper from Marek and colleagues (citation below) offers some support for the assertion that these effect sizes are on the order that would be expected for ‘true’ signals in the brain. While the study reported here is not a “BWAS” as described in the Marek article (BWAS is a brain-wide association study, examining, without a priori hypotheses of brain network, all possible associations), and therefore our study does not fall prey to some of the multiple comparisons issues described in that paper, the general expected effect sizes based on that paper should be relevant here.

      Marek and colleagues suggest that 1) effect sizes in the range of 0.273 are on the order of the larger brain-behavior relationships that can be expected to be replicable, and 2) samples that remove some drivers of individual variability are beneficial to the capability of a study to identify an effect. Relevant to the latter point, by reducing the variability in our sample due to age (our age range is tight) and early signs of neurological disease (these were screened out in our sample), this leaves a sample that is homogeneous along these variables, meaning that brain variability associated with cognitive performance can be more easily pulled from the data.

      Our data have large variability on the behavior end, and large variability on the brain end, allowing better power for seeing effects between them.

      Marek, S., Tervo-Clemmens, B., Calabro, F.J., Montez, D.F., Kay, B.P., Hatoum, A.S., Donohue, M.R., Foran, W., Miller, R.L., Hendrickson, T.J., et al. (2022). Reproducible brain-wide association studies require thousands of individuals. Nature 603, 654–660. 10.1038/s41586-022-04492-9.

      Overall, the findings based on hierarchical regressions that evaluate the network segregation indices in accounting for cognition and the small correlation magnitudes are basically in line with the notion that more segregated neural networks in the oldest-old support better cognitive performance (particularly processing speed). However, the level of positive support for the notion based on these findings is somewhat moderate and requires further study.

      The addition of a control analysis (sensorimotor network) in the newer version of the paper showed that these effects are not present in brain networks not thought to relate to cognition. We agree that further study of these questions is necessary for stronger claims to be made, but the current study advances the field by showing clearly that segregation of the association network and its components relates to behavior even in this oldest old cohort.

      Reviewer #2 (Public Review):

      The authors capitalised on the opportunity to obtain functional brain imaging data and cognitive performance from a group of oldest old with normative cognitive ability and no severe neurophysiological disorders, arguing that these individuals would be most qualified as having accomplished 'healthy ageing'. Combined with the derivation of a cohort-specific brain parcellation atlas, the authors demonstrated the importance of maintaining brain network segregation for normative cognition ability, especially processing speed, even at such late stage of life. In particular, segregation of the frontoparietal network (FPN) was found to be the key network property.

      These results bolster the findings from studies using younger-old participants and are in agreement with the current understanding of the connectome-cognition relationship. The inclusion of a modest sample size, power analysis, cohort-specific atlas, and a fairly comprehensive neuropsychological assessment battery provides optimism that the observed importance of FPN segregation would be a robust and generalisable finding, at least in future cross-sectional studies. The fact that FPN segregation is relatively more important to cognition than other associative networks also provides novel insight into the possible 'hierarchy' between age-related neural and cognitive changes, regardless of what mechanisms lead to such segregation at such an advanced age. It is also interesting that processing speed remains the 'hallmark' metric of age-related cognitive changes, indirectly speaking to its long-assumed fundamental impact on overall cognition.

      As laid out by the authors, if network differentiation is key to normative cognitive ability in old age, intervention and stimulation programs that could maintain or boost network segregation would have high translational value. With advances in mobile, self-administrable devices that target behavioural and neural modifications, this potential would have increasing appeal.

      However, I feel that a few things have prevented the manuscript from being a simple yet impactful submission:

      1) Interpretation and the major theme of discussion. While the authors attempted to discuss their findings with respect to both the compensatory and network dedifferentiation hypotheses, the results and their interpretation do not readily provide any resolution or reconciliation between the two, a common challenge in much ageing research. The authors did not further elaborate on how their special cohort may provide further insights into this.

      While the results certainly are in line with the dedifferentiation hypothesis, why 'this finding does not exclude the compensation hypothesis' (Discussion) was not sufficiently elaborated. In particular, the authors seemed to suggest that maintained network specialisation may play such a role, but the results and interpretations regarding network specialisation were not particularly focused on throughout the manuscript. In addition, both upregulation within a network and cross-network recruitment can be potential compensatory strategies (Cabeza et al 2018, Rev Nat Neurosci). Without longitudinal data or other designs (e.g. task), it is quite difficult to evaluate the involvement of compensation. For instance, as rightly suggested by the authors, the two phenomena may not be mutually exclusive (e.g., maintenance of FPN differentiation at such old age could be a result of 'compensation' that started when the participants were younger).

      The reviewer makes some excellent points that we have taken to heart in this revision. We agree that the data as described do not directly address the compensation hypothesis, and therefore de-emphasized our descriptions of that hypothesis in service of a simpler, more impactful manuscript.

      As described above in our response to the essential revisions “In the original submission, we noted relevant literature which describe both the dedifferentiation hypothesis and the compensation hypothesis of aging. Our original aim was to include more of a literature review of cognitive aging theories in the introduction and discussion, but that choice made it too confusing (and honestly left out much important literature). In responding to the reviews we realize that bypassing this cursory literature review here is preferable for the readability of the manuscript. Instead, we cite a literature review, and focus on the dedifferentiation hypothesis.

      “The data we show here addresses the dedifferentiation hypothesis specifically since we are using the segregation metric- a reflection of dedifferentiation of network organization. The reviewers’ comments caused us to do a great deal of thinking on this topic, and we have a forthcoming review with our colleague Ian McDonough that covers this topic in more detail (McDonough, Nolin, and Visscher, 2022). We have substantially rewritten the relevant sections in the discussion (especially section 3.2) to be more clear for readers.”

      As also described in our response to essential revisions 2c, we have added to the discussion regarding the utility of studying the oldest-old; this is in the second paragraph of the discussion, and reproduced above. Additionally, in the Introduction, we also briefly address the importance of this cohort. We state “Prior work has mostly been done in younger-old samples (largely 65-85 years old). Studying the younger-old can be confounded by including pre-symptomatic disease, since it is unknown which individuals may be experiencing undetectable, pre-clinical cognitive disorders and which will continue to be cognitively healthy for another decade. The cognitively unimpaired oldest-old have lived into late ages, and we can be more confident in determining their status as successful agers. A further benefit of studying these successful cognitive agers is that because of their advanced age and the normal aging and plasticity processes associated with it, there is greater variance in both their performance on neurocognitive tasks, and in brain connectivity measures than there is in younger cohorts (Christensen et al., 1994). This increased variance makes it easier to observe across-subject relationships of cognition and brain networks (Gratton, Nelson, & Gordon, 2022). We provide new insight into the relationship between the segregation of networks and cognition by investigating this relationship in an oldest old cohort of healthy individuals.”

      2) Some further clarity about the data and statistical analyses would be desirable. First, since scan length determines the stability of functional connectivity, how long was the resting-state scan? Second, what is the purpose of using both hierarchical regression and partial correlation? While they do consider different variances in the dataset, they are quite similar and the decision looks quite redundant to me as not much further insights have been gained. [the main insight to including a regression is to be able to compare the different networks to each other.]

      The resting-state fMRI scan is 8 minutes in length. This has been added to the text. After considering the redundancy the reviewer notes between hierarchical regression and correlations, we have simplified our statistical approach and only included correlations in the main body of the manuscript. We have put the regressions in the supplemental materials so if interested readers would like to be able to see those results, they are still available.

    1. Author Response

      Reviewer #2 (Public Review):

      Zhukin et al. present the structure of the central scaffold component of the NuA4 complex. They hypothesise how the nucleosome interacting modules not present in the structure could be arranged, based on AlphaFold modelling and comparison of their structure to other complexes that use the same subunits. They show some interesting, albeit fairly preliminary, biochemistry on the binding of the flexible modules, suggesting a role for acetylation in H3K4me3 reading.

      While the work builds upon previous structural studies on the Tra1 subunit in isolation and a previous 4.7 Å resolution structure from another group, there are clear differences and novel findings in this study. The data is presented beautifully, and nicely annotated figures make following the many subunits and interactions therein simple. What could have been a very complex manuscript is easy to digest. Some of the figures could do with a couple of additional labels and more detailed figure legends to make things a little clearer.

      Overall, a nice study and a wonderfully detailed structure of a large multi-subunit assembly, but we would recommend some further experimental validation to bolster their findings.

      Major comments

      1) All 13 subunits of NuA4 are present by mass spec; however, based on the SDS-PAGE gel (Fig1-1), components of the TINTIN sub-complex seem substoichiometric, with Eaf7 and Eaf3 clearly staining much more weakly. This is particularly important with reference to Figure 3 and the discussion in the text, which assumes the nucleosome interacting modules are all present equally but are too flexible to be observed in the structure.

      Simple peptide numbers from mass spec cannot be used as a measure of protein abundance as this is sensitive to multiple confounding factors.

      We did not identify the locations of individual modules (HAT, TINTIN and Yaf9) within the diffuse density; we merely indicate that this is a likely location for them, based on the location of connection points and the presence of crosslinks in previously published data. We did perform mass photometry analysis of the purified NuA4 sample to better determine the composition of the purified complex (Figure 1-1). We find that the major species peak is centered at 1037 kDa, which is very close to the theoretical mass of 1043 kDa. There are a few other minor peaks, but none of these would indicate a NuA4 complex lacking TINTIN (Eaf3,5,7) or any other distinct subcomplex.

      2) A major novel biological finding and conclusion from the abstract concerns the binding to modified nucleosomes. However, this seemed somewhat preliminary, especially considering the discussion around the role of acetylation affecting binding to H3K4me3 nucleosomes based solely on the dCypher screen used.

      The discussion on the preferential binding of the HAT module to acetylated and methylated tails concludes that acetylation liberates the H3 tail from DNA interaction, making H3K4me3 more available for binding by the PHD domain. This is an interesting hypothesis, but it is stated as fact with very little evidence to support this assertion.

      Whilst others have seen similar results (cited in the paper), no data are presented to rule out an alternative hypothesis that there is some additional acetyl-binding activity in the complex. Indeed, in one of the references they cite, the authors do show direct reading of acetylation as well as methylation.

      TINTIN binding is subject to high background and shows only a fairly minor effect. While intriguing, the biological relevance of these observations needs to be established further.

      We have changed the language of this section to better leave open other possibilities. As for the TINTIN dCypher results, we do not try to draw too many conclusions, but the data indicate that there is very little (if any) interaction with the histone tails (at least for the modifications present in the panel). One thing we can say is that the TINTIN module does not seem to have any binding preference for H3K36me3 nucleosomes.

      3) There is a large focus on the cross-linking mass spec study from another group and the previously published structure of the NuA4 complex. The authors are fairly aggressive in suggesting the other structure from Wang et al., is incorrect. It is very nice that their built structure shows a better interpretation of previous XL-MS data, but still many of the crosslinks are outside of the modelled density. One possibility that should be entertained is that the two studies are comparing different structures/states of NuA4. The authors of the Wang et al., paper indeed comment that Swc4 and Yaf9 are missing from their purified complex. It is of course possible that both structures are correct as they appear to be biochemically different, with the crosslinking in the Setiaputra paper better reflecting the complex presented here.

      Response given above.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript seeks to identify the mechanism underlying priority effects in a plant-microbe-pollinator model system and to explore its evolutionary and functional consequences. The manuscript first documents alternative community states in the wild: flowers tend to be strongly dominated by either bacteria or yeast but not both. Then lab experiments are used to show that bacteria lower the nectar pH, which inhibits yeast, thereby identifying a mechanism for the observed priority effect. The authors then perform an experimental evolution experiment which shows that yeast can evolve tolerance to a lower pH. Finally, the authors show that low-pH nectar reduces pollinator consumption, suggesting a functional impact on the plant-pollinator system. Together, these multiple lines of evidence build a strong case that pH has far-reaching effects on the microbial community and beyond.

      The paper is notable for the diverse approaches taken, including field observations, lab microbial competition and evolution experiments, genome resequencing of evolved strains, and field experiments with artificial flowers and nectar. This breadth can sometimes seem a bit overwhelming. The model system has been well developed by this group and is simple enough to dissect but also relevant and realistic. Whether the mechanism and interactions observed in this system can be extrapolated to other systems remains to be seen. The experimental design is generally sound. In terms of methods, the abundance of bacteria and yeast is measured using colony counts, and given that most microbes are uncultivable, it is important to show that these colony counts reflect true cell abundance in the nectar.

      We have revised the text to address the relationship between cell counts and colony counts with nectar microbes. Specifically, we point out that our previous work (Peay et al. 2012) established a close correlation between CFUs and cell densities (r2 = 0.76) for six species of nectar yeasts isolated from D. aurantiacus nectar at Jasper Ridge, including M. reukaufii.

      As for A. nectaris, we used a flow cytometric sorting technique to examine the relationship between cell density and CFU (figure supplement 1). This result should be viewed as preliminary given the low level of replication, but this relationship also appears to be linear, indicating that colony counts likely reflect true cell abundance of this species in nectar.

      It remains uncertain how closely CFU reflects total cell abundance of the entire bacterial and fungal community in nectar. However, a close association is possible and may even be likely given the data above, which show a close correlation between CFU and total cell count for several yeast species and for A. nectaris, which our data indicate are dominant species in nectar.

      We have added the above points in the manuscript (lines 263-264, 938-932).

      The genome resequencing to identify pH-driven mutations is, in my mind, the least connected and developed part of the manuscript, and could be removed to sharpen and shorten the manuscript.

      We appreciate this perspective. However, given the disagreement between this perspective and reviewer 2’s, which asks for a more expanded section, we have decided to add a few additional lines (lines 628-637), briefly expanding on the genomic differences between strains evolved in bacteria-conditioned nectar and those evolved in low-pH nectar.

      Overall, I think the authors achieve their aims of identifying a mechanism (pH) for the priority effect of early-colonizing bacteria on later-arriving yeast. The evolution and pollinator experiments show that pH has the potential for broader effects too. It is surprising that the authors do not discuss the inverse priority effect of early-arriving yeast on later-arriving bacteria, beyond a supplemental figure. Understandably this part of the story may warrant a separate manuscript.

      We would like to point out that, in our original manuscript, we did discuss the inverse priority effects, referring to relevant findings that we previously reported (Tucker and Fukami 2014, Dhami et al. 2016 and 2018, Vannette and Fukami 2018). Specifically, we wrote that: “when yeast arrive first to nectar, they deplete nutrients such as amino acids and limit subsequent bacterial growth, thereby avoiding pH-driven suppression that would happen if bacteria were initially more abundant (Tucker and Fukami 2014; Vannette and Fukami 2018)” (lines 385-388). However, we now realize that this brief mention of the inverse priority effects was not sufficiently linked to our motivation for focusing mainly on the priority effects of bacteria on yeast in the present paper. Accordingly, we added the following sentences: “Since our previous papers sought to elucidate priority effects of early-arriving yeast, here we focus primarily on the other side of the priority effects, where initial dominance of bacteria inhibits yeast growth.” (lines 398-401).

      I anticipate this paper will have a significant impact because it is a nice model for how one might identify and validate a mechanism for community-level interactions. I suspect it will be cited as a rare example of the mechanistic basis of priority effects, even across many systems (not just pollinator-microbe systems). It illustrates nicely a more general ecological phenomenon and is presented in a way that is accessible to a broader audience.

      Thank you for this positive assessment.

      Reviewer #2 (Public Review):

      The manuscript "pH as an eco-evolutionary driver of priority effects" by Chappell et al. illustrates how a single driver (a microbially induced pH change) can affect multiple levels of species interactions, including microbial community structure, microbial evolutionary change, and hummingbird nectar consumption (potentially influencing both microbial dispersal and plant reproduction). It is an elegant study with different interacting parts: from laboratory to field experiments addressing mechanism, condition, evolution, and functional consequences. It will likely be of interest to a wide audience and has implications for microbial, plant, and animal ecology and evolution.

      This is a well-written manuscript, with generally clear and informative figures. It represents a large body and variety of work that is novel and relevant (all major strengths).

      We appreciate this positive assessment.

      Overall, the authors' claims and conclusions are justified by the data. There are a few things that could be addressed in more detail in the manuscript. The most important weakness in terms of lack of information/discussion is that it looks like there are just as many or more genomic differences between the bacterial-conditioned evolved strains and the low-pH evolved strains than there are between these and the normal nectar media evolved strains. I don't think this negates the main conclusion that pH is the primary driver of priority effects in this system, but it does open the question of what you are missing when you focus only on pH. I would like to see a discussion of the differences between bacteria-conditioned vs. low-pH evolved strains.

      We agree with the reviewer and have included an expanded discussion in the revised manuscript [lines 628-637]. Specifically, to show overall genomic variation between treatments, we calculated genome-wide Fst comparing the various nectar conditions. We found that Fst was 0.0013, 0.0014, and 0.0015 for the low-pH vs. normal, low-pH vs. bacteria-conditioned, and bacteria-conditioned vs. normal comparisons, respectively. The similarity among these values suggests that the difference between the bacteria-conditioned and low-pH treatments is comparable to the difference between each of these treatments and normal nectar. This result highlights that, although our phenotypic data suggest alterations to pH as the most important factor for this priority effect, pH may still be one of many factors affecting the coevolutionary dynamics of wild yeast in the microbial communities they are part of. In the full community context in which these microbes grow in the field, multi-species interactions, environmental microclimates, etc. likely also play a role in the rapid adaptation of these microbes, which were not investigated in the current study.

      Based on this overall picture, we have included additional discussion focusing on the effect of pH on evolution of stronger resistance to priority effects. We compared genomic differences between bacteria-conditioned and low-pH evolved strains, drawing the reader’s attention to specific differences in source data 14-15. Loci that varied between the low pH and bacteria-conditioned treatments occurred in genes associated with protein folding, amino acid biosynthesis, and metabolism.

      Reviewer #3 (Public Review):

      This work seeks to identify a common factor governing priority effects, including mechanism, condition, evolution, and functional consequences. It is suggested that environmental pH is the main factor that explains various aspects of priority effects across levels of biological organization. Building upon this well-studied nectar microbiome system, it is suggested that pH-mediated priority effects give rise to bacterial and yeast dominance as alternative community states. Furthermore, pH determines both the strengths and limits of priority effects through rapid evolution, with functional consequences for the host plant's reproduction. These data contribute to ongoing discussions of deterministic and stochastic drivers of community assembly processes.

      Strengths:

      Provides multiple lines of field and laboratory evidence to show that pH is the main factor shaping priority effects in the nectar microbiome. Field surveys characterize the distribution of microbial communities with flowers frequently dominated by either bacteria or yeast, suggesting that inhibitory priority effects explain these patterns. Microcosm experiments showed that A. nectaris (bacteria) exerted inhibitory priority effects against M. reukaufii (yeast). Furthermore, high densities of bacteria were correlated with lower pH, potentially due to a bacteria-induced reduction in nectar pH. Experimental evolution showed that yeast evolved in low-pH and bacteria-conditioned treatments were less affected by priority effects than ancestral yeast populations. This potentially explains the variation in bacteria-dominated flowers observed in the field, as yeast rapidly evolves resistance to bacterial priority effects. Genome sequencing further reveals that phenotypic changes in low-pH and bacteria-conditioned nectar treatments corresponded to genomic variation. Lastly, a field experiment showed that low nectar pH reduced flower visitation by hummingbirds. pH not only affected microbial priority effects but also has functional consequences for host plants.

      We appreciate this positive assessment.

      Weaknesses:

      The conclusions of this paper are generally well-supported by the data, but some aspects of the experiments and analysis need to be clarified and expanded.

      The authors imply that in their field surveys flowers were frequently dominated by bacteria or yeast, but rarely together. The authors argue that the distributional patterns of bacteria and yeast are therefore indicative of alternative states. In each of the 12 sites, 96 flowers were sampled for nectar microbes. However, it's unclear to what degree the spatial proximity of flowers within each of the sampled sites biased the observed distribution patterns. Furthermore, seasonal patterns may also influence microbial distribution patterns, especially in the case of co-dominated flowers. Temperature and moisture might influence the dominance patterns of bacteria and yeast.

      We agree that these factors could potentially explain the presented results. Accordingly, we conducted spatial and seasonal analyses of the data, which we detail below and include in two new paragraphs in the manuscript [lines 290-309].

      First, to determine whether spatial proximity influenced yeast and bacterial CFUs, we regressed the difference in bacterial or fungal abundance between all possible pairs of plants against the geographic distance between them. If plant location affected microbial abundance, one should see a positive relationship between distance and the difference in microbial abundance between a given pair of plants: a pair of plants that were more distantly located from each other should be, on average, more different in microbial abundance. Contrary to this expectation, we found no significant relationship between distance and the difference in bacterial colonization (A, p=0.07, R²=0.0003) and a small negative association between distance and the difference in fungal colonization (B, p<0.05, R²=0.004). Thus, there was no obvious overall spatial pattern in whether flowers were dominated by yeast or bacteria.
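      The pairwise distance-versus-difference analysis described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes plant positions are given in projected (planar) coordinates so that Euclidean distance approximates geographic distance, and the function name and inputs are hypothetical. It returns only the fitted slope and R², not a p-value.

```python
import itertools

import numpy as np


def distance_vs_difference(coords, abundance):
    """Regress pairwise abundance differences on pairwise distances.

    coords    -- (n, 2) array of plant positions (assumed planar coordinates)
    abundance -- length-n array of per-plant microbial abundance (e.g. CFUs)

    Returns (slope, r_squared) of an ordinary least-squares fit.
    """
    coords = np.asarray(coords, dtype=float)
    abundance = np.asarray(abundance, dtype=float)
    dists, diffs = [], []
    # Every unordered pair of plants contributes one (distance, difference) point
    for i, j in itertools.combinations(range(len(abundance)), 2):
        dists.append(np.linalg.norm(coords[i] - coords[j]))
        diffs.append(abs(abundance[i] - abundance[j]))
    dists = np.array(dists)
    diffs = np.array(diffs)
    # Degree-1 polynomial fit returns [slope, intercept]
    slope, intercept = np.polyfit(dists, diffs, 1)
    predicted = slope * dists + intercept
    ss_res = np.sum((diffs - predicted) ** 2)
    ss_tot = np.sum((diffs - diffs.mean()) ** 2)
    # Guard against identical differences (no variance to explain)
    r_squared = 0.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, r_squared
```

      A positive slope would mean that more distant plants tend to differ more in abundance. Because each plant appears in many pairs, the points are not independent, so a permutation test that shuffles abundances among plants (a Mantel-style test) is a common way to obtain a p-value for such a fit.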

      Next, to determine whether climatic factors or seasonality affected the colonization of bacteria and yeast per plant, we used a linear mixed model predicting the average bacteria and yeast density per plant from average annual temperature, temperature seasonality, and annual precipitation at each site, the date the site was sampled, and the site location and plant as nested random effects. We found that none of these variables were significantly associated with the density of bacteria and yeast in each plant.

      To look at seasonality, we also re-ordered Fig 2C, which shows the abundance of bacteria- and yeast-dominated flowers at each site, so that the sites are now listed in order of sampling dates. In this re-ordered figure, there is no obvious trend in the number of flowers dominated by yeast throughout the period sampled (6/23 to 7/9), giving additional indication that seasonality was unlikely to affect the results.

      Additionally, sampling date does not seem to strongly predict bacterial or fungal density within each flower when plotted.

      These additional analyses, now included (figure supplements 2-4) and described (lines 290-309) in the manuscript, indicate that the observed microbial distribution patterns are unlikely to have been strongly influenced by spatial proximity, temperature, moisture, or seasonality, reinforcing the possibility that the distribution patterns instead indicate bacterial and yeast dominance as alternative stable states.

      The authors exposed yeast to nectar treatments varying in pH. Using experimental evolution approaches, the authors determined that yeast grown in low-pH nectar treatments were more resistant to priority effects by bacteria. The metric used to determine the strength of the bacteria's priority effect on yeast does not seem to take into account factors that limit growth, such as the environmental carrying capacity. In addition, yeast evolved in normal (pH = 6) and low-pH (pH = 3) nectar treatments, but it is unclear how resistance differs across a range of pH levels (from low to high) and how pH affects the cost of yeast resistance to bacterial priority effects. The cost of resistance may influence yeast life-history traits.

      The strength of bacterial priority effects on yeast was calculated using the metric we previously published in Vannette and Fukami (2014): PE = log(BY/(-Y)) - log(YB/(Y-)), where BY and YB represent the final yeast density when early arrival (day 0 of the experiment) was by bacteria or yeast, followed by late arrival by yeast or bacteria (day 2), respectively, and -Y and Y- represent the final density of yeast in monoculture when they were introduced late or early, respectively. This metric does not incorporate carrying capacity. However, it does compare how each microbial species grows alone, relative to growth before or after a competitor. In this way, our metric compares environmental differences between treatments while also taking into account growth differences between strains.
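      The priority-effect metric above can be written as a small function. This is a minimal sketch of the stated formula only; the function and argument names are hypothetical, and the natural logarithm is an assumption (the text writes "log" without a base; changing the base rescales PE but does not change its sign).

```python
import math


def priority_effect(by, yb, late_mono, early_mono):
    """PE = log(BY / (-Y)) - log(YB / (Y-)).

    by:         final yeast density when bacteria arrived first (BY)
    yb:         final yeast density when yeast arrived first (YB)
    late_mono:  final yeast density in monoculture, introduced late (-Y)
    early_mono: final yeast density in monoculture, introduced early (Y-)

    Natural log is assumed here.
    """
    if any(d <= 0 for d in (by, yb, late_mono, early_mono)):
        raise ValueError("densities must be positive to take logarithms")
    # Yeast arriving after bacteria, relative to yeast arriving late alone
    late_term = math.log(by / late_mono)
    # Yeast arriving before bacteria, relative to yeast arriving early alone
    early_term = math.log(yb / early_mono)
    return late_term - early_term


# Hypothetical densities: bacterial pre-emption cuts yeast 100-fold
print(priority_effect(1e3, 1e5, 1e5, 1e5))  # ≈ -4.61
```

      Values below zero indicate that yeast fared relatively worse when it arrived after bacteria than when it arrived before them, with each outcome normalized by the matching monoculture; a value of zero means arrival order made no difference.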

      Here we also present additional growth data to address the reviewer’s point about carrying capacity. Our experiments that compared ancestral and evolved yeast were conducted over the course of two days of growth. In preliminary monoculture growth experiments of each evolved strain, we found that yeast populations did reach carrying capacity over the course of the two-day experiment and population size declined or stayed constant after three and four days of growth.

      However, we found no significant difference in monoculture growth between the ancestral strains and any of the evolved strains, as shown in Figure supplement 12B. This lack of significant difference in monoculture suggests that differences in intrinsic growth rate do not fully explain the priority effects results we present. Instead, differences in growth were specific to yeast’s response to early arrival by bacteria.

      We also appreciate the reviewer’s comment about how yeast evolves resistance across a range of pH levels, as well as the effect of pH on yeast life-history traits. In fact, reviewer #2 pointed out an interesting trade-off in life history traits between growth and resistance to priority effects that we now include in the discussion (lines 535-551) as well as a figure in the manuscript (Figure 8).

    1. Author Response

      Reviewer #1 (Public Review):

      This work makes an important contribution to the study of the cell cycle and the attempt to infer mechanism by studying correlations in division timing between single cells.

      Given the importance of circadian rhythms to the ultimate conclusions of the study, I think it would be helpful to clarify the connection between possible oscillatory regulatory mechanisms and the formalism developed in e.g. Equation 3. The treatment appears to be a leading order expansion in stochastic fluctuations of the cell cycle regulators about the mean, but if an oscillatory process is involved, the fluctuations will be correlated in time and need not be small.

      We thank the reviewer for the positive assessment of our work. We have introduced Section S7 in the Supplementary Information to address the connection between our theory and two existing models of circadian modulation of cell division. In the first model, the circadian clock drives the interdivision time, while in the second model, the clock drives cell size control. We find that, while both models satisfy the cousin inequality for comparable parameters, they differ in their interdivision time correlation patterns. The first model yields an alternator-oscillator mixed pattern, while the second gives an aperiodic-oscillator pattern.

      The reviewer is right that our theory presents a leading-order expansion of cell cycle factor fluctuations. To overcome this limitation, we introduced the new Section S2 in the Supplementary Information, which shows how the correlation patterns are altered for moderately strong fluctuations. Interestingly, nonlinearity can be treated within our framework by introducing complexes of cell cycle factors. However, our model selection predicted that two cell cycle factors were enough to fit the present data without the need for complexes.

      Reviewer #2 (Public Review):

      This paper is of broad interest to scientists in the fields of cell growth, cell division, and cell-cycle control. Its main contribution is to provide a method to restrict the space of potential cell-cycle models using observed correlations in inter-division times of cells across their lineage tree. This method is validated on several data sets of bacterial and mammalian cells and is used to determine what additional measurements are required to distinguish the set of competing models consistent with a given correlation pattern.

      The patterns of correlations in the division times of cells within their lineage tree contain information about the inheritable factors controlling cell cycles. In general, it is difficult to extract such information without a detailed model of cell cycle control. In this manuscript, the authors have provided a Bayesian inference framework to determine what classes of models are consistent with a given set of observations of division time correlations, and what additional observations are needed to distinguish between such models. This method is applied to data sets of division times for various types of bacterial and mammalian cells including cells known to exhibit circadian oscillations.

      The manuscript is well-written, the analyses are thorough, and the authors have provided beautiful visualizations of how alternative models can be consistent with a finite set of observed correlations, and where and how extra measurements can distinguish between such models. Known models of growth rate correlations, cell-size regulation, and cell cycle control are analyzed within this framework in the Supplemental Information. A major advantage of the proposed method is that it provides a non-invasive framework to study the mechanism of cell-cycle control.

      We thank the reviewer for the positive response to our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript the authors describe an approach for controlling cellular membrane potential using engineered gene circuits via ion channel expression. Specifically, the authors use microfluidics to track S. cerevisiae gene expression and plasma membrane potential (PMP) in single cells over time. They first establish a small engineered gene circuit capable of producing excitable gene expression dynamics through the combination of positive and negative feedback, tracking expression using GFP (Figure 1). Though not especially novel or complex, the data quality is high in Figure 1 and the results are convincing. Note that the circuit is excitable and not oscillatory; it is being driven periodically by a chemical inducer. I think the authors could have done a better job justifying the use of an excitable engineered gene circuit system, since you could get a similar result by just driving a promoter with the equivalent time course of inducer.

      We restructured the manuscript by presenting the open-loop version of our synthetic circuit and demonstrate that the closed-loop system integrating feedback performs significantly better than its open-loop version (revised Figures 1 and 3). This open-loop system is based on Mar proteins that can synchronize gene expression on extended spatiotemporal scales (Perez-Garcia et al., Nat Comm, 2021). Other driven systems (i.e., TetR, AraC, LacI) can temporally synchronize gene expression in single bacterial cells to successive cycles of inducer. However, over time these bacterial systems build substantial phase delays between cells, partially due to noise, that ultimately lead to desynchrony between individual cells even though they tend to follow the common inducer. This is clearly not the case in Mar-based systems (Perez-Garcia et al., Nat Comm, 2021), as eukaryotic cells synchronize to each other under the guidance of common environmental stimuli with negligible phase drift. Furthermore, in the revised version we show that the dual feedback strategy provides a robust solution to control ion channel expression and associated changes in PMP (see Conclusions lines 231-237).

      The authors then use a similar approach to produce excitable expression of the bacterial ion channel KcsA, tracking membrane voltage using the voltage-sensitive dye ThT rather than GFP fluorescence (Figure 2). The experimental results in this figure are more novel as the authors are now using the expression of a heterologous ion channel to dynamically control plasma membrane potential. While fairly convincing, I think there are a few experimental controls that would make these results even more convincing. It is also unclear why the authors are now using power spectra to display observed frequencies compared to the much more intuitive histograms used in Figure 1.

      Now we use violin plots with period distributions consistently in all figures to ease the comparison between scenarios.

      Finally, the authors move on to use a similar excitable engineered gene circuit approach to produce inducible control of the K1 toxin which influences the native potassium channel TOK1 rather than the heterologous ion channel KcsA (Figure 3). I have a similar reaction to this figure as with Figure 2: the results are novel and interesting but would benefit from more experimental controls. Additionally, the image data shown in Figure 3b is very unclear and could be expanded and improved.

      In the revised version we have decided to remove the K1 toxin data, as we are aware that we cannot modulate the K1 degradation rate due to its extracellular nature. Instead, we have performed additional experiments in which we directly plugged our circuit into the native TOK1 potassium channel to demonstrate that our feedback-integrating synthetic circuit is capable of controlling TOK1 dosage and associated PMP changes (revised Figure 3, and lines 209-220). We believe these new data make a more direct connection between phytohormone-driven synthetic circuits and native channel expression than the previously presented K1-based scenario.

      Overall, in my opinion the claims in the abstract and title are a bit strong. I would deemphasize global coordination and "synchronous electrical signaling" since the authors are driving a global inducer. To make the claim of synchronous signaling I would want to see spatial data for cells near vs. far from K1 toxin producing cells in Figure 3 along with estimates of inducer/flow timescale vs. expression/diffusion of K1 toxin. As I read the manuscript, I see that most of the synchronicity comes from the fact that all cells are experiencing a global inducer concentration.

      We agree with the Reviewer: synchronicity and global coordination come from the phytohormone-sensing feedback circuit that is guided by cyclic environmental changes. We have revised the definition of synchronous signaling as suggested, focusing on the macroscopic synchronization of ion channel expression achieved by external modulation, which is the key message of this work.

      Reviewer #2 (Public Review):

      The authors present a novel method to induce electrical signaling through an artificial chemical circuit in yeast, an unconventional approach that could enable extremely interesting future experiments. I appreciate that the authors created a computer model that mathematically predicts how the interaction between their two chemical stimulants and their two chosen receptors, IacR/MarR, could produce such effects. Their experimental validations clearly demonstrated control over phase that is directly related to the chemical stimulation. In addition, the three scenarios in which they tested their circuit showed clear promise, as the phase difference between spatially distant yeast communities was ~10%. Interestingly, indirect TOK1 expression through K1 toxin gives a nice example of inter-strain coupling, although the synchronization was weaker than in the other cases. Overall, the method is sound as a way to chemically stimulate yeast cultures to produce synchronous electrical activity. However, it is important to point out that this synchronicity is not produced by colony-colony communication (i.e., self-organized), but by a global chemical drive of the constructed gene-expression circuit.

      The greatest limitation of the study lies in the presentation (not the science). There are two significant examples of this. First, the authors state this study 'provides a robust synthetic transcriptional toolbox' towards chemo-electrical coupling. In order to be a toolbox, more effort needs to be put into helping others use this approach. However, little detail is given about methodological choices, circuit mechanisms in relation to the rest of the cell, or how this method would be used outside of the demonstrated use case. Second, the authors stress that this method is 'non-invasive', but I fail to see how the presented methodology could be considered non-invasive in in vivo applications, as gene circuits are edited and a reliable way to chemically stimulate a large population of cells would be needed. It may be that I misunderstood their claim, as the method and data were not presented in a way that led to easy comprehension, but this needs to be addressed specifically and described.

      We apologize to the Reviewer for the potential misunderstanding. By ‘non-invasive’ we meant that such systems would not need, for instance, the surgical installation of light components to control ion channel activity. Nonetheless, we have removed these confusing sentences from the revised manuscript.

      The rationale for using the Mar-based system with a feedback strategy is now presented in a more structured and comprehensive way throughout the revised manuscript, to demonstrate the benefits of integrating feedback as well as the potential of such systems for excitable dynamics with noise-filtering capability and faster responsiveness. We also show how the system can be coupled to native potassium channels, opening ways to integrate the synthetic circuit into other organisms.

      In terms of classifying the synchronicity, while phase difference among communities was the key indicator of synchronization, there was little data exploring other aspects of the coupled waveforms, nor a discussion of potential drawbacks. For example, phase may be aligned while other properties such as amplitude and typical wave-shape measures differ. As this is presented as a method meant for adoption in other labs, a more rigorous analytical approach was expected.

      In the revised manuscript, we have analyzed synchronicity using several different approaches:

      (1) We calculated cumulative autocorrelations of the responses between communities.

      (2) To complement the autocorrelation analysis, we developed a quantitative ‘synchrony index’, defined as 1 - R, where R is the ratio of the differences in subsequent ThT peak positions amongst cell communities (phase) to the expected period. This metric describes how well the fungal colonies are synchronized with each other under the guidance of the common environmental signal.

      (3) We analyzed amplitudes and peak widths for all presented scenarios and conclude that, while periods and peak widths are robust across communities, there is noticeable variation in amplitudes (e.g. Figure 3E).

      We therefore believe that this multistep quantitative approach is rigorous in identifying oscillatory signal characteristics.
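      The synchrony index in point (2) can be illustrated with a minimal Python sketch. The response does not specify how peaks are paired across communities or how per-cycle offsets are aggregated, so the function below assumes that peaks are matched by order and that R is the mean absolute offset between matched ThT peak times divided by the expected period. The function name and the numbers in the usage line are hypothetical.

```python
import numpy as np


def synchrony_index(peaks_a, peaks_b, period):
    """Sketch of the synchrony index SI = 1 - R for two communities.

    Assumptions (not taken from the manuscript): peaks are matched by order
    (k-th peak of A with k-th peak of B), and R is the mean absolute offset
    between matched ThT peak times divided by the expected period.
    """
    a = np.asarray(peaks_a, dtype=float)
    b = np.asarray(peaks_b, dtype=float)
    n = min(len(a), len(b))            # use only peaks present in both traces
    offsets = np.abs(a[:n] - b[:n])    # per-cycle peak offset, in time units
    r = offsets.mean() / period        # offset as a fraction of the period
    return 1.0 - r


# Two communities whose peaks differ by 1 time unit on a period of 24 (illustrative).
si = synchrony_index([10, 34, 58], [11, 35, 59], period=24)
```

      Here si is 1 - 1/24, roughly 0.96; identical peak trains give an index of exactly 1, and the index falls as the per-cycle offsets grow relative to the period.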

      Reviewer #3 (Public Review):

      We are enthusiastic about this paper. It demonstrates controlled expression of ion channels, which itself is impressive. Going a step further, the authors show that through their control over ion channel expression, they can dynamically manipulate membrane potential in yeast. This chemical to electrophysiological conversion opens up new opportunities for synthetic biology, for example development of synthetic signaling systems or biological electrochemical interfaces. We believe that control of ion channel expression and hence membrane potential through external stimuli can be emphasized more strongly in the report. The experimental time-lapse data were also high quality. We have two major critiques on the paper, which we will discuss below.

      First, we do not believe the analyses used supports the authors' claims that chemical or electrical signals are propagating from cell-to-cell. The text makes this claim indirectly and directly. For example, in lines 139-141, the authors describe the observed membrane potential dynamics as "indicative of the effective communication of electrical messages within the populations". There are similar remarks in lines 144 and 154-156. The claim of electrical communication is further established by Figure 2 supplement 3, which is a spatial signal propagation model. As far as we can tell, this model describes a system different from the one implemented in the paper.

      Second, it is not clear why the excitable dynamics of the circuit are so important or if the circuit constructed does in fact exhibit excitable dynamics. Certainly, the mathematical model has excitable dynamics. However, not enough data demonstrates that the biological implementation is in an excitable regime. For example, where in the parameter space of Figure 1 supplement 1 does the biological circuit lie? If the circuit has excitable dynamics, then the authors should observe something like Figure 1 supplement 1B in response to a nonoscillating input. Do they observe that? Do they observe a refractory period? Even if the circuit as constructed is not excitable, we don't think that's a major problem because it is not central to what we believe is the most important part of this work - controlling ion channel expression and hence membrane potential with external chemical stimuli.

      We thank the Reviewers for their encouraging comments and positive evaluation of our work.

    1. Author Response

      Reviewer #1 (Public Review):

      Purkinje cells (PCs) in the cerebellum extend axonal collaterals along the PC layer and within the molecular layer. Previous anatomical studies have shown the existence of these tracts and recently, the existence of functional synapses from PCs to PCs, molecular layer interneurons (MLIs), and other cell types was demonstrated by Witter et al., (Neuron, 2016) using optogenetics. In this manuscript, Halverson et al., first characterize the PC to MLI synapse properties in the slice using optogenetics and dual patch recordings. They then use computer simulations to predict the role of these connections in eyelid conditioning and test these predictions using in vivo recordings in rabbits. Authors claim that PCs fire before their target MLIs and that their activity is anticorrelated. They further suggest that the special class of MLIs receiving inhibitory input from PCs might serve to synchronize PCs during eyelid conditioning.

      Major comments:

      1) The manuscript is quite long with 9 main figure panels and 6 supplementary figures. The flow of the results is not smooth. While the first 4 figures are nicely done in terms of their results and organization, the same cannot be said about the rest of the figures.

      To address this concern, we have revised the Results section extensively. We believe that it is now much more accessible and better integrated.

      In fact, it would make sense to split the manuscript in two, one describing the synaptic properties and circuit mapping of the PC-PC-MLI circuit and the other describing their role in eyelid conditioning. As it stands, this manuscript is a tough read and difficult to get through.

      We acknowledge that our results, which were done in two different labs and employed a variety of different techniques, could have been split into two (or even more) separate papers. However, we believe that there is high value to our readers in providing a comprehensive study that integrates many different types of analyses to attack the same fundamental question. That is why we chose to organize the content in the way that we did and that is why we still prefer to keep the entire story together. However, we do agree with the reviewer’s point that the previous version was unwieldy and too challenging to understand. Therefore, we have invested a lot of effort to improve the readability of the revised version.

      Further, the authors have not connected the initial slice physiology with the later in vivo work to argue for their presence in the same paper. For example, the quantal content measurement, the short-term plasticity, the mobilization rate measurement, etc do not figure in the latter half of the manuscript at all. I strongly suggest carving figures 1-4 out into a separate manuscript.

      The slice work motivated the computational simulation and the in vivo recordings of MLI activity. While it is true that it is hard to correlate every aspect of the slice work (e.g. quantal content measurements, etc.) with the in vivo recordings, and vice versa, there are elements of each that have informed the other. As a result, consistent properties of the PC-to-PC-MLI circuit emerged. We have highlighted the cross-connections more in the revised manuscript, including the following passages:

      1) “Only a subset of MLIs (8.7%) showed clear inverse correlation with eyelid PCs and they were within approximately 120 µm of eyelid PCs. These connectivity rates and distances are comparable to our observations in cerebellar slices, where we found that approximately 5-6% of MLIs receive PC feedback inhibition (Figure 1b) that extends over 200 µm or less (Figure 3).” (p.13, para 1)

      2) “The pattern of cross-correlation between connected PCs and PC-MLIs was qualitatively similar to that observed in slices…” (p.14, para 2).

      3) “This correspondence strengthens the conclusion that putative PC-MLI identified in vivo are equivalent to the PC-MLI identified in slices” (p. 15, para 1).

      4) “The need for relatively large changes in PC activity in vivo highlights the importance of the frequency-independent synaptic transmission at the PC-to-PC-MLI synapse illustrated in Figure 2.” (p. 16, para. 1)

      We have more closely harmonized the style of all figures, to subliminally emphasize the close connection between the slice and in vivo results.

      Above we have addressed the suggestion to split the paper into two. Instead of breaking up the paper, we worked hard to better integrate the two parts and make them easier to read as a whole.

      2) Authors conclude that eyelid PCs and eyelid PC-MLIs are inversely correlated and that PCs precede PC-MLIs during CRs and therefore could drive their activity. Both of these points are insufficiently justified by their analysis. First, it is not clear how eyelid PCs are identified – I’m assuming this is based on negative correlation with CRs just like positively correlated MLIs are assigned as eyelid PC-MLIs.

      We apologize for failing to mention that eyelid PCs were identified by the presence of US-evoked (eyelid stimulation) complex spikes. This criterion is completely independent of the responses of PCs during expression of eyelid CRs and also provides an in vivo tool for identifying the “eyelid” region of the cerebellar cortex, which should also be where eyelid PC-MLIs are located. To address this omission, we now describe the method used to identify eyelid PCs in the Methods section (p. 31, para. 2) and the Results section (p. 11, para. 2).

      If this is how PCs and PC-MLIs are identified, then the inverse correlation between the two cell types results from this definition itself. And, their activity pattern during CRs, illustrated in many figure panels is hardly surprising.

      Yes of course this would be circular logic, but it is not at all what we did! Again, we apologize for the confusion.

      Second, to show that PCs fire ahead of PC-MLIs, the authors calculate the difference in fractional change in spike rate before and after the start of the CR (PC-MLI). Their reasoning is that if the bulk of firing rate change happened before the start of CR for PCs, but at the start or later for PC-MLIs, then this value will be positive, else it will be negative. The distribution of these values was shifted to the positive side leading them to conclude that PCs fire ahead of PC-MLIs. However, this is a huge logical jump. The sign of (PC-MLI) is dependent on the depth of modulation in each cell type as well and does not necessarily indicate relative timing. In any case, such caveats have not been ruled out in their analysis. This analysis to establish timing is unconvincing. Would it not be better to look at the timing of the spike modulation start directly rather than the round-about method they are using?

      We agree with the reviewer that PC and PC-MLI activities undergo complex time-dependent changes, particularly during CRs, which makes it challenging to have a single parameter that uniquely represents the differences in timing between the activity of the two cell types. In our revised manuscript, we have addressed this issue by creating a new section that is entirely devoted to analysis of the temporal relationship between PC and PC-MLI activity (pp. 14-17). In brief, here are the main lines of evidence that PCs fire prior to PC-MLIs, both in baseline conditions and during conditioned eyelid responses.

      We have provided 3 types of evidence that PCs fire prior to putative PC-MLIs during baseline activity:

      1) A spike-triggered average of PC and putative PC-MLI activity during baseline firing showed a modest decrease in PC-MLI firing rate in response to a PC action potential (Figure 8a; see also Figure 8-figure supplement 1a and 1b).

      2) A pause in PC activity caused a very substantial rise in activity in putative PC-MLIs (Figure 8c; see also Figure 8-figure supplement 1c).

      3) A burst of PC activity caused a decline in putative PC-MLI activity (Figure 8d; see also Figure 8-figure supplement 1d).

      We have an additional 3 lines of evidence showing that PCs fire prior to putative PC-MLIs during CRs:

      1) Simultaneous recordings of the time course in changes in PC and putative PC-MLI activity during CRs indicate that PC activity usually declined prior to the activity of putative PC-MLIs. This is clearly visible in the examples shown in Figure 9c, as well as the averaged data shown in Figure 9-figure supplement 1a.

      2) We measured the delay between the time at which PC activity reached 50% of its minimum during the CS and the time at which the activity of putative PC-MLIs reached 50% of its maximum during single trials. Whenever CRs were observed, PCs reached their half-maximal response before putative PC-MLIs did (Figure 9-figure supplement 1b).

      3) We also measured the collective timing of changes in the activity of putative PC-MLIs and eyelid PCs during conditioning across all of our paired recordings. This was done by calculating a ratio representing the magnitude of changes in activity prior to CR onset, normalized to the peak amplitude of the change during the entire interval. The distribution of differences in the timing of changes in PC and PC-MLI activity has a mean that is greater than zero (Figure 10), indicating that eyelid PCs decreased their activity before putative PC-MLIs increased their activity in a majority of cases.
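      The two single-trial timing measures in points 2) and 3) can be sketched as follows. This is an illustration, not the authors' analysis code: the function names, the use of the largest pre-onset change as the numerator of the ratio, and the toy ramp-shaped firing rates are all assumptions. Normalizing each cell's pre-onset change by its own peak change is meant to make the comparison between PCs and PC-MLIs insensitive to their different modulation depths.

```python
import numpy as np


def half_max_time(t, rate, baseline, direction):
    """First time at which the change from baseline reaches 50% of its peak.

    direction = -1 for a decrease (eyelid PCs), +1 for an increase (PC-MLIs).
    """
    t = np.asarray(t)
    change = direction * (np.asarray(rate, dtype=float) - baseline)
    half = 0.5 * change.max()
    return t[np.argmax(change >= half)]  # argmax on a boolean array gives the first True


def pre_onset_fraction(t, rate, baseline, direction, cr_onset):
    """Largest change before CR onset, normalized to the peak change over the trial."""
    t = np.asarray(t)
    change = direction * (np.asarray(rate, dtype=float) - baseline)
    pre = change[t < cr_onset]
    return pre.max() / change.max() if pre.size else 0.0


# Toy single-trial rates in 1 ms bins (illustrative values only).
t = np.arange(500)
pc = 100 - 60 * np.clip((t - 100) / 200, 0, 1)   # PC rate falls from 100 to 40 Hz
mli = 20 + 40 * np.clip((t - 150) / 200, 0, 1)   # PC-MLI rate rises from 20 to 60 Hz

lead_ms = half_max_time(t, mli, 20, +1) - half_max_time(t, pc, 100, -1)
timing_diff = (pre_onset_fraction(t, pc, 100, -1, cr_onset=250)
               - pre_onset_fraction(t, mli, 20, +1, cr_onset=250))
```

      In this toy trial the PC reaches its half-maximal change 50 ms before the PC-MLI, and the PC-minus-MLI difference in pre-onset fractions is positive, which is the pattern expected when PC activity changes first.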

      We hope that these improvements have adequately addressed the reviewer’s concern.

      3) Many figure panels make the same point and appear redundant. For example, that PCs and PC-MLIs are inversely correlated with each other in vivo during CRs is shown in Figure 7, figure 8a, S2, S4, and S5. Of course, in each case, the data are sorted differently (according to ISI, CR initiation, cumulative distributions, etc.,) but surely, the point regarding inverse relationship can be conveyed more concisely?

      As mentioned in response to the reviewer’s previous comment, we have made significant changes to this part of the manuscript, including creating a section that addresses the temporal relationship between PC and PC-MLI activity. This has involved removing some of the analyses listed by the reviewer, adding some new analysis and relegating some previous figures to the supplementary materials. We believe that these changes allow the manuscript to efficiently clarify the relationship between PC and PC-MLI activity and highlight the value of each type of analysis that is included.

      4) Several details are missing in the methods section even though parts of it may have been published before. For instance, how are CRs calculated in the simulation? Methods state that 'The averaged and smoothed activity of the eight deep nucleus neurons was used to represent the output of the simulation and the predicted "eyelid response" of the simulation'. It is not clear what the nature of this transform is and if any calibration factors were used. How comparable are the simulated CRs in kinetics and amplitude to experimental CRs?

      In response to this comment, we have revised the Methods section to include much more detail about the simulation methods, including two schematic diagrams. The methods employed for the experimental work in the paper are already described in detail.

      The simulation can produce simulated CRs (smoothed histogram of nucleus activity) with kinematic variables that are comparable to experimental CRs. A detailed account of how this is accomplished is given in Medina and Mauk (2000) and is briefly summarized on p. 39, para. 2: “The averaged and smoothed activity of the eight deep nucleus neurons was used to represent the output of the simulation and the predicted “eyelid response” of the simulation.” Although this approach is not intended to simulate the precise kinematics of an eyeblink, comparison of Figures 11a (simulation) and 6a (rabbit) shows that there is a reasonable concordance. The real value of the simulation is in predicting the relative changes in eyelid responses that occur during conditioning.
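      As a rough illustration of the averaging-and-smoothing step quoted above, the sketch below averages binary spike trains from eight deep nucleus neurons into a population rate and smooths it. The Gaussian kernel, its 10 ms width and the function name are assumptions made for illustration; this is not the implementation described in Medina and Mauk (2000).

```python
import numpy as np


def simulated_eyelid_response(spikes, dt_ms=1.0, sigma_ms=10.0):
    """Average and smooth deep-nucleus spiking into one response trace.

    spikes: array of shape (n_neurons, n_bins) holding 0/1 spike indicators.
    The Gaussian kernel and sigma_ms=10 are illustrative assumptions.
    """
    spikes = np.asarray(spikes, dtype=float)
    rate_hz = spikes.mean(axis=0) * (1000.0 / dt_ms)   # mean rate across neurons, in Hz
    half = int(np.ceil(4 * sigma_ms / dt_ms))           # truncate kernel at +/- 4 sigma
    x = np.arange(-half, half + 1) * dt_ms
    kernel = np.exp(-0.5 * (x / sigma_ms) ** 2)
    kernel /= kernel.sum()                              # unit area keeps the output in Hz
    return np.convolve(rate_hz, kernel, mode="same")


# Eight hypothetical nucleus neurons, 1 ms bins, firing in every bin.
response = simulated_eyelid_response(np.ones((8, 200)))
```

      Because the kernel has unit area, a constant population rate passes through unchanged away from the edges of the trace, so the smoothed output stays in the same units as the averaged firing rate.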

    1. Author Response

      Reviewer #1 (Public Review):

      Figures 2 through 6. There is no description of the relationship between the findings and the anatomical location of the electrodes (other than distal versus local). Perhaps the non-uniform distribution of electrodes makes these analyses more complicated and such questions might have minimal if any statistical power. But how should we think about the claims in Figures 2-6 in relationship to the hippocampus, amygdala, entorhinal cortex, and parahippocampal gyrus? As one example question out of many, is Figure 2C revealing results for local pairs in all medial temporal lobe areas or any one area in particular? I won't spell out every single anatomical question. But essentially every figure is associated with an anatomical question that is not described in the results.

      To address the reviewer’s point, we now report the distribution of spike-LFP pairs across anatomical regions for each of Figures 2-6. The results split by anatomical region are reported in Figure 2 – figure supplement 7, Figure 3 – figure supplement 7, Figure 4 – figure supplement 1, Figure 5 – figure supplement 2, and Figure 6 – figure supplement 3. We also used a non-parametric Kruskal-Wallis test to examine statistically the effect of anatomical region on the results shown in each figure. Generally, these new results show that the effects are similar across regions, with two exceptions (Figure 4 – figure supplement 1 and Figure 5 – figure supplement 2). However, we would like to stress that these results should be interpreted with great caution because, as the reviewer correctly points out, the electrodes were not evenly distributed across regions (~75% of observations come from the hippocampus) or across patients. This sometimes leads to very low numbers of observations per region, and it is difficult to disentangle whether any apparent differences are driven by regional differences or by differences between patients. Detailed results are reported below.

      Manuscript lines 207-212: “In the above analysis all MTL regions were pooled together to allow for sufficient statistical power. Results separated by anatomical region are reported in Figure 2 – figure supplement 7 for the interested reader. However, these results should be interpreted with caution because electrodes were not evenly distributed across regions and patients, making it difficult to disentangle whether any apparent differences are driven by actual anatomical differences or by idiosyncratic differences between patients.”

      Manuscript lines 255-258: “Finally, we report the distal spike-LFP results separated by anatomical region in Figure 3 – figure supplement 7, which did not reveal any apparent differences in the memory related modulation of theta spike-LFP coupling between regions.”

      Manuscript lines 264-266: “PSI results separated by anatomical regions are reported in Figure 4 – figure supplement 1, which revealed that the PSI results were mostly driven by within regional coupling.”

      Manuscript lines 299-303: “We also analyzed whether the memory-dependent effects of cross-frequency coupling differ between anatomical regions (see Figure 5 – figure supplement 2). This analysis revealed that the results were mostly driven by the hippocampus; however, we urge caution in interpreting this effect due to the large sampling imbalance across regions.”

      Manuscript lines 343-346: “As for the above analysis we also investigated any apparent differences in co-firing between anatomical regions. These results are reported in Figure 6 – figure supplement 3 and show that the earlier co-firing for hits compared to misses was approximately equivalent across regions.”
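      For readers unfamiliar with the test, the Kruskal-Wallis H statistic used above to compare regions can be sketched in a few lines of NumPy: all observations are ranked jointly (ties share their mean rank) and the rank sums per group are compared. This is a generic textbook implementation, not the authors' code; in practice `scipy.stats.kruskal` returns the same tie-corrected statistic together with a p-value. The function names and the toy groups are illustrative.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sx = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sx[j + 1] == sx[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_h(*groups):
    """Kruskal-Wallis H statistic (one group per region) with tie correction."""
    data = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n = len(data)
    ranks = average_ranks(data)
    h, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]   # ranks belonging to this group
        h += r.sum() ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    _, counts = np.unique(data, return_counts=True)
    ties = (counts ** 3 - counts).sum()
    return h / (1.0 - ties / (n ** 3 - n))


# Toy per-region values (e.g. one coupling measure per spike-LFP pair).
h_toy = kruskal_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

      With groups of very unequal size, as in the hippocampus-dominated sample described above, a region with only a handful of observations contributes little to H, which is one reason the per-region comparisons should be read cautiously.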

      Figure 1

      1A. I assume that image positions are randomized during a cued recall?

      Yes, that was the case. We have now added this information to the Methods section.

      Manuscript line 526: “Image positions on the screen were randomized for each trial.”

      What was the correlation between subjects' indication of how many images they thought they remembered and their actual performance?

      We did not log how many images the patients thought they remembered. Specifically, if patients answered that they remembered at least one image, they were shown the selection screen, where they could select the appropriate images. Therefore, we cannot perform this analysis. We now note this in the Methods section. However, although interesting, the results of such an analysis would not affect the main conclusions of our manuscript.

      Manuscript lines 523-524: “The experimental script did not log how many images the patient indicated that they thought they remembered.”

      1B. Chance is shown for hits but not misses. I assume that hits are defined as both images correct and misses as either 0 or 1 image correct. Then a chance for misses is 1-chance for hits = 5/6. It would be nice to mark this in the figure.

      Done as suggested (see Figure 1).

      The authors report that both incorrect was 11.9%. By chance, both incorrect should be the same as both correct, hence also 1/6 probability, hence the probability of both incorrect seems quite close to chance levels, right?

      Yes, that is correct. However, across sessions the proportion of full misses (i.e. both incorrect) was significantly below chance (t(39) = -1.9214; p < 0.05). Nevertheless, the proportion of fully forgotten trials appears to be higher than expected purely by chance. This is likely driven by a tendency of participants to either fully remember an episode or completely forget it, as demonstrated previously in behavioural work (Joensen et al., 2020; JEP Gen.). We now report this in the manuscript.

      Manuscript lines 132-136: “Across sessions the proportion of full misses (i.e. both incorrect) was significantly below chance (t39=-1.92; p<0.05). However, the proportion of fully forgotten trials appears to be higher than expected purely by chance. This is likely driven by a tendency of participants to either fully remember an episode, or completely forget it, as demonstrated previously in behavioral work (25).”
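      The chance levels discussed in this exchange follow from simple counting. The sketch below assumes, as the reviewer's figure of 1/6 implies, that the selection screen shows four images of which two are correct, and that a guessing participant picks exactly two at random; these parameters and the function name are assumptions, not details taken from the Methods.

```python
from fractions import Fraction
from itertools import combinations


def chance_levels(n_images=4, n_targets=2):
    """Probability of k correct images when guessing n_targets out of n_images."""
    targets = set(range(n_targets))                      # label the correct images 0..n_targets-1
    picks = list(combinations(range(n_images), n_targets))
    counts = {}
    for pick in picks:
        k = len(targets.intersection(pick))              # number of correct images in this guess
        counts[k] = counts.get(k, 0) + 1
    return {k: Fraction(c, len(picks)) for k, c in sorted(counts.items())}


levels = chance_levels()
```

      Under these assumptions, levels is {0: 1/6, 1: 2/3, 2: 1/6}: a hit (both correct) occurs by chance with probability 1/6, a miss (zero or one correct) with probability 5/6, and a full miss (both incorrect) with the same probability as a hit, 1/6, matching the reviewer's calculation.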

      1C. How does the number of electrodes relate to the number of units recorded in each area?

      The distribution of neurons per region is shown in the new Figure 1D (see above). It approximately matches the distribution of electrodes per region, except for the amygdala, where slightly more neurons were recorded. This is because one patient (P08), who had two electrodes, in the left and right amygdala, contributed many sessions (9 sessions, compared to an average of 4.44 per patient).

      Line 152. The authors state that neural firing during encoding was not modulated by memory for the time window of interest. This is slightly surprising given that other studies have shown a correlation between firing rates and memory performance (see Zheng et al Nature Neuroscience 2022 for a recent example). The task here is different from those in other studies, but is there any speculation as to potential differences? What makes firing rates during encoding correlate with subsequent memory in one task and not in another? And why is the interval from 2-3 seconds more interesting than the intervals after 3 seconds where the authors do report changes in firing rates associated with subsequent performance? Is there any reason to think that the interval from 2-3 seconds is where memories are encoded as opposed to the interval after 3 seconds?

      Zheng et al. used a movie-based memory paradigm in which they manipulated transitions between scenes to identify event cells and boundary cells. They show that boundary cells, which made up 7.24% of all recorded MTL cells, but not event cells (6.2% of all MTL cells), modulate their firing rate around an event depending on later memory. There are several differences between Zheng et al.’s study and ours that need to be considered. Most importantly, we did not use a complex movie-based memory paradigm as Zheng et al. did and therefore cannot identify boundary cells, which would be expected to show the memory-dependent firing rate modulation. This alone could explain why no significant differences in firing rates were observed in the first second following stimulus onset. Such an absence of memory-dependent differences in neural firing is not unprecedented. In their seminal paper, Rutishauser et al. (2010; Nature) report no significant differences in firing rates (0-1 seconds after stimulus onset, which is similar to our 2-3 seconds time window) between later remembered and later forgotten images. This finding is also in line with Jutras & Buffalo (2009; J Neurosci), who also show no significant difference in the firing rates of hippocampal neurons during encoding of remembered and forgotten images.

      The 2-3 seconds time interval, which corresponds to 0-1 seconds after the onset of the two associate images, is special because it marks the earliest time point where memory formation can start, therefore allowing us to investigate these very early neural processes that set the stage for later memory-forming processes. While speculative, these early processes likely capture the initial sweep of information transfer into the MTL memory system which arguably is reflected in the timing of spikes relative to LFPs. It is conceivable that these initial network dynamics reflect attentional processes, which act as a gate keeper to the hippocampus (Moscovitch, 2008; Can J Exp Psychol) and thereby set the stage for later memory forming processes. This interpretation would be consistent with studies in macaques showing that attention increases spike-LFP coupling, whilst not affecting firing rates (Fries et al., 2004; Science). We modified the discussion section to address this issue.

      Manuscript lines 468-474: “Interestingly, these early modulations of neural synchronization by memory encoding were observed in the absence of modulations of firing rates, which is consistent with previous results in humans (16) and macaques (12), but contrasts with (43). Studies in macaques showed that attention increases spike-LFP coupling whilst not affecting firing rates (44). It is therefore conceivable that these initial network dynamics reflect attentional processes, which act as a gate keeper to the hippocampus and thereby set the stage for later memory forming processes (45).”

      Lines 154-157 and relationship to the subsequent analyses. These lines mention in passing differences in power in low-frequency bands and high-frequency bands. To what extent are subsequent results (especially Figures 3 and 4) related to this observation? That is, are the changes in spike-field coherence, correlated with, or perhaps even dictated by, the changes in power in the corresponding frequency bands?

      To address this question, we repeated the analysis that we had performed for SFC on LFP power, using those channels whose LFP was locally coupled to spikes in the gamma band and distally coupled to spikes in the theta band. Furthermore, we correlated the hit-minus-miss difference in peak frequency between power and SFC. If power dictated the effects seen in SFC, we would expect similar effects of memory on power as on SFC, that is, an increase in peak frequency for hits compared to misses for gamma and theta. Furthermore, we would expect a correlation between the peak frequency differences in power and SFC. Neither prediction was confirmed by the data. These results are now reported in Figure 2 – figure supplement 5 for gamma and Figure 3 – figure supplement 5 for theta.

      Manuscript lines 195-199: “We also tested whether a similar shift in peak gamma frequency as observed for spike-LFP coupling is present in LFP power, and whether memory-related differences in peak gamma spike-LFP are correlated with differences in peak gamma power (Figure 2 – figure supplement 5). Both analyses showed no effects, suggesting that the effects in spike-LFP coupling were not coupled to, or driven by similar changes in LFP power.”

      Manuscript lines 248-253: “As for gamma, we also tested whether a similar shift in peak theta frequency is present in LFP power, and whether there is a correlation between the memory-related differences in peak theta spike-LFP and peak theta power (Figure 3 – figure supplement 5). Both analyses showed no effects, suggesting that the effects in spike-LFP coupling were not coupled to, or driven by similar changes in LFP power.”
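      The control analysis described above, extracting a hit-minus-miss peak-frequency shift per spike-LFP pair for SFC and for power and then correlating the two, can be sketched as follows. The band limits, the toy spectra and the use of a Pearson correlation (`np.corrcoef`) are assumptions made for illustration; the response does not state which correlation was used. The toy shifts are deliberately constructed to be perfectly correlated so that the output is easy to check; this is the pattern that would have indicated a power-driven effect, which the response reports was not found in the real data.

```python
import numpy as np


def peak_frequency(freqs, spectrum, band):
    """Frequency of the maximum of `spectrum` within the closed interval `band`."""
    freqs = np.asarray(freqs, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[in_band][np.argmax(spectrum[in_band])]


def peak_shift(freqs, hits, misses, band):
    """Hit-minus-miss peak frequency, one value per spike-LFP pair (row)."""
    return np.array([peak_frequency(freqs, h, band) - peak_frequency(freqs, m, band)
                     for h, m in zip(hits, misses)])


def toy_spectrum(freqs, peak_hz):
    """Hypothetical smooth spectrum with a single peak, for illustration only."""
    return np.exp(-(np.asarray(freqs, dtype=float) - peak_hz) ** 2 / 50.0)


freqs = np.arange(1, 101)  # 1-100 Hz grid (illustrative)
band = (30, 80)            # hypothetical gamma band limits
d_sfc = peak_shift(freqs, [toy_spectrum(freqs, p) for p in (62, 65, 58)],
                   [toy_spectrum(freqs, 60)] * 3, band)
d_pow = peak_shift(freqs, [toy_spectrum(freqs, p) for p in (54, 60, 46)],
                   [toy_spectrum(freqs, 50)] * 3, band)
r = np.corrcoef(d_sfc, d_pow)[0, 1]  # Pearson correlation across pairs
```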

      Do local interactions include spike-field coherence measurements from the same microwire (i.e., spikes and LFPs from the same microwire)?

      Yes, they do. Out of the 53 local spike-LFP couplings found for the gamma frequency range, 11 (20.75%) were from pairs where the spikes and LFPs were measured on the same microwire. We assume that the reviewer is asking this question because of a concern that spike interpolation may introduce artifacts which may influence the spectrograms and consequently the spike-LFP coupling measures. This was also pointed out by Reviewer #2. To address this concern, we split the data based on whether the spike- and LFP-providing channels were the same or different. The results show that (i) the SFC spectrogram is highly similar between the two datasets, with a prominent gamma peak present in both and no significant differences between them; (ii) restricting the analysis to data where the LFP- and spike-providing channels are different replicated the main finding of faster gamma peak frequencies for hits compared to misses; and (iii) limiting the SFC analysis further to ‘silent’ channels only, i.e. channels where no SUA/MUA activity was present at all, also replicated the main finding of faster gamma peak frequencies for hits compared to misses.

      These analyses suggest that the SFC results were not driven by spike interpolation artefacts.

      Manuscript lines 199-203: “To rule out concerns about possible artifacts introduced by spike interpolation, we repeated the above analysis for spike-LFP pairs where the spike- and LFP-providing channels are the same or different, and for ‘silent’ LFP channels (i.e. channels where no SUA/MUA activity was detected; see Figure 2 – figure supplement 6).”

      60 Hz. It has always troubled me deeply when results peak at 60 Hz. This is seen in multiple places in the manuscript; e.g., Figures 2B, 2E. What are the odds that engineers choosing the frequency for AC currents would choose the exact same frequency that evolution dictated for interactions of brain signals? This is certainly not the only study that reports interesting observations peaking at 60 Hz. One strong line of argument to suggest that this is not line noise is the difference between conditions. For example, in Figure 2B, there is a difference between local and distal interactions. It is hard for me to imagine why line noise would reveal any such difference. Still ...

      The frequency for AC currents in Europe is 50 Hz, not 60 Hz as in the US. Therefore, our SFC effects are well outside the range of the notch.
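
One way to see why a 50 Hz notch cannot shape an effect near 60 Hz is to evaluate the magnitude response of a narrow notch filter. The sketch below assumes a standard second-order IIR notch (zeros on the unit circle at 50 Hz, poles at radius r = 0.99) and a 1 kHz sampling rate. The filter actually applied to these recordings is not described here, so these parameters are illustrative only.

```python
import cmath
import math

def notch_gain(f, f0=50.0, fs=1000.0, r=0.99):
    """Magnitude response at f (Hz) of a second-order IIR notch centred at f0.

    Zeros sit on the unit circle at +/- f0; poles sit just inside it at radius r,
    which controls how narrow the notch is.
    """
    w0 = 2 * math.pi * f0 / fs
    z_inv = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    num = 1 - 2 * math.cos(w0) * z_inv + z_inv ** 2
    den = 1 - 2 * r * math.cos(w0) * z_inv + (r ** 2) * z_inv ** 2
    return abs(num / den)

for f in (50.0, 60.0):
    print(f, round(notch_gain(f), 3))
```

With these settings the gain is essentially zero at 50 Hz but close to one at 60 Hz.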

      Figure 6. I was very excited about Figure 6, which is one of the most novel aspects of this study. In addition to the anatomical questions about this figure noted above, I would like to know more. What is the width of the Gaussian envelope?

      The width of the Gaussian window used in the original results was 25 ms. We chose this window because in our view it represents a good balance between integrating over a long-enough time window, and thus allowing for some jitter in neural firing between pairs of neurons, whilst still being temporally specific. Finding the right balance here is not trivial, because too short a window underestimates co-firing, whereas too long a window may not provide the temporal specificity necessary to detect co-firing lags (Cohen & Kohn, 2011; Nat Neurosci). To test whether this choice critically affected our results, we repeated the analysis for different window sizes, i.e. 15, 35, and 45 ms. The pattern of results did not change, with hits showing earlier co-firing peaks than misses. Critically, the difference in co-firing peaks was significant for all window sizes except the shortest, which is presumably due to the increased noise from integrating spikes over a smaller time window. These issues are now mentioned in the methods section, and the results for the different window sizes are reported in Figure 6 – figure supplement 4.

      Manuscript lines 346-347: “The co-firing analyses were replicated with different smoothing parameters (see Figure 6 – figure supplement 4).”

      Manuscript lines 894-898: “We chose this time window because it should represent a good balance between integrating over a long-enough time window and thus allowing for some jitter in neural firing between pairs of neurons, whilst still being temporally specific (57). To test whether this choice critically affected our results, we repeated the analysis for different window sizes, i.e. 15, 35, and 45 ms (see Figure 6 – figure supplement 4).”
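
The co-firing measure described above (Gaussian smoothing followed by a cross-correlation between the putative up-stream and down-stream spike trains, read out at its peak lag) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes spikes binned at 1 ms, treats the stated window "width" as the Gaussian's standard deviation, and restricts the peak search to ±100 ms; all three choices are assumptions. The toy down-stream train repeats each up-stream spike 8 ms later, so the recovered lag should not depend on the width, mirroring the 15-45 ms sensitivity check.

```python
import numpy as np

def gaussian_smooth(train, sigma_ms, dt_ms=1.0):
    """Convolve a binned spike train with a unit-area Gaussian (SD = sigma_ms)."""
    half = int(np.ceil(4 * sigma_ms / dt_ms))
    x = np.arange(-half, half + 1) * dt_ms
    kernel = np.exp(-0.5 * (x / sigma_ms) ** 2)
    kernel /= kernel.sum()
    return np.convolve(train, kernel, mode="same")

def cofiring_peak_lag(up, down, sigma_ms, max_lag_ms=100, dt_ms=1.0):
    """Lag (ms) at which the smoothed down-stream train best matches the up-stream one.

    Positive lags mean the down-stream unit fires after the up-stream unit.
    """
    a = gaussian_smooth(up, sigma_ms, dt_ms)
    b = gaussian_smooth(down, sigma_ms, dt_ms)
    xc = np.correlate(b, a, mode="full")          # xc[k] = sum_n b[n + k] * a[n]
    lags = np.arange(-(len(a) - 1), len(b)) * dt_ms
    keep = np.abs(lags) <= max_lag_ms
    return lags[keep][np.argmax(xc[keep])]

# Toy example: the down-stream unit repeats each up-stream spike 8 ms later.
rng = np.random.default_rng(1)
up = np.zeros(2000)
up[rng.choice(np.arange(200, 1800), size=40, replace=False)] = 1
down = np.zeros_like(up)
down[8:] = up[:-8]
for width in (15, 25, 35, 45):
    print(width, cofiring_peak_lag(up, down, sigma_ms=width))
```

In real data the peak is broadened by firing jitter, which is where the trade-off between noise (narrow kernels) and temporal specificity (wide kernels) described above comes in.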

      Are these units on the same or different microwires?

      All units used for the analysis shown in Figure 6 come from different microwires. This was naturally the case because the putative up-stream neuron was distally coupled to the theta LFP, and the putative down-stream neuron was locally coupled to gamma at this same theta LFP electrode. This information is listed in Figure 6 – source data 1 which lists the locations and electrode IDs for all neuron pairs shown in figure 6.

      How do the spike latencies reported here depend on the firing rates of the two units?

      To address this question we first tested whether firing rates (averaged across the putative up-stream and down-stream neurons) differ between hits and misses. If they do, this would be suggestive of a dependency of the spike latency differences between hits and misses on firing rates. No such difference was observed (p>0.3). Second, we correlated the differences between hits and misses in co-firing peak latencies with the differences in firing rates. Again, no significant correlation was observed (R=-0.06; p>0.7), suggesting that firing rates had no influence on the observed differences in co-firing latencies. These control analyses are now reported in the main text.

      Manuscript lines 347-350: “No significant differences in firing rates between hits and misses were found (p>0.3), and no correlations between firing rates and the co-firing latencies were obtained (R=-0.06; p>0.7), suggesting that firing rates had no influence on the observed co-firing differences between hits and misses.”
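
Both firing-rate controls operate on per-pair hit-minus-miss differences. The response does not name the statistical tests used, so this sketch substitutes a sign-flip permutation test for the rate difference and a plain Pearson correlation between rate and lag differences; the function name, inputs and toy numbers are hypothetical.

```python
import numpy as np

def firing_rate_controls(rate_hit, rate_miss, lag_hit, lag_miss, n_perm=10000, seed=0):
    """Check whether co-firing lag effects could be explained by firing rate.

    1) Sign-flip permutation test on the per-pair hit-minus-miss rate difference.
    2) Pearson correlation between rate differences and lag differences.
    """
    d_rate = np.asarray(rate_hit, float) - np.asarray(rate_miss, float)
    d_lag = np.asarray(lag_hit, float) - np.asarray(lag_miss, float)
    rng = np.random.default_rng(seed)
    observed = abs(d_rate.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d_rate.size))
    null = np.abs((signs * d_rate).mean(axis=1))
    p_rate = (np.sum(null >= observed) + 1) / (n_perm + 1)
    r = np.corrcoef(d_rate, d_lag)[0, 1]
    return p_rate, r

# Made-up example: rate differences 1..4 Hz, lag differences exactly twice as large.
p, r = firing_rate_controls([5, 6, 7, 8], [4, 4, 4, 4], [10, 12, 14, 16], [8, 8, 8, 8])
print(round(p, 3), round(r, 3))
```

In this toy case r is 1 and p is close to 2/16, because only the all-positive and all-negative sign patterns reach the observed mean.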

      What do these results look like for other pairs that are not putative upstream/downstream pairs?

      As we reported in the original manuscript (lines 352-355), we did not find a memory-dependent effect on co-firing latencies when neuron pairs were selected solely on the basis of distal theta SFC. In this analysis the distally theta-coupled neuron would be the up-stream neuron, and the neuron recorded at the site where the theta LFP is coupled would be the down-stream neuron. This null result suggests that the memory-dependent difference in co-firing lags emerges only when the down-stream neurons are coupled to a local gamma rhythm. However, this previous analysis still retains a notion of up-stream and down-stream neurons, because neuron pairs were selected based on distal theta phase coupling. We therefore repeated the analysis in a completely unconstrained fashion, entering all possible pairs of neurons recorded from different electrodes into the co-firing analysis. This analysis also revealed no difference in co-firing lags, neither for positive nor for negative lags. Instead, it showed a tendency for hits to show a higher occurrence of simultaneous or near-simultaneous firing, which is in line with Hebbian learning. These results are now reported in Figure 6 – figure supplement 1.

      Manuscript lines 333-335: “In addition, a completely unconstrained co-firing analysis where all possible pairings of units were considered also showed no systematic difference in co-firing lags between hits and misses (Figure 6 – figure supplement 1).”
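
The unconstrained control enumerates every unordered pair of units whose spikes come from different electrodes; because the cross-correlation is evaluated at both positive and negative lags, each unordered pair needs to be tested only once. A sketch with hypothetical unit labels and wire IDs:

```python
from itertools import combinations

def cross_wire_pairs(units):
    """All unordered pairs of units recorded on different electrodes.

    `units` maps a unit label to the electrode (microwire) it was recorded on.
    """
    return [(a, b) for a, b in combinations(sorted(units), 2)
            if units[a] != units[b]]

units = {"u1": 3, "u2": 3, "u3": 8, "u4": 12}   # hypothetical labels and wire IDs
print(cross_wire_pairs(units))
```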

      Reviewer #2 (Public Review):

      Roux et al. investigated the temporal relationship between spike field coherence (SFC) of locally and distally coupled units in the hippocampus of epilepsy patients to successful and unsuccessful memory encoding and retrieval. They show that SFC to faster theta and gamma oscillations accompany hits (successful memory encoding and retrieval) and that the timing of the SFC between local and distal units for hits comports well with synaptic plasticity rules. The task and data analyses appear to be rigorously done.

      Strengths: The manuscript extends previous work in the human medial temporal lobe which has shown that greater SFC accompanies improved memory strength. The cross-regional analyses are interesting and necessary to invoke plasticity mechanisms. They deploy a number of contemporary analyses to disentangle the question they are addressing. Furthermore, their analyses address limitations or confounds that can arise from various sources like sample size, firing rates, and signal processing issues.

      Weaknesses:

      Methodological:

      The SFC measures depend in part on LFPs extracted from the same or potentially other electrodes that are contaminated by spikes, as well as by multiunit activity. In the methods, they cite a spike removal approach. Firstly, the incomplete removal of a spike, or its substitution with a signal that resembles what might have been there had no spike been present, can introduce a broadband signal time-locked to the spike and create spurious SFC. Can the authors confirm that such an artifact is not present in their analyses? Secondly, how did they deal with the removal of multiunit activity? Removing such activity, given refractory period violations, might be more difficult than removing well-isolated units, and could introduce artifacts and broadband power, which again would spuriously elevate SFC. Conversely, not removing multiunit activity would surely introduce significant broadband power. Since it is uncommon to have units on all 8 of the BF microwires, one way around this might be to exclude the microwire(s) with units when extracting the LFP, avoiding the need for spike removal.

      The reviewer raises a valid concern which we address as follows. Firstly, an artefact introduced into SFC by linear interpolation would be a problem for those local SFCs where the spike providing channel and the LFP providing channel are identical. Out of the 53 local spike-SFC couplings found for the gamma frequency range, only 11 (20.75%) were from pairs where the spikes and LFPs come from the identical microwire. It is unlikely that this minority of data would have driven the results. Furthermore, it is unlikely that the interpolation would introduce a frequency shift of SFC that is memory dependent, because the interpolation is more likely to cause a general increase in broadband SFC (as opposed to having a frequency band specific effect). However, to address this concern, we split the data based on whether the spike and LFP providing channels were the same or different. The results show that (i) the spectrogram of SFC is highly similar between the two datasets, with a prominent gamma peak present in both and no significant differences between the two; (ii) restricting the analysis to those data where the LFP and spike providing channels are different replicated the main finding of faster gamma peak frequencies for hits compared to misses.
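
For reference, the kind of substitution under discussion can be sketched as follows: the LFP samples surrounding each spike are replaced by a straight line joining the samples just outside the window. The window used in the manuscript is not stated here, so `half_width` and the function name are hypothetical. Because the replaced stretch is always centred on a spike, any distortion it introduces is time-locked to spiking, which is why the concern applies mainly to pairs where the spike and LFP come from the same microwire.

```python
import numpy as np

def interpolate_out_spikes(lfp, spike_idx, half_width):
    """Replace the samples around each spike by a straight line.

    The samples `half_width` before and after each spike are kept as anchor
    points; everything between them is overwritten by linear interpolation.
    """
    cleaned = np.asarray(lfp, dtype=float).copy()
    last = len(cleaned) - 1
    for s in spike_idx:
        lo, hi = max(s - half_width, 0), min(s + half_width, last)
        cleaned[lo:hi + 1] = np.linspace(cleaned[lo], cleaned[hi], hi - lo + 1)
    return cleaned

ramp = np.arange(100, dtype=float) * 0.5   # smooth toy "LFP"
dirty = ramp.copy()
dirty[48:53] += 10.0                        # toy spike waveform around sample 50
cleaned = interpolate_out_spikes(dirty, [50], half_width=3)
```

On this linear toy signal the interpolation restores the original exactly; on a real LFP the inserted line removes whatever genuine fluctuation occurred inside the window, which is the source of the potential broadband residue.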

      Secondly, we followed the reviewer’s suggestion and repeated the SFC analysis for ‘silent’ microwires, i.e. microwires where no single or multi-units were detected. This analysis replicated the same memory effects as observed in the analysis with all microwires. Specifically, we found an increase in the local gamma peak SFC frequency for hits compared to misses, as well as an increase in distal theta peak SFC frequency for hits compared to misses. These results are reported in the main manuscript and in Figure 2 – figure supplement 6 for gamma, and figure 3 – figure supplement 6 for theta.

      Manuscript lines 199-203: “To rule out concerns about possible artifacts introduced by spike interpolation we repeated the above analysis for spike-LFP pairs where the spike and LFP providing channels are the same or different, and for ‘silent’ LFP channels (i.e. channels where no SUA/MUA activity was detected; see Figure 2 – figure supplement 6).”

      Manuscript lines 253-255: “We also repeated the above analysis for spike-LFP pairs by only using ‘silent’ LFP channels (i.e. channels where no SUA/MUA activity was detected; see Figure 3 – figure supplement 6) to address possible concerns about artefacts introduced by spike interpolation.”
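
The three control subsets used above (same microwire, different microwire, and ‘silent’ LFP microwires without detected SUA/MUA) can be defined directly from channel labels. A minimal sketch, assuming each spike-LFP pair is stored as a (spike channel, LFP channel) tuple; the function name and data layout are hypothetical:

```python
def split_spike_lfp_pairs(pairs, channels_with_units):
    """Sort (spike_channel, lfp_channel) pairs into the control subsets.

    same:      spike and LFP come from the same microwire
    different: spike and LFP come from different microwires
    silent:    the LFP microwire has no detected SUA/MUA (a subset of `different`)
    """
    same = [p for p in pairs if p[0] == p[1]]
    different = [p for p in pairs if p[0] != p[1]]
    silent = [p for p in different if p[1] not in channels_with_units]
    return {"same": same, "different": different, "silent": silent}

subsets = split_spike_lfp_pairs([(1, 1), (1, 2), (1, 5), (3, 2)], {1, 2, 3})
print(subsets)
```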

      In a number of analyses the spike train is convolved with a Gaussian, in some places with a window length of 250 ms and in others 25 ms. It is suspected that windows of different lengths would induce "oscillations" of different frequencies, and would thus bias results towards the window length used. Can the authors justify their choices where these values are used, and/or provide sensitivity analyses showing that the results are largely independent of the length of the Gaussian window used to convolve the time series?

      The different choices in window length for the Gaussian convolution reflect the different needs of the two analyses where these convolutions were applied. In one analysis we wanted to get a smooth estimate of spike densities that we can average across trials, similar to a peri-stimulus spike histogram. For this analysis we used a window length of 250 ms which we found appropriate to yield a good balance between retaining smooth time courses whilst still being temporally sensitive. Importantly, for the statistical analysis of the firing rates, spike densities were averaged in much larger time windows than 250 ms (i.e. 1 – 2 seconds) therefore our choice of window length for spike densities would not have any bearing on the averaged firing rate analysis.
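
The argument that kernel width has no bearing on the 1-2 second firing-rate averages can be checked numerically. A unit-area Gaussian only redistributes each spike's contribution in time, so the average over a long window changes only through contributions leaking across the window edges. The sketch assumes 1 ms bins, reads the 250 ms "window length" as the Gaussian's standard deviation, and uses a synthetic 20 Hz spike train; these are illustrative assumptions.

```python
import numpy as np

def spike_density(train, sigma_ms, dt_ms=1.0):
    """Gaussian-smoothed firing rate in Hz for a spike train binned at dt_ms."""
    half = int(np.ceil(4 * sigma_ms / dt_ms))
    x = np.arange(-half, half + 1) * dt_ms
    kernel = np.exp(-0.5 * (x / sigma_ms) ** 2)
    kernel /= kernel.sum()                     # unit area preserves the spike count
    return np.convolve(train, kernel, mode="same") * (1000.0 / dt_ms)

# Toy train: one spike every 50 ms (20 Hz) over 4 s, binned at 1 ms.
train = np.zeros(4000)
train[::50] = 1
for sigma in (25, 250):
    rate = spike_density(train, sigma)
    print(sigma, round(rate[1000:2000].mean(), 2))   # average over the 1-2 s window
```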

      In the other analysis, which is more central for our manuscript, we used a cross-correlation between spike trains to estimate co-firing lags in the range of milliseconds. Therefore, this analysis necessitated a much higher temporal precision. We used a Gaussian window with a width of 25 ms because it represents a good balance between integrating over a long-enough time window, and thus allowing for some jitter in neural firing between pairs of neurons, whilst still being temporally specific. Finding the right balance here is not trivial, because too short a window would be prone to noise and would underestimate co-firing, whereas too long a window may not provide the temporal specificity necessary to detect co-firing lags (Cohen & Kohn, 2011; Nat Neurosci). To test whether this choice critically affected our results, we repeated the analysis for different window sizes, i.e. 15, 35, and 45 ms. The basic pattern of results did not change, with hits showing earlier co-firing peaks than misses. Critically, the difference in co-firing peaks was significant for all window sizes except the shortest, which is likely due to the increased noise from integrating spikes over a smaller time window. These issues are now mentioned in the methods section, and the results for the different window sizes are reported in Figure 6 – figure supplement 4.

      Manuscript lines 346-347: “The co-firing analyses were replicated with different smoothing parameters (see Figure 6 – figure supplement 4).”

      Manuscript lines 894-898: “We chose this time window because it should represent a good balance between integrating over a long-enough time window and thus allowing for some jitter in neural firing between pairs of neurons, whilst still being temporally specific (57). To test whether this choice critically affected our results, we repeated the analysis for different window sizes, i.e. 15, 35, and 45 ms (see Figure 6 – figure supplement 4).”

      Conceptual:

      The co-firing analyses are very interesting and novel. Table S1 lists locally and distally coupled neurons. There are some pairs, for example, where the distally coupled neuron is in the EC and the downstream one in the hippocampus, and then there is a pair that is the opposite of this (distal: hippocampus; local: EC). There appear to be a number of such "reversals"; given that the units are in the same two regions, one would expect the delays between them to be similar in sign and magnitude. It seems surprising that between the same two regions the flow of information, or "causality", could be reversed, when/if one assumes information flows through the system from the EC to the hippocampus. This seems unusual and hard to reconcile with what is known about how information flows through the MTL system.

      The reviewer is correct that the spike co-firing analysis suggests a bi-directional flow of information between the hippocampus and surrounding MTL regions (e.g. entorhinal cortex; see Figure 6 – figure supplement 3). However, this bi-directional flow of information is not incompatible with neuroanatomy and the memory literature. The entorhinal cortex serves as an interface between the hippocampus and the neocortex with superficial layers providing input into the hippocampus (via the perforant pathway), and the deeper layers receiving output from the hippocampus (van Strien et al., 2009; Nat Rev Neurosci). Therefore, on a purely anatomical basis we can expect to see a bi-directional flow of information between the hippocampus and the entorhinal cortex, albeit in different layers. Importantly, reversals as shown in our Figure 6 – source data 1 involved different microwires and therefore different neurons (i.e. the entorhinal unit in row 1 was recorded from microwire 3, whereas the entorhinal unit in row 2 was recorded from microwire 8). It is conceivable that these two neurons correspond to different layers of the entorhinal cortex and therefore reflect input vs. output paths. Moreover, studies in humans demonstrated that successful encoding of memories depends not only on the input from the entorhinal cortex into the hippocampus, but also on the output of the hippocampal system into the entorhinal cortex, and indeed on the dynamic recurrent interaction between these input and output paths (Maass et al. 2014; Nat Comms; Koster et al., 2018; Neuron). Our bi-directional couplings between hippocampal and surrounding MTL regions (such as the EC) are in line with these findings. We have added a discussion of this issue in the discussion section.

      Manuscript lines 447-452: “Notably, the neural co-firing analysis indicates a bidirectional flow of information between the hippocampus and surrounding MTL areas, such as the entorhinal cortex (see Figure 6 – figure supplement 3; Figure 6 – source data 1). This result parallels other studies in humans showing that successful encoding of memories depends not only on the input from surrounding MTL areas into the hippocampus, but also on the output of the hippocampal system into those areas, and indeed on the dynamic recurrent interaction between these input and output paths (43, 44).”

    1. Author Response

      Reviewer #3 (Public Review):

      Canetta et al have characterized the developmental regulation of PV neurons in PFC. The experiments have been carefully conducted and even though this is an area of broad scientific interest, there are several issues that require consideration.

      1) The dosing regime of the CNO that has been employed will not provide persistent inhibition. Inhibition will operate on a 16 hr on/ 8 hr off cycle. Under such circumstances, it will be very difficult to rule out interspersed inhibition-related artifacts.

      Our approach of twice daily injections of CNO is consistent with that of other publications that have used similar chemogenetic approaches to chronically alter activity1-3. However, the reviewer is correct that our twice daily CNO injection protocol may only intermittently inhibit PV cells, and it is possible that persistent inhibition might result in even stronger behavioral and circuit effects. However, it is also possible that more continuous CNO administration could lead to hM4DGi desensitization. Given these caveats, we respectfully submit that repeating all the experiments under conditions that would allow constant chronic dispensation of CNO (such as implantation of minipumps) is an excellent future experiment but currently outside the scope of this manuscript.

      2) The second major issue with the dosing regime is that it is long (35 days). Realizing that the development of PFC circuitry is complex but at P90, the animals will have been dosed for more than a third of their lives. How can the authors rule out compensatory changes that do not have anything to do with critical periods?

      In future studies we hope to refine the timing of the developmental window mediating these long-term effects by comparing the effects of inhibition during shorter developmental periods. Our current studies demonstrate that a 35-day window of inhibition during development, but not during adulthood, leads to long-lasting effects on behavior and prefrontal network function. If the developmental manipulation is more impactful because the manipulation represents a longer proportion of the animals’ lifetime, we would expect that the effect should wane as the animal gets older. However, for the rescue experiments that take place at P120 and P130, rather than P90, we still find that the Dev Inhibition animals are impaired, suggesting that it is not the proportion of the animals’ lifetime that has been inhibited, but the timing during which this inhibition occurs, that matters most.

      3) To this point, in the discussion first para line 8 - please change "transient" to something more suitable to reflect the duration of treatment.

      We have replaced transient with reversible.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a well performed study to demonstrate the antiviral function and viral antagonism of the dynein activating adapter NINL. The results are clearly presented to support the conclusions.

      This reviewer has only one minor suggestion to improve the manuscript.

      Add a discussion of (1) why the fold reductions differed among VSV, SinV and CVB3 in the NINL KO cells and (2) why the fold reductions of VSV differed between NINL KO A549 and U-2 OS cells.

      Thank you for this suggestion. We have amended the results section to include additional information about these observations and possible explanations for these results.

      Reviewer #2 (Public Review):

      This manuscript is of interest to readers interested in host-virus co-evolution. This study has identified a novel human-virus interaction point, NINL-viral 3C protease, where NINL is evolving rapidly under selective pressure from viral infection and viral 3Cpro cleavage. This study demonstrates that viral 3Cpro-mediated cleavage of host NINL disrupts its adaptor function in dynein motor-mediated cargo transport to the centrosome, and that this disruption is both host- and virus-specific. In addition, this paper identifies a role for NINL in the IFN signaling pathway. Data shown in this manuscript support the major claims.

      In this paper, the authors have identified a novel host-viral interaction, where viral 3C proteases (3Cpro) cleave at specific sites on a host activating adaptor of dynein intracellular transportation machinery, ninein-like protein (NINL or NLP in short) and inhibit its role in the antiviral innate immune response.

      The authors first found that, unlike other activating adaptors of the dynein intracellular transport machinery, NINL (or NLP) is rapidly evolving. Thus, the authors hypothesized that this rapid evolution of NINL was driven by its interactions with viruses. The authors found that viruses replicated to higher levels in NINL knock-out (KO) cells than in wild-type (WT) cells, and that, unlike in WT cells, the replication level was not attenuated upon IFNa treatment in NINL KO cells. Next, the authors investigated the role of NINL in the type I IFN-mediated immune response and found that the induction of Janus kinase/signal transducer and activator of transcription (JAK/STAT) genes was attenuated in NINL KO cells upon IFNa treatment. The authors further showed that the IFNa-induced reduction in replication of an IFNa-sensitive vaccinia virus mutant was smaller in NINL KO A549 cells than in WT cells. The authors then showed that viruses antagonize NINL function by cleaving it with viral 3Cpro at specific cleavage sites. A NINL-peroxisome ligation-based cargo trafficking visualization assay showed that the redistribution of immobile membrane-bound peroxisomes was disrupted by cleavage of NINL or by viral infection.

      This paper has revealed a novel host-virus interaction, and an antiviral function of a rapidly evolving activating adaptor of dynein intracellular transportation machinery, NINL. The major conclusions of this paper are well supported by data, but several aspects can be improved.

      1) It would be necessary to include a couple of other pathways involved in innate immune response besides JAK/STAT pathway.

      We are very interested in this question as well. Our RNAseq data (Supplementary file 4 and Figure 3 – Figure supplement 4) suggest that there are several transcriptional changes that result from NINL KO. Our goal in this manuscript was to focus on IFN signaling in order to understand this specific effect of NINL KO since it might have wide-ranging consequences on viral replication. While we agree that broadening our studies to other signaling pathways, including other pathways involved in innate immune response, is a good idea, we feel that those experiments would take longer than two months to perform and therefore fall outside of the scope of this paper.

      2) The in-cell cleavages of NINL by viral 3Cpros were well demonstrated and supported by data of high quality. A direct biochemical demonstration of the cleavage is needed with purified proteins.

      We agree with the reviewer that a direct biochemical cleavage assay would further demonstrate that viral 3Cpros cleave NINL specifically. However, our attempts to purify full-length NINL have been unsuccessful due to solubility issues (see example gel below), which is not surprising given that NINL is a >150 kDa human protein that has multiple surfaces that bind to other human proteins. As such, we focused our efforts on in-cell cleavage assays using specificity controls for cleavage. Specifically, we used catalytically inactive CVB3 3Cpro to show a dependence on protease catalytic activity and a variety of NINL constructs in which the glutamine in the P1 position is replaced by an arginine to show site specificity of cleavage. Notably, the cleavage sites in NINL that we mapped using this mutagenesis were predicted bioinformatically from known sites of 3Cpro cleavage in viral polyproteins, further indicating that cleavage is 3Cpro-dependent. We believe these results thus demonstrate that cleavage of NINL is dependent on viral protease activity and occurs in a sequence-specific manner. In light of the difficulty of purifying full-length NINL, which would make biochemical experiments very challenging and likely take longer than two months to perform, we believe that our in-cell data should be sufficient to demonstrate activity-dependent, site-specific cleavage of NINL by viral 3Cpros.

      Sypro stained SDS-PAGE gel showing supernatant (S) and insoluble pellet (P) fractions across multiple purifications with altered buffer conditions.
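
As an illustration of the site-prediction and P1 mutagenesis logic described above, the sketch below scans a sequence for a simplified 3Cpro-like motif and builds the corresponding Gln-to-Arg variant. The motif is a hypothetical simplification (small or aliphatic residue at P4, Gln at P1, Gly/Ser/Ala at P1'), not the bioinformatic model the authors derived from viral polyprotein cleavage sites, and the toy sequence is made up.

```python
import re

# Hypothetical, simplified 3C protease motif: small/aliphatic residue at P4,
# any residues at P3 and P2, Gln at P1, and Gly/Ser/Ala at P1'.
# The lookahead lets overlapping candidate sites all be reported.
MOTIF = re.compile(r"(?=[AVTILC]..Q[GSA])")

def candidate_sites(seq):
    """0-based positions of the P1 Gln for every motif match (cleavage follows it)."""
    return [m.start() + 3 for m in MOTIF.finditer(seq)]

def p1_q_to_r(seq, p1):
    """Cleavage-resistant variant: replace the P1 Gln with Arg."""
    if seq[p1] != "Q":
        raise ValueError(f"position {p1} is {seq[p1]!r}, not Q")
    return seq[:p1] + "R" + seq[p1 + 1:]

toy = "MSALKQGSEE"                      # made-up sequence, not NINL
sites = candidate_sites(toy)             # [5]
mutant = p1_q_to_r(toy, sites[0])        # "MSALKRGSEE"
print(sites, mutant, candidate_sites(mutant))
```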

      3) The author used different cell types in different assays. Explain the rationale with a sentence for each assay.

      Throughout this work, we chose to use a variety of cell lines for specific purposes. A549 cells were chosen as our main cell line as they are widely used in virology, are susceptible to the viruses we used, are responsive to interferon, and express both NINL and our control NIN at moderate levels. In the case of our virology and ISG expression data, we performed the same experiments with NINL KOs in other cell lines to confirm that the phenotypes we observed in A549 cells could be attributed to the absence of NINL rather than off-target CRISPR perturbations or cell-line specific effects. All cleavage experiments were performed in HEK293T cells for their ease of transfection and protein expression. The inducible peroxisome trafficking assays were performed in U-2 OS cells as their morphology is ideal for observing the spatial organization of peroxisomes via confocal microscopy, and because we had recapitulated the virology results and ISG expression results in those cells. At the suggestion of the reviewer, we have amended the text to include rationales where appropriate.

      4) While cell-based assays well support the conclusions in this paper, further demonstration in vivo would be helpful to provide an implication on the pathogenicity impact of NINL.

      We agree. However, we believe that examining the impact of the loss of or antagonism of NINL on the pathogenesis of infectious diseases in an in vivo model is outside the scope of this study.

      In summary, this manuscript contributes to our understanding of a novel antiviral target. In addition, it is important for understanding host-virus co-evolution. The use of evolutionary signatures to identify the "conflict point" between host and virus is novel.

    1. Author Response

      Reviewer #3 (Public Review):

      This paper is based on digital reconstruction of a serial EM stack of a larva of the annelid Platynereis and presents a complete 3D map of all desmosomes between somatic muscle cells and their attachment partners, including muscle cells, glia, ciliary band cells, epidermal cells and specialized epidermal cells that anchor cuticular chaetae (chaetal follicle cells) and aciculae (acicular follicle cells). The rationale is that the spatial patterning of desmosomes determines the direction of the forces exerted by muscular contraction on the body wall and its appendages; these forces determine the movement of these structures, which in turn results in propulsion of the body as part of specific behaviors.

      To go a step further, by connecting this desmosome connectome with the (previously published) synaptic connectome, one may gain insight into how a specific spatio-temporal pattern of motor neuron activity will lead, via a resulting pattern of forces caused by muscles, to a specific behavior. In the authors' words: "By combining desmosomal and synaptic connectomes we can infer the impact of motoneuron activation on tissue movements". This is an interesting idea which has the potential to make progress towards understanding in a "holistic" way how a complex neural circuitry controls an equally complex behavior. The analysis of the EM data appears solid; the authors can show convincingly that desmosomes can be resolved in their EM dataset; and the technology used to plot and analyze the data is clearly up to the task. My main concern is with the way in which the desmosome pattern is entered in the analysis, which I think makes it very difficult to extract enough relevant information to reach the stated goal.
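
The idea of combining the two connectomes can be made concrete as a matrix product, with the caveat, raised in the reviewer's second point, that this treats each muscle as a single node. In this hypothetical sketch, synapse counts from motoneurons to muscles are multiplied by each muscle's fractional desmosome counts onto its partner cells, giving a rough motoneuron-to-cell influence score. It is an illustration only, not the authors' analysis, and the toy matrices are made up.

```python
import numpy as np

def motor_to_tissue_influence(syn, desmo):
    """Compose two connectomes into a motoneuron -> attached-cell influence matrix.

    syn[i, j]:   synapses from motoneuron i onto muscle cell j
    desmo[j, k]: desmosomes between muscle cell j and partner cell k
    Rows of `desmo` are normalised so each muscle distributes a total weight of 1
    across its partners; the matrix product then sums over intermediate muscles.
    """
    syn = np.asarray(syn, float)
    desmo = np.asarray(desmo, float)
    totals = desmo.sum(axis=1, keepdims=True)
    desmo_frac = np.divide(desmo, totals, out=np.zeros_like(desmo), where=totals > 0)
    return syn @ desmo_frac

# Toy case: 2 motoneurons, 2 muscles, 3 partner cells.
influence = motor_to_tissue_influence([[2, 0], [0, 3]], [[3, 1, 0], [0, 2, 2]])
print(influence)
```

Because the desmosome matrix carries no positional information, this composition says which cells a motoneuron could move, but not in which direction.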

      1) The context of how different structures of the Platynereis larval body, by changing their position, move the body needs much more introduction than the short paragraph given at the end of the Introduction.

      -My understanding is that the larval body is segmented, and contraction of the segments can cause a certain type of crawling or swimming: does it? Do the longitudinal muscles, for example, insert at segment boundaries, and does alternating left-right contraction cause some sort of "wiggling" or peristalsis?

      Longitudinal muscles do not insert only at segment boundaries, but have desmosomal connections along the entire length of the cell. Individual longitudinal muscle cells can span up to 3 segments. However, the cells are staggered in such a way that all longitudinal muscle cells with somas in one segment can collectively cover up to 4 segments. Longitudinal muscles are involved in turning when swimming (Randel et al., 2014). The undulatory trunk movements and parapodial walking movements are due to the contraction of oblique and parapodial muscles. The longitudinal muscles provide support during crawling (via desmosomal links) but it is unlikely that these muscles contract segmentally. Disentangling the distinct contributions of the 53 muscle types during crawling will require further studies.

      -In addition, there are segmental processes (parapodia, neuropodia), and embedded in them are long chitinous hairs (Chaetae, Acicula). Do certain types of the muscles described in the study insert at the base of the parapodia/neuropodia (coming from different angles), such that contraction would move the entire process, including the chaetae/acicula embedded in their tips?

      Yes, acicular muscles insert at the proximal base of the acicula, and by moving the acicula they move the entire noto-/neuropodia. We have presented the anatomy of all acicular and chaetal muscles types in the figures and videos.

      -Or is it that only these chaetae/acicula move, by means of muscles inserting at their base (the latter is clearly part of the story)? Or does both happen at the same time: parapodium moves relative to the trunk, and chaeta/acicula moves relative to the parapodium? How would these movements lead to different kind of behaviors?

      -Diagrams should be provided that shed light on these issues.

      We have extended Video2 to show individual muscles and their relation to the aciculae in one of the parapodia. We also clarified this in the text:

      “Several acicular muscles attach on one end to the proximal base of the aciculae and on the other end to the paratrochs and epidermal cells. Oblique muscles attach to the basal lamina, epidermal and midline cells at their proximal end, run along the anterior edge of parapodia and attach to epidermal and chaetal follicle cells at their distal tips. Both of these muscle groups are involved in moving the entire parapodium. Acicular muscles move the proximal tips of the aciculae, while oblique muscles move the parapodium by moving the tissue around the chaetae and the aciculae. All acicular movements also correspond to parapodial movements. Chaetae are embedded in the parapodium and therefore move with it, but the chaetal sac muscles can also independently retract the chaetae into the parapodium or protract them and make them fan out.”

      2) The main problem I have with the analysis is the way a muscle cell is treated, namely as a "one dimensional" node, rather than a vector.

      -In the current state of the analysis, the authors have mapped all desmosomes of a given muscle cell to its attached "target" cell. But how is that helpful? The principal way a muscle cell acts is by contracting, thereby pulling the cells it attaches to at its two ends closer together. As the authors state (p.4) "...desmosomes..are enriched at the ends of muscle cells indicating that these adhesive structures transmit force upon muscle-cell contraction."

      At the level of the current analysis our data reveal which cells may be moved by the contractions of the individual muscle cells. The reviewer is right that treating a muscle as a vector (or set of vectors) would be a more accurate description, which would potentially also open up the possibility of computational modelling. We have provided such a vectorised dataset in the revised version, where each muscle-cell skeleton is subdivided into short linear segments (Figure 2–source data 2). This dataset may be useful for a three-dimensional approach to the problem, which is beyond the scope of the current analysis. We also included an additional video (Video 7) showing examples of muscles and their partners, where the cells and the desmosomes connecting them are highlighted. This reveals that the desmosomes connecting two cells are often at the very end of the muscle cell.
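      For readers who want to treat muscles as vectors, the subdivision step described above can be sketched as follows. This is a minimal illustration, not necessarily how the vectorised dataset was generated: it assumes an unbranched skeleton given as an ordered list of xyz node coordinates, and the function name `vectorise_skeleton` and the `max_len` parameter are hypothetical.

```python
import math


def vectorise_skeleton(nodes, max_len):
    """Split an unbranched skeleton (ordered xyz tuples) into short straight segments.

    Returns (start, end, unit_direction) tuples; no segment is longer than max_len.
    """
    segments = []
    for p, q in zip(nodes, nodes[1:]):
        length = math.dist(p, q)
        if length == 0:
            continue  # skip duplicated nodes
        pieces = math.ceil(length / max_len)
        direction = tuple((b - a) / length for a, b in zip(p, q))
        for i in range(pieces):
            t0, t1 = i / pieces, (i + 1) / pieces
            start = tuple(a + (b - a) * t0 for a, b in zip(p, q))
            end = tuple(a + (b - a) * t1 for a, b in zip(p, q))
            segments.append((start, end, direction))
    return segments


# Hypothetical muscle skeleton: a straight run followed by a bend (arbitrary units).
muscle = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
segments = vectorise_skeleton(muscle, max_len=2.0)
for start, end, direction in segments:
    print(start, end, direction)
```

      Each segment carries its own unit direction, so a contraction model could sum forces segment by segment; real skeletons are branched trees and would first need to be split into paths.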

      -For that reason, the desmosomes at the muscle tips have to be treated as two special sets. Aside from these tip desmosomes there are other desmosomes (in between muscles, for example), but they (I would presume) have a very different function; maybe to coordinate muscle fiber contraction? Augment the force caused by contraction?

      Desmosomes between muscles only occur between muscles of different types, not for homotypic connections. There are other types of junctions (adhaerens-like junctions) that connect individual cells of a muscle bundle together (not analysed here). We clarified this in the text.

      • As far as I understand, for (all of) the desmosome connectome plots, no distinction is made between desmosome subsets located at different positions within the muscle fiber. I therefore don't see how the plots help to shed light on how the multiplicity of muscles represented in the graphs causes specific types of movements.

      We would like to point out that the cells and structures that muscles connect to via desmosomes are very likely the parts of the body that will move during the contraction of the muscle or will provide structural support (e.g. basal lamina) for the muscle cell to contract. This is most evident in the parapodial complex. The majority of muscles in the body connect to the acicular follicle cells, and the aciculae are the most actively moving parts in the body during crawling (see Video 4). In any case, since we provide all skeleton reconstructions and the xyz coordinates of all desmosomes, the data could be further analysed following these suggestions by the reviewer.
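      As one way to follow the reviewer's suggestion, desmosomes could be split into tip and shaft subsets using their xyz coordinates. A minimal sketch, assuming each muscle is approximated by the straight axis between its two skeleton endpoints and using a hypothetical 10% cut-off at each end; the function names, coordinates and threshold are illustrative and not part of the published analysis.

```python
def axial_position(point, start, end):
    """Fraction (0 at start, 1 at end) of a point projected onto the start-end axis, clipped to [0, 1]."""
    axis = [e - s for s, e in zip(start, end)]
    rel = [p - s for s, p in zip(start, point)]
    t = sum(a * r for a, r in zip(axis, rel)) / sum(a * a for a in axis)
    return min(1.0, max(0.0, t))


def classify_desmosomes(desmosomes, start, end, tip_fraction=0.1):
    """Label each desmosome 'tip' or 'shaft' by where it falls along the muscle axis.

    tip_fraction is a hypothetical cut-off: the outer 10% at each end counts as 'tip'.
    """
    labels = {}
    for name, xyz in desmosomes.items():
        t = axial_position(xyz, start, end)
        labels[name] = "tip" if t <= tip_fraction or t >= 1 - tip_fraction else "shaft"
    return labels


# Hypothetical muscle running along x, with three desmosome coordinates.
muscle_start, muscle_end = (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)
desmosomes = {"d1": (0.5, 1.0, 0.0), "d2": (5.0, 0.2, 0.0), "d3": (9.8, -0.3, 0.0)}
labels = classify_desmosomes(desmosomes, muscle_start, muscle_end)
print(labels)  # {'d1': 'tip', 'd2': 'shaft', 'd3': 'tip'}
```

      Curved muscles would need the projection onto the skeleton path rather than a straight axis; the straight-axis version is only a first approximation.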

      • As it stands, these plots "merely" help to classify muscles, based on their position and what cell type they target: but that (certainly useful) map could probably also have been achieved by light microscopic analysis.

      This has never been achieved by light microscopy analysis in the hundreds of papers on invertebrate muscle anatomy (e.g. by phalloidin staining). For an LM analysis, it would not be sufficient to label the muscle fibres, but one would also need to label the desmosomes and a multitude of non-muscle cell types including the extent of their cytoplasm. This is technically very challenging (we would nevertheless be happy to hear specific suggestions for markers etc. from the Reviewer). Currently, only EM provides the required depth of structural information and resolution. This is why we believe that our dataset and analysis is unique, despite over a century of research in invertebrate anatomy.

      3) The section "Local connectivity and modular structure of the desmosomal connectome" (p. 4-7) undertakes an analysis of the structure of the desmosome network, comparing it with other networks.

      -What is the rationale here? How do the conclusions help to understand how the spatial pattern of muscles and their contraction move the body?

      We hope that our analysis may also be of interest to the community of network scientists and we believe that the reconstruction of a quite large and novel type of biological network warrants a more quantitative network analysis, using the standard methods and measures of network science – as we presented e.g. in Figure 4 – even if these mathematical analyses may not directly reveal how muscles move the body. We hope that some readers with an interest in quantitative analyses will also appreciate the broader picture here.
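      As an illustration of one such standard measure, the sketch below compares the average clustering coefficient of a graph with that of random graphs with the same numbers of nodes and edges. It is a toy example under stated assumptions: a ring lattice stands in for a network of local cliques, and the random baseline is a uniform Erdos-Renyi G(n, m) graph; neither is the desmosomal connectome or necessarily the null model used in Figure 4.

```python
import random
from itertools import combinations


def average_clustering(n, edges):
    """Mean local clustering coefficient over nodes 0..n-1 (nodes with degree < 2 count as 0)."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / n


def ring_lattice(n, k):
    """Toy 'local clique' graph: each node linked to its k/2 nearest neighbours on each side."""
    return [(i, (i + j) % n) for i in range(n) for j in range(1, k // 2 + 1)]


def random_graph(n, m, rng):
    """Uniform random graph with n nodes and exactly m edges (Erdos-Renyi G(n, m))."""
    return rng.sample(list(combinations(range(n), 2)), m)


n, k = 30, 4
lattice = ring_lattice(n, k)
observed = average_clustering(n, lattice)
rng = random.Random(1)
baseline = [average_clustering(n, random_graph(n, len(lattice), rng)) for _ in range(50)]
expected = sum(baseline) / len(baseline)
print(f"lattice C = {observed:.2f}, random C = {expected:.2f}, ratio = {observed / expected:.1f}")
```

      A clustering coefficient well above the random baseline is the quantitative signature of "cells connecting only to their immediate neighbourhood"; graph libraries such as NetworkX provide equivalent functions.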

      -Isn't, on the one hand (given that position of the desmosome was apparently not considered), the finding that desmosome networks stand out (from random networks) by their high level of connectivity ("with all cells only connecting to cells in their immediate neighbourhood forming local cliques") completely expected?

      We disagree that the result was completely expected. Even if this was the case, we think it is quite different to say that a result is expected or to thoroughly quantify certain parameters and mathematically characterise key properties of the desmosomal graph (as we have done). These network analyses help to conceptualise our findings and to think about the muscle system in more global, whole-body terms.

      -On the other hand, does this reflect the reality, given that (many?) muscle cells are quite long, connecting for example the anterior border of a segment with the posterior border.

      Indeed, a quantitative analysis helped us to identify cases where the reality deviated somewhat from what was completely expected, and we thank the reviewer for these comments. As we explain in the revised version, some longitudinal muscles show an unexpected position in the force-field layout of the graph, due to their long-range connections. We have added extra clarifications to the text: “To analyse how closely the force-field-based layout of the desmosomal connectome reflects anatomy, we coloured the nodes in the graph based on body regions (Figure 5). In the force-field layout, nodes are segregated by body side and body segment. Exceptions include the dorsolateral longitudinal muscles (MUSlongD) in segment-0. These cells connect to dorsal epidermal cells that also form desmosomes with segment-1 and segment-2 MUSlongD cells. These connections pull the MUSlongD_sg0 cells down to segment-2 in the force-field layout (Figure 5D).”
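      The effect described above, a node being drawn towards the cells it shares partners with, can be illustrated with a deliberately simplified layout. The sketch below assumes a one-dimensional "segment" coordinate, keeps some hypothetical anchor cells fixed, and places every free node at the mean position of its neighbours (a Tutte-style barycentric embedding with attraction only). Real force-field layouts also include repulsion, so this is not the algorithm behind Figure 5, and all node names are hypothetical.

```python
def barycentric_layout_1d(edges, fixed, tol=1e-12, max_sweeps=10_000):
    """Place each free node at the mean coordinate of its neighbours (attraction only).

    fixed maps anchor nodes to their (hypothetical) segment coordinate; the remaining
    nodes are updated in Gauss-Seidel sweeps until positions stop changing.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    pos = {node: fixed.get(node, 0.0) for node in neighbours}
    free = [node for node in neighbours if node not in fixed]
    for _ in range(max_sweeps):
        change = 0.0
        for node in free:
            new = sum(pos[nb] for nb in neighbours[node]) / len(neighbours[node])
            change = max(change, abs(new - pos[node]))
            pos[node] = new
        if change < tol:
            break
    return pos


# Hypothetical fragment: a segment-0 muscle shares a dorsal epidermal cell with
# longitudinal muscles from segments 1 and 2.
edges = [
    ("MUSlongD_sg0", "cell_sg0"),
    ("MUSlongD_sg0", "epi_dorsal"),
    ("epi_dorsal", "MUSlongD_sg1"),
    ("epi_dorsal", "MUSlongD_sg2_a"),
    ("epi_dorsal", "MUSlongD_sg2_b"),
]
anchors = {"cell_sg0": 0.0, "MUSlongD_sg1": 1.0, "MUSlongD_sg2_a": 2.0, "MUSlongD_sg2_b": 2.0}
layout = barycentric_layout_1d(edges, anchors)
print(round(layout["MUSlongD_sg0"], 3), round(layout["epi_dorsal"], 3))  # 0.714 1.429
```

      In this toy graph the segment-0 muscle settles at 5/7 rather than 0, because one of its two partners is an epidermal cell whose other partners sit in segments 1 and 2.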

      4) In the section "Acicular movements and the unit muscle contractions that drive them" the authors record movement of the acicula and correlate it with activity (Ca imaging) of specific muscle types. This study gives insightful data, and could be extended to all movements of the larva.

      -The fact that a certain muscle is active when the acicula moves in a certain direction can be explained (in part) by the "connectivity": as shown in Fig. 7L, the muscle inserts at an acicular follicle cell on the one side, and at an epithelial (epidermal?) cell and the basal lamina on the other side. But how meaningful is a description at this "cell type level" of resolution? The direction of acicula deflection depends on where (relative to the acicula base) the epithelial cell (or point in the basal lamina) is located. This information is not given in the part of the connectome network shown in Fig. 7L, or in any of the other graphs.

      This information is indeed not shown in the graphs, where each cell is treated as a node. However, we provide this information in the detailed anatomical figures in Figure 6 – figure supplement 1-3 and Video 7, where the individual acicular and oblique muscle types are visualised. In principle, one could subdivide aciculae into e.g. proximal and distal halves and derive a more detailed network. We have not done this but since all the EM, anatomical rendering and connectivity data are available in our public CATMAID server (https://catmaid.jekelylab.ex.ac.uk/), we hope that the interested readers will be able to further analyse the data.

      We renamed ‘epithelial’ cells to ‘epidermal’ cells.

    1. Author Response

      Reviewer #1 (Public Review):

      Using a large neonatal dataset from the developmental Human Connectome Project, Li and colleagues find that cortical morphological measurements, including cortical thickness, are affected by postnatal experience, whereas cortical myelination and overall functional connectivity of ventral cortex developed significantly but were not influenced by postnatal time. The authors suggest that early postnatal experience and time spent inside the womb differentially shape the structural and functional development of the visual cortex.

      The use of a large data set is a major strength of this study. Furthermore, the attempt to examine both structural and functional measures, the connectivity analysis, and the separation of these analyses by pre- and full-term infants are impressive and strengthen the claims made in the paper. While I find this work theoretically well-motivated and the use of the large dHCP dataset very exciting, there are some concerns that need to be addressed.

      It is somewhat unclear whether the authors actually compared the structural and functional measures in the final analysis. If the authors wish to make claims about their relationship, then there must be a compelling analysis detailing these findings.

      Thanks for the suggestions. We have added analysis to directly investigate the relationship between the development of homotopic connection and corresponding structural measurements in the area V1 (Page 13 Line 5-16):

      “The above results revealed that structural and functional properties of the ventral visual cortex both developed with PMA, but were differently influenced by the in-utero and external environment (Table 1). We further investigated the relationship between structural and functional development based on area V1, which showed a strong developmental effect in both structural and functional analyses. Mediation analysis was employed to see whether the development (GA or PT) of the homotopic connection between bilateral V1 was mediated by the structural properties (CT or CM). We found that the PT had a significant direct effect on the homotopic function that was not mediated by CT or CM (Fig 6a-b). In contrast, the direct effect of GA on the homotopic connection was not significant but the indirect effect of GA through CM on the connection was significant (Fig 6c-d).”
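      The mediation logic can be sketched with ordinary least squares: path a regresses the mediator (e.g. CM) on the predictor (e.g. GA), path b and the direct effect c' come from regressing the outcome (the homotopic connection) on both, and the indirect effect is a*b, with a percentile bootstrap for its confidence interval. This is a generic product-of-coefficients sketch on simulated data, assuming no covariates; it is not necessarily the exact mediation procedure used in the manuscript, and all function and variable names are hypothetical.

```python
import random
import statistics


def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_predictor_ols(x, m, y):
    """Coefficients (c_prime, b) of y ~ x + m (with intercept), from centred normal equations."""
    mx, mm, my = statistics.fmean(x), statistics.fmean(m), statistics.fmean(y)
    xc = [v - mx for v in x]
    mc = [v - mm for v in m]
    yc = [v - my for v in y]
    sxx = sum(a * a for a in xc)
    smm = sum(a * a for a in mc)
    sxm = sum(a * b for a, b in zip(xc, mc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    smy = sum(a * b for a, b in zip(mc, yc))
    det = sxx * smm - sxm ** 2
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return c_prime, b


def mediation(x, m, y):
    """Product-of-coefficients mediation: a (x -> m), b (m -> y given x), indirect a*b, direct c'."""
    a = slope(x, m)
    c_prime, b = two_predictor_ols(x, m, y)
    return {"a": a, "b": b, "indirect": a * b, "direct": c_prime}


def bootstrap_indirect_ci(x, m, y, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = ([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        estimates.append(mediation(*resampled)["indirect"])
    estimates.sort()
    return estimates[int(alpha / 2 * n_boot)], estimates[int((1 - alpha / 2) * n_boot) - 1]


# Simulated data (not dHCP values): GA affects the connection only through CM.
rng = random.Random(42)
ga = [rng.gauss(0, 1) for _ in range(300)]
cm = [0.5 * g + rng.gauss(0, 1) for g in ga]
homotopic = [0.8 * c + rng.gauss(0, 1) for c in cm]

result = mediation(ga, cm, homotopic)
ci = bootstrap_indirect_ci(ga, cm, homotopic)
print(result, ci)
```

      In this simulation the true indirect effect is 0.5 × 0.8 = 0.4 and the true direct effect is zero, which is the pattern the quoted text reports for GA acting through CM.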

      There is also some confusion in the terminology used in the study regarding ages: gestational age, postmenstrual age, and postnatal time. I think clarifying and simplifying it down to GA and postnatal time will help the reader and avoid confusion.

      Thank you for the suggestion. We have made extensive revision regarding the terminology throughout the paper and simplified it down to GA and PT. Please see the response to the 1st major concern in the Essential Revisions (for the authors) section above.

      Reviewer #2 (Public Review):

      The authors utilize the publicly available dHCP dataset to ask an interesting question: how does postnatal experience and prenatal maturation influence the development of the visual system. The authors report that experience and prenatal maturation differentially contribute to different aspects of development. Namely, the authors quantify cortical thickness, myelination, and lateral symmetry of function as three different metrics of development. The homotopy and preterm infant analyses are strengths that, on their own, could have justified reporting. However, I have concerns about the analytic approaches that were used and the conclusions that were drawn. Below I list my major concerns with the manuscript.

      PMA vs. GA vs. PT

      The authors seek to understand the contribution of experience and prenatal development, yet I am unsure why the authors focused on the variables they did. There are three variables of interest used throughout this study: Gestational age at birth (GA), postnatal time (PT), and postmenstrual age at the time of scan (PMA). The last metric, PMA, is straightforwardly related to GA and PT since PMA = GA + PT. In most (but not all) of the manuscript, the authors use PMA and PT, with GA used without justification in some cases but not in others.

      It is unclear why PMA is used at all: PMA is necessarily related to PT and GA, making these variables non-independent. Indeed, the authors show that PMA and PT are highly correlated. The authors even say that "the contribution of postnatal experience to the development was not clarified because PMA reflects both prenatal endogenous effect and postnatal experience." So, why not use GA at birth instead of PMA? Clearly, GA is appropriate in some cases (e.g., Figure S4 or in some of the ANOVA applications), and to me, it seems to isolate the effect the authors care about (i.e., duration of prenatal development). Perhaps there is some theoretical justification for using PMA, but if so, I am unaware.

      That said, I expect that replacing all analyses involving PMA with GA will substantially change the results. I do not see this as a bad thing as I think it will make the conclusions stronger. As is, I am left unsure about what the key takeaways of this paper are.

      We appreciate the suggestions, and we have replaced the related analyses involving PMA with GA in the manuscript. Please see the Response to the 1st major concern in the Essential Revisions (for the authors) section above for more detail.
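      Because PMA = GA + PT, the correlation between PMA and PT noted by the reviewer follows directly from the variances of GA and PT and their covariance, writing σ for standard deviations:

```latex
\operatorname{corr}(\mathrm{PMA}, \mathrm{PT})
  = \frac{\sigma_{PT}^{2} + \operatorname{cov}(\mathrm{GA}, \mathrm{PT})}
         {\sigma_{PT}\,\sqrt{\sigma_{GA}^{2} + \sigma_{PT}^{2} + 2\operatorname{cov}(\mathrm{GA}, \mathrm{PT})}}
```

      If GA and PT were uncorrelated, this reduces to σ_PT / sqrt(σ_GA² + σ_PT²). With the standard deviations reported in the revision (1.26 weeks for GA, 1.25 weeks for PT) that is about 0.70, and using the reported r = -0.08 between GA and PT gives about 0.68. These are back-of-the-envelope values from the summary statistics, not results reported in the manuscript.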

      Using GA instead of PMA will have several benefits: 1) It will be much simpler to think of these two variables since they contrast the duration of fetal maturation and time postnatally. 2) This will help the partial correlation analyses performed since the variance between the variables is more independent. It will also mean that the negative relationships observed between PT and cortical thickness when controlling for PMA (e.g., Figure 2h) might disappear (reversed signs for partial correlations are common when two covariates are correlated). 3) this will allow the authors to replace Figure 1a with a more informative plot. Namely, they could use a scatter of GA and PT, giving insight into the descriptive statistics of both dimensions.

      We have revised the manuscript throughout following the reviewer’s suggestion. However, we thought it necessary to show the overall development of CT and CM across general age (PMA) in Figure 1. Therefore, we did not replace Figure 1a but added a scatter plot of GA against PT in Figure 2-figure supplement 1 and added their descriptive statistics to the manuscript: “The mean GA of the neonates was 39.93 weeks (SD = 1.26) and the mean PT was 1.21 weeks (SD = 1.25), the correlation between them was not significant (r = -0.08, p > 0.1; Figure 2-figure supplement 1).” Moreover, the negative relationships between PT and CT when controlling for PMA disappeared in the revised results, as the reviewer predicted.
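      The reviewer's point that partial correlations can reverse sign when covariates are correlated can be checked with the first-order partial correlation formula. The values below are hypothetical, chosen only to show the effect: a modest positive CT-PT correlation becomes negative once PMA, which is correlated with both, is partialled out.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical correlations: x = CT, y = PT, z = PMA.
r_ct_pt = 0.2    # raw CT-PT correlation (positive)
r_ct_pma = 0.6   # CT-PMA
r_pt_pma = 0.7   # PT-PMA: PT is part of PMA, so these are correlated
print(round(partial_corr(r_ct_pt, r_ct_pma, r_pt_pma), 3))  # -0.385
```

      The sign flips whenever r_xy is smaller than r_xz * r_yz, which is easy to reach when z is built from y, as PMA is built from PT.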

      I suspect that one motivation for the use of PMA over GA is for the analysis in Figure 6. In this analysis, the authors pick a group of term infants with a PMA equal to the preterm infants. Since PMA is the same, the only difference between the groups (according to the authors) is the amount of postnatal experience. However, this is not the only difference between the groups since they also vary in GA (and now PT and GA are negatively correlated almost perfectly). I don't know how to interpret this analysis since both the amount of prenatal maturation and postnatal experience vary between the groups.

      We agree with the reviewer that both GA and PT differ between preterm and term-born neonates. Any difference between the two groups might therefore come from the combined effect of GA and PT, and unfortunately we are not able to separate them in this analysis. However, the preceding results indicated that CT was significantly influenced by PT and GA, while CM was significantly influenced by GA, so we discuss the preterm and term-born comparison in the context of these findings (Page 19 Line 26-29 and Page 20 Line 1-5): “We found CT in the ventral cortex was generally lower in the term-born than preterm-born infants, while CM showed the opposite trend in the two groups. Since the preterm babies have longer PT but shorter GA compared to full-term infants at the same PMA, this result supported the above analysis that CT was preferably influenced by PT while CM was largely dependent on GA during the neonatal period”. Furthermore, we added a description in the limitation section to stress the caveat (Page 20 Line 17-19): “Meanwhile, both GA and PT were different between preterm and term-born neonates. Therefore, any differences between the two groups might have come from the combined effect of GA and PT, and unfortunately, we were not able to separate them in this study.”

      Justification of conclusions and statistical considerations

      I had concerns about some of the statistical tests and conclusions that the authors made. I refer to some of these in other sections (e.g., the homotopy analyses), but I raise several here.

      I am not sure what evidence the authors are using to make this claim: "we found that the cortical myelination and overall functional connectivity of ventral cortex developed significantly with the PMA but was not directly influenced by postnatal time." Postnatal time is significantly correlated with cortical myelination, as shown in Figures 2g, 2h, 3b, 3c, and postnatal time is significantly correlated with functional connectivity, as shown in Figures 4h, 5c, 5d, and 5e. Hence, this general claim that "the development of CT was considerably modulated by the postnatal experience while the CM was heavily influenced by prenatal duration" doesn't seem to be supported: both myelination and thickness are affected by postnatal experience and prenatal duration (as measured by PMA). A similar sentiment is expressed in the abstract. Perhaps the authors suggest different patterns in the strength of change for PMA vs. PT across these metrics, but if so, then statistical tests need to support that conclusion, and the claims need to reflect that sentiment.

      Interestingly, Figure S4 presents a compelling ANOVA that does support this conclusion. Still, this result is relegated to the supplement, and it also uses GA, rather than PMA, making it hard to reconcile with the other claims made in the main text. Moreover, it uses ANOVAs, which dichotomizes a continuous variable. Here and elsewhere in the manuscript (e.g., Figures 3d, 3e), the authors split the infants into quartiles and compare them with ANOVAs. Their use for visualization is helpful, but it is unclear what the statistical motivation for this is rather than treating these as continuous variables like is possible with linear mixed-effects models. Moreover, it is unclear why the authors excluded half the data from the study (i.e., quartiles 2 and 3) in this ANOVA when all four quartiles could be used as factors.

      We appreciate the reviewer’s comments. We have clarified our results and conclusions in the revised manuscript based on the new analyses that replaced PMA with PT and GA (see the response to the 1st major concern in the Essential Revisions). The previous claims have been changed as follows: “the postnatal time could modulate the cortical thickness in ventral visual cortex and the functional circuit between bilateral primary visual cortices. But the cortical myelination, particularly that of the high-order visual cortex, developed without significant influence of postnatal time in such early period” (Page 2, Lines 8-12). These claims are supported by the results in Figure 2. Moreover, to support the claims about the relative influence of GA and PT on structural development, we replaced the ANOVA analysis with a linear mixed-effect model, as the reviewer suggested.

      1) To compare the influence of GA against PT on the structural development in the whole ventral visual cortex (Page 7 Line 15-19): “We applied a linear mixed-effect model to test whether the CT (or CM) of the whole ventral cortex was differently influenced by GA vs. PT, and found that GA had a significantly stronger effect on the CM than PT (interaction between GA and PT, p < 0.05), but no significant difference between the two ages was found in their effects on the CT (p > 0.6).”

      2) To compare the influence of GA against PT on the structural development in area V1 and VOTC, we applied a similar linear mixed-effect model analysis to the two ROIs (Page 8 Line 17-18 and Page 9 Line 1-4): “Moreover, we applied a linear mixed-effect model to test the developmental influence of GA vs. PT on the cortical structure, and the results showed that the CT in the two ROIs was not differently influenced by GA and PT (p > 0.3), whereas CM showed at least marginally significant differences in both ROIs (V1: p < 0.01 and VOTC: p < 0.09).”
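      One standard way to test whether GA and PT have different effects on a structural measure y, sketched here for a simple linear model without random effects, is to reparameterise using PMA = GA + PT:

```latex
y_i = \beta_0 + \beta_{GA}\,\mathrm{GA}_i + \beta_{PT}\,\mathrm{PT}_i + \varepsilon_i,
\qquad \mathrm{GA}_i = \mathrm{PMA}_i - \mathrm{PT}_i
\;\Longrightarrow\;
y_i = \beta_0 + \beta_{GA}\,\mathrm{PMA}_i + (\beta_{PT} - \beta_{GA})\,\mathrm{PT}_i + \varepsilon_i
```

      The PT coefficient in the second form equals β_PT - β_GA, so its t-test is a test of H0: β_GA = β_PT. The same identity shows why a PT effect estimated while controlling for PMA turns negative whenever the GA slope exceeds the PT slope. Whether the mixed-effect models in the revision are parameterised this way is not stated here; the equations are an illustration only.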

      It is unclear what the evidence is to support the following claim: "Both CT and CM show higher correlation with PMA in the posterior than anterior region, and higher correlation in the medial than lateral part within the anatomical mask (Figure 2a and Figure S2b-c [sic])" From Figure 2 or Figure S2, I don't see a gradient. From Figure S3, there might be a trend in some plots, but it is hard to interpret since it is non-monotonic. More generally, is there a statistical test to support this claim?

      We added a correlation analysis between the direction (x: lateral to medial; y: posterior to anterior) and the measurements (CT and CM) in the ventral visual cortex, and the resulting coefficients were all significant (r = 0.7/-0.8 for CT along the x/y axis, and r = 0.91/-0.83 for CM along the x/y axis; p < 0.001). See Figure 1-figure supplement 2. However, the reviewer’s concern still applies: such significance may be driven by a subset of areas, and the gradient was non-monotonic. Therefore, we replaced the original claim with the following sentence (Page 6 Line 3-8): “In addition, we found distinct spatial variation along ventral cortex, e.g. posterior-anterior and medial-lateral directions (Figure 1-figure supplement 2a-b). Generally, both CT and CM showed higher correlation with PMA in the posterior than anterior region (r = -0.8 and -0.83; p < 0.001), and higher correlation in the medial than lateral part within the ventral visual cortex (r = 0.7 and 0.91; p < 0.001; Figure 1-figure supplement 2c-d).”

      "and the interaction [sic] was more prominent in CM (simple effect: t = 10.98, p < 10⁻⁹) that in than CT (t = 2.07, p < 0.05)." Does 'more prominent' mean it is 'significantly stronger'? If not, then the authors should adjust this claim.

      The phrase ‘more prominent’ was meant as ‘significantly stronger’, since we found that the interaction between CM and CT along PMA or PT was significant in the ANOVA analysis. However, we have removed this analysis because the comparison between the two structural measurements is not central to the conclusions of the paper. We now apply a linear mixed-effect model to compare the influence of GA against PT on specific structural measures, so this result and claim have been removed from the revised manuscript.

      Are the authors Fisher Z transforming their correlations? In numerous places, correlation values seem to be added together or used as the input to other correlation analyses. It is unclear from the methods whether the authors are transforming their correlation values to make that use appropriate.

      We are sorry for the confusion. In all statistical analyses involving correlation coefficients, the coefficients were Fisher-Z transformed. We have added a clear description of the Fisher-Z transformation to the manuscript (Page 25 Line 16-18).
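      A minimal sketch of averaging correlations in Fisher-z space before back-transforming, assuming all coefficients lie strictly between -1 and 1 (math.atanh raises an error at exactly ±1); the helper names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher z-transform; r must lie strictly between -1 and 1."""
    return math.atanh(r)


def mean_correlation(rs):
    """Average correlations in z-space, then back-transform to r."""
    zs = [fisher_z(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


print(round(mean_correlation([0.1, 0.9]), 3))  # 0.656 (naive mean: 0.5)
```

      Because the transform stretches values near ±1, a z-space average differs from the naive mean whenever the inputs are spread out, which is why the transformation matters when correlations are summed or compared.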

      Homotopy analyses

      The homotopy section is a strength of the paper, but I have doubts about the approach taken to analyze this data and some of the conclusions drawn. I don't expect any of my suggestions to change the takeaway of this section, but I do think they are essential criticisms to address.

      I do not think that the non-homotopic control condition is appropriate. In Arcaro & Livingstone (2017), the authors had 3 categories for this analysis: homotopic pairs (e.g., left V1 vs. right V1), adjacent pairs (e.g., left V1 vs. right V2), and distal pairs (e.g., left V1 vs. right PHA1). In the homotopy analysis performed by Li and colleagues, they compare homotopic pairs with all other pairs. I don't think that is generous to the test since non-homotopic pairs include adjacent pairs that should be similar and distal pairs that shouldn't be similar. This may explain why some non-homotopic distribution overlaps with the homotopic distribution in Figure 4c.

      Thanks for these suggestions. In the revised manuscript, we reanalyzed the data by dividing the connections into three groups for each subject. See Page 26 Line 24-29: “For each subject, Pearson correlations were carried out on the ROI-averaged time series within and across the left and right ventral cortex. The resulting connections were divided into three groups, namely the homotopic connection (the connection between two paired areas in two hemispheres, e.g. right and left V1), adjacent connection (e.g., right V1 and left V2 since V1 and V2 are adjacent) and distant connections (two areas that were neither paired nor adjacent)”.
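      The three-way split can be expressed as a small labelling function over left-right ROI pairs. The adjacency table in this sketch is a hypothetical fragment using area names mentioned in the review; in practice it would come from the parcellation and is not reproduced here.

```python
def classify_pairs(areas, adjacent):
    """Label every (left ROI, right ROI) pair as homotopic, adjacent or distant.

    adjacent is a set of frozensets naming areas that border each other.
    """
    labels = {}
    for left in areas:
        for right in areas:
            if left == right:
                kind = "homotopic"
            elif frozenset((left, right)) in adjacent:
                kind = "adjacent"
            else:
                kind = "distant"
            labels[(f"L_{left}", f"R_{right}")] = kind
    return labels


# Hypothetical fragment of an atlas adjacency table.
areas = ["V1", "V2", "V3", "PHA1"]
adjacent = {frozenset(("V1", "V2")), frozenset(("V2", "V3"))}
labels = classify_pairs(areas, adjacent)
print(labels[("L_V1", "R_V1")], labels[("L_V1", "R_V2")], labels[("L_V1", "R_PHA1")])
# homotopic adjacent distant
```

      Per-subject group means (in Fisher-z space) can then be computed over each label, giving one homotopic, one adjacent and one distant value per subject for the paired comparison.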

      Regardless of this decision, I think the authors should reconsider their statistical test. I think the authors are using a between samples t-test to compare the 34 homotopic pairs with the hundreds of non-homotopic pairs. This is statistically inappropriate since the items are not independent (i.e., left V1 vs. right V1 is not independent of left V1 vs. right V2, which is also not independent of left V3 vs. right V2). This means the actual degrees of freedom are much lower than what is used. Moreover, I am unsure how the authors do this analysis across participants since this test can be done within participants. The authors should clarify what they did for this analysis and justify its appropriateness.

      Thank you for the suggestion. In the previous manuscript, we first averaged the connection matrix across subjects and then calculated the homotopic (or non-homotopic) connections between areas; therefore, no statistical test could be performed across subjects. In the revised paper, we calculated the three groups of connections for each subject before averaging. We then applied a non-parametric paired test (Wilcoxon signed-rank) across subjects, so that the non-independent connections are not treated as independent samples, and found that the homotopic connections were significantly stronger than the adjacent or distant connections.

      See (Page 26 Line 29 and Page 27 Line 1-3): “A one-sample t-test was used to test whether the homotopic correlation was significantly greater than zero across subjects. To compare the correlation among the three types of connections, we applied a non-parametric statistical test (Wilcoxon signed-rank) across subjects”.

      The results showed that (Page 9 Line 17-21) “the homotopic connections in all ROIs of ventral cortex were significant (mean r = 0.13–0.43, t > 12.87, p < 10⁻⁹; Fig 4a-b), and were significantly higher than adjacent connections (0.29 ± 0.12 vs. 0.19 ± 0.10, Wilcoxon signed rank test on the Fisher-Z transformed r value: z = 16.32, p < 10⁻⁹) and distal connections (0.04 ± 0.06, z = 16.32, p < 10⁻⁹; Fig. 4c)”.
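      The paired, within-subject comparison can be sketched as follows: each subject contributes one Fisher-z-averaged homotopic value and one adjacent (or distant) value, and the signed-rank statistic is computed on the per-subject differences. This minimal version uses the normal approximation without tie or continuity correction, and the per-subject values are hypothetical; library implementations such as `scipy.stats.wilcoxon` offer exact and corrected variants.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (normal approximation, no tie/continuity correction).

    Returns (W_plus, z, two-sided p). Zero differences are dropped; tied |d| get average ranks.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return w_plus, z, p


# Hypothetical per-subject mean correlations, Fisher-z transformed before testing.
homotopic_z = [math.atanh(r) for r in [0.31, 0.25, 0.40, 0.28, 0.35]]
adjacent_z = [math.atanh(r) for r in [0.20, 0.22, 0.18, 0.30, 0.21]]
print(wilcoxon_signed_rank(homotopic_z, adjacent_z))
```

      Pairing within subjects is what removes the independence problem raised by the reviewer: the unit of analysis is the subject, not the individual ROI pair.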

      Could the authors speculate on why the correlations in homotopic regions are so much lower than what Arcaro and Livingstone (2017) found? I can think of a few possibilities: higher motion in infants, less rfMRI data per participant, different sleep/wake states, and different parcellation strategies. Regarding the last explanation, I think this is a real possibility: the bilateral correlation may be reduced if the Glasser atlas combines functionally heterogeneous patches of the cortex. Hence, the authors should consider this and other possible explanations.

      Thank you for the suggestion. The neonates included in this study were all in natural sleep during the scan, so sleep/wake states would not be one of the causes. We added some possible reasons for this difference following the related results (Page 19 Line 9-13): “However, the present homotopic connections in human neonates were weaker than those in neonatal macaques (Macaca mulatta) (Arcaro and Livingstone, 2017). This difference might relate to higher motion in human infants, less r-fMRI data in the present study, the coarser parcellation of the visual cortex used in this work, and developmental differences between macaques and humans in the neonatal period.”

      The authors assume that the homotopic analyses mean that there are lateral connections between hemispheres (e.g., "Furthermore, the connections among the ventral visual cortex have developed during this early stage. Specifically, the homotopic connections between bilateral V1 and between bilateral VOTC both increased with GA, indicating an increased degree of functional distinction"). While this might be true, it doesn't need to be. Functional connectivity can be observed between regions that lack anatomical connectivity. Instead, two regions could both be driven by another region. In this case, the thalamus might drive symmetrical activity in the visual cortex.

      We agree with the reviewer’s view that the development of functional connectivity might be driven by other regions like the thalamus, so we added this interpretation in the discussion section (Page 19 Line 23-25): “It is worth noting that the increased homotopic connection can be direct or indirect, e.g., the effect might be driven by external regions with enhanced connections to both areas (e.g. thalamus)”.

      Miscellaneous

      I am not sure what the motivation of this line is: "Moreover, those studies did not fully control the visual experience in the first few weeks of the subjects, thus cannot give a clear conclusion whether the innate functional connectivity is unrelated to postnatal visual experience." Arcaro, Schade, Vincent, Ponce, & Livingstone (2017) did control the visual experience of subjects. Moreover, the research here doesn't control infant experience in the way this sentence implies: it implies an experimental manipulation (i.e., fully control) rather than the statistical control that is done here. Consider rephrasing.

      We have rephrased this sentence in the Introduction (Page 5 Line 2-5): “Moreover, the human infants participating in a previous study (Kamps et al., 2020) were around one month old (mean age: 27 d; range from 6 to 57 d) and might already have acquired some visual experience; thus, that study could not exclude an effect of postnatal visual experience on the innate functional connectivity”.

      I am not sure why this claim is made: "Area V1 was selected because this region is the most basic region for visual processing and probably is the most experience-dependent area during early development". Is there evidence supporting this claim? Plasticity is found throughout the visual cortex, and I think which region is most plastic depends on the definition of plasticity. For instance, most people have the same tuning properties to Gabor gratings (e.g., a cardinality bias), but there is enormous variability in face tuning across cultures.

      We have removed this claim in the manuscript.

      The abstract says 783 infants were included in this study, but far fewer are actually used. The authors should report the number 407 in the abstract, if any number at all.

      We have revised the number accordingly.

      Any comparisons of preterms and terms ought to be given the caveat that the preterm environment can be very different than the term environment: whereas a term infant goes home and sees friends and family without restriction, the preterm environment can be heavily regulated if they are in a NICU. Authors should either provide details about the environments of the preterms in their study, or they should consider how differences in the richness of visual experience - regardless of quantity - may affect visual development.

      We agree with the reviewer’s concern and added a paragraph to the limitations section to stress this caveat (Page 20 Line 12-16): “One limitation of this study is that the comparison between preterm and term-born infants did not consider differences in their visual experience. Preterm-born neonates may experience a very different environment from that of term-born neonates, e.g. the preterm environment can be heavily regulated if they were in a NICU, but we did not have detailed information about the postnatal environment to control for it.”

      Reviewer #3 (Public Review):

      The authors use a large neonatal dataset to examine how development may occur differently depending on whether or not the neonate spent that time in gestation or out of the womb, potentially accruing visual experience. In this manner, the authors hope to tease apart those aspects of development that are biologically programmed versus those that occur in response to experience within the visual cortex. They show structurally that cortical thickness is affected by postnatal experience while cortical myelination is not, and functionally they find regional differentiation present between visual areas at birth and that their connectivity changes with development and postnatal experience. The conclusions seem well supported by the data and analyses and provide some insight into which aspects of brain structure at birth are sculpted more by postnatal experience and which are more determined by endogenous developmental timelines.

      The analyses are based on a large sample of infants, and the authors were careful to statistically separate which aspects of an infant's age, gestational or postnatal, are driving brain development, providing a deeper picture of infant brain development than previous publications. Overall, the findings seem well supported by the data as the analyses are relatively straightforward.

      Visualization of the data and findings could be improved, as a few figures are difficult to interpret without having to read the methods.

      We have extensively revised the figures in the manuscript to improve the readability. See updated Figures 2-7.

      The acronyms regarding gestation, postnatal, and post-menstrual time are a little distracting. Please consider explicitly writing "gestational time" etc when referring to these numbers to improve readability.

      We have replaced the analyses involving PMA with gestational age (GA) or postnatal time (PT) in the revised manuscript to simplify the terminology. Please see the Response to the 1st major concern in the Essential Revisions (for the authors) section above. We believe this change makes the paper easier to follow even with the abbreviations.

      Because the cortical ribbon of infants is so thin at birth, there seems to be a possibility that partial-volume effects could be more prevalent in less-developed infants and impact myelin metrics. If not modeled or estimated, it should at least be discussed.

      In fact, the cortex of the neonatal brain is not thinner than that of the adult. Specifically, the average cortical thickness of infants aged 0-5 months is around 2-2.5 mm (Wang et al., 2019), which is similar to that of adults (Fjell et al., 2015). Therefore, the partial-volume effect on cortical gray matter is not a special concern for infants.

      Nevertheless, we agree that partial-volume effects might have different influences on infants of different ages. We added this consideration to the limitations section (Page 20 Line 20-24): “Another concern is the partial-volume effect on the cortical measurements. The changing thickness of the cortical ribbon during development may change the degree of the partial-volume effect, and thus may affect the cortical myelination measurement and contribute to the myelination difference observed between the preterm and term-born groups.”

      Structural and functional development could be more formally compared using quantitative models if the authors want those points more strongly related; the two are only qualitatively discussed at present.

      We have added a formal analysis to investigate the relationship between structural and functional development. Please see the Response to the 1st concern of Reviewer 1 (public review).

    1. Author Response

      Reviewer #1 (Public Review):

      The previous study as the authors stated showed a weaker expression of DMP1 in skeletal muscle. The authors provide a clear justification that sarcopenia-like phenotype was unlikely caused by DMP1-cre expression in muscle cells given there is no change of muscle cell numbers. It would be helpful to provide some quantification data of muscle cells to further preclude this possibility.

      To define the extent of partial osteocyte ablation, we quantified the ratio of empty lacunae in DTAhet mice at 13 weeks. About 80% of lacunae were empty in DTAhet mice at 13 weeks, an increase of about 20% compared with 4 weeks (Line 127-131, Figure 1 – figure supplement 1B), indicating that diphtheria toxin (DT) has a cumulative effect with age in DTAhet mice. We speculate that osteocytes are ablated once DT accumulates to a threshold level.

      The underlying molecular mechanism is not shown in the current study, but it might be worthwhile to provide some more-depth discussions and hypotheses concerning how osteocytes could influence cell lineage commitment in bone marrow.

      We thank the reviewer for the suggestion, and we have updated the Discussion accordingly in the revised version (Lines 424-433).

      Reviewer #3 (Public Review):

      The finding that osteocyte reduction induced senescence in osteoprogenitors and myeloid lineage cells is intriguing. However, further validation of cellular senescence in bone/bone marrow is lacking. Additional approaches, such as immunostaining of key senescence markers in bone tissue sections, are needed to validate the phenotype.

      Following the reviewer’s suggestion, we performed senescence-associated β-galactosidase (SA-β-Gal) staining of frozen femur sections from WT and DTAhet mice (Figure 6 - figure supplement 1D). Details are given in the Response to Essential Revision 2.

      It is interesting that partial osteocyte ablation alters mesenchymal lineage commitment, i.e. increased adipogenesis and impaired osteogenesis. The authors should perform further analysis of their scRNA-Seq data and conduct trajectory analysis to confirm the phenomenon. Additional functional assays of bone marrow mesenchymal stem/progenitor cells, such as CFU-F and tri-lineage differentiation assays, are needed to claim the lineage commitment change of the cells.

      As we used total bone marrow cells for scRNA-seq, the number of MSCs was too small for re-clustering and trajectory analysis. Instead, we performed GO enrichment analysis of the MSC cluster, which revealed that genes downregulated after osteocyte ablation were enriched in ossification and biomineral tissue development (Figure 6 - figure supplement 1E), consistent with the impaired osteoblast differentiation (Figure 4H-J). Further, as the reviewer suggested, we performed qPCR to verify related gene changes during tri-lineage differentiation. The mRNA levels of osteogenic markers, including Alp, Ocn and Runx2, were decreased (Figure 4J), indicating impaired osteogenesis after osteocyte ablation. Meanwhile, the mRNA levels of adipogenic markers, including Adipoq, Fabp4, Pparγ and Cebpa, were significantly increased (Figure 6 - figure supplement 1F), indicating promoted adipogenesis and altered MSC commitment. In contrast, the mRNA levels of cartilage anabolism-related genes (Col1a2, Acan, Sox9 and Prg4) and catabolism-related genes (Mmp3, Mmp13, Adamts1 and Adamts5) were not significantly changed (Figure 6 - figure supplement 1G), indicating that chondrogenesis was not altered after osteocyte ablation. We have updated this in the revised version (Lines 324-333), and the tri-lineage differentiation methods and primer information have been added to the Materials and Methods (Lines 579-590, Lines 623-637).

      The mechanism by which osteocyte reduction causes cellular senescence of the surrounding cells is an interesting question. It would be helpful if the authors provided evidence or an explanation on this point. Does the phenotype recapitulate age-associated bone impairment? The laboratories of Sundeep Khosla (Mayo Clinic) and Maria Almeida (University of Arkansas for Medical Sciences) reported that osteocytes are a major cell type in bone that become senescent during aging. Although most osteocytes were eliminated in the mouse model used in this study, were the remaining osteocytes undergoing cellular senescence?

      We thank the reviewer for the suggestion, and we have updated the Discussion accordingly in the revised version (Line 424-433). Details are given in the Response to Essential Revision 4.

      We think that the phenotypes after osteocyte ablation are similar to age-associated bone impairment and, to a certain degree, recapitulate it, which further indicates the important role of osteocytes in maintaining bone homeostasis during aging. In SA-β-Gal staining of frozen femur sections from WT and DTAhet mice, we observed SA-β-Gal+ cells in the cortical bone region of DTAhet mice (Figure 6 - figure supplement 1D). As cortical bone mainly contains osteocytes and matrix, we infer that the remaining osteocytes may also have undergone cellular senescence.

    1. Author Response

      Reviewer #3 (Public Review):

      In invertebrates, learning-dependent plasticity was reported to take place predominantly in presynaptic neurons. In Drosophila appetitive olfactory learning, cholinergic synapses between presynaptic Kenyon cells and postsynaptic MBONs undergo behaviourally relevant associative plasticity, and it was shown to reside largely in Kenyon cell output sites. This study provided several lines of evidence for postsynaptic plasticity in MBONs. The authors nicely showed the requirement of Kenyon cell output during training, strongly suggesting that behaviourally relevant associative plasticity also resides downstream of Kenyon cell output. This is further supported by impaired appetitive memory by downregulating nAChR subunits (α2, α5) and scaffold protein Dlg in specific MBONs. Live imaging experiments demonstrated that the learning-dependent depression in M4-MBON was reduced upon knocking down the α2 nAChR subunit. Using in-vivo FRAP experiments, the authors showed recovery rates of nAChR-α2::GFP were altered by the co-application of olfactory stimulation and DA. All these lines of evidence point to the significance of nAChR subunits in MBONs for postsynaptic plasticity.

      On the technical side, this study achieved a very high standard, such as the measurement of the mobility of lowly expressed receptors by in-vivo FRAP. The authors conducted a wide array of experiments for collecting data supporting postsynaptic mechanisms. The downside of this multitude is somewhat compromised coherence. To give an example, the authors duplicated many behaviour and imaging experiments in different MBONs for non-associative learning (Fig. 7 and 8), which is primarily out of the scope of this paper (cf. title).

      We thank the reviewer for their positive assessment and constructive criticism. We considered carefully whether to remove the data on non-associative learning (Fig. 7 and 8), but feel that they add important experiments that are not feasible for the other MBONs due to technical constraints (complexity of the training protocols and localization of the imaging area). Moreover, as reviewer 1 was happy with these experiments, we decided that it is important to show that receptor plasticity is not confined to associative appetitive memory but is also important for other postsynaptic memory storage mechanisms. In response to this reviewer, we have adjusted the title to:

      Postsynaptic plasticity of cholinergic synapses underlies the induction and expression of appetitive and familiarity memories in Drosophila

      We also now include familiarity learning in the abstract. Moreover, we have expanded our explanation of why we conducted these additional experiments, and now state:

      line 436ff: ‘Our data so far suggest that regulation of α2 subunits downstream of α5 is involved in postsynaptic plasticity mechanisms underlying appetitive, but not aversive, memory storage. Besides associative memories, non-associative memories, such as familiarity learning, a form of habituation, are also stored at the level of Drosophila MBs. We next asked whether postsynaptic plasticity expressed through α5 and α2 subunit interplay was exclusive to appetitive memory storage, or would represent a more generalizable mechanism that could underlie other forms of learning represented in the MBs. We turned to the α’3 compartment at the tip of the vertical MB lobe, which has previously been shown to mediate odor familiarity learning. This form of learning allows the animal to adapt its behavioral responses to new odors and permits assaying direct odor-related plasticity at the level of a higher-order integration center. Importantly, this compartment follows different plasticity rules, because the odor serves as both the conditioned stimulus (activating KCs) and the unconditioned stimulus (activating the corresponding dopaminergic neurons)15. While this allows us to test whether the principles uncovered so far could also be relevant in a different context, it also provides a less complex test bed to further investigate whether α5 functions upstream of α2 dynamics.’

      We would also like to emphasize that, if the reviewer feels that keeping these data as part of our manuscript would prevent publication, we are prepared to remove them from the manuscript and submit them in their own right (potentially as a subsequent research advance).

    1. Author Response

      Reviewer #1 (Public Review):

      1. Probably the shortest review I've ever written! Most birds today can lift the upper beak independently of the brain case. This is made possible by a series of mobile joints and bending zones in the skull. To investigate the evolution of this phenomenon, the authors successfully CT-scanned the thoroughly squished skull of the Early Cretaceous stem-bird Yuanchuavis. The detailed description and illustration of the shapes and positions of the skull bones leave no doubt about the conclusion that the toothed snout was unable to move independently of the brain case. They also show, however, that the loss of a few extensions from specific skull bones would have made mobility possible. This plugs a major gap in our understanding of the evolution of mobility within the skull in birds (and by extension elsewhere, notably in the similarly diverse lizards & snakes).

      Yes, we are delighted that this work will further advance our understanding of avian skull evolution.

      Reviewer #2 (Public Review):

      1. Wang et al. present a detailed description and analysis of the previously reported cranial remains of the enantiornithine bird Yuanchuavis. The authors use X-ray CT scan data to reconstruct the cranial elements and retro-deform the facial and palatal skeleton. The authors also use principal component analysis with geometric morphometrics data to investigate where Yuanchuavis falls in palatine phylomorphospace. The authors use these data to make inferences about the kinetics of the Yuanchuavis skull as well as the evolution of cranial kinesis across birds. Generally, I find the authors' direct interpretation of their anatomical and PCA data to be convincing and compelling. The anatomical description is thorough and accurate. The methods used for the geometric morphometrics and PC analyses are appropriate. I find compelling the authors' interpretations that Yuanchuavis largely retained the ancestral non-avialan akinetic skull.

      One of the greatest strengths of this paper is its extremely attractive figures. In particular, I find figure 4 to be exceptionally useful - this is easily the most effective illustration I have yet seen of avian cranial kinesis and the shifts in cranial morphology that underlie its evolution. I applaud whoever designed this figure. My one major concern with this paper's methodology is that the palatine used for Ichthyornis is incorrect. Torres et al. (2021) published the correct palatines, which were very different from those incorrectly (but understandably) identified in Field et al. (2018) and used here. I strongly urge the authors to rerun their GMM analysis with corrected data.

      We thank the reviewer for supporting this study. For the palatine of Ichthyornis, we have now used the palatine reconstruction in Torres et al. (2021) and re-run the GMM analyses. This does change the GMM results, but the main conclusion is not strongly affected. We are grateful for this comment.

    1. Author Response

      Reviewer 2 (Public Review):

      1) The hypothesis that the genes responsible for the Mendelian traits are also the causal genes for the cognate complex traits does not seem to hold, given the prior work and the data shown in the study. For example, if this hypothesis is true, it is unexplained why the candidate genes were not even enriched in the GWAS regions for height and breast cancer.

      Following the removal of a data artifact from our breast cancer analysis and the inclusion of Backman et al.’s larger list of genes implicated in height, every phenotype in our analysis displays enrichment in proximity to GWAS peaks. Enrichment is present not only in genes selected based on cognate Mendelian phenotypes, but also in those from Backman et al., which examined the same complex trait phenotypes that were used for GWAS. In that work, the enrichment of GWAS signal near genes selected on the basis of coding variants was as high as 59.3-fold.

      Our use of Mendelian-trait-causing genes is not dependent on GWAS. Short of large-scale experimental work, we know of no better way to confirm the genes’ broad relevance to GWAS phenotypes than their enrichment near peaks. This enrichment has been persuasively demonstrated by previous research. Freund et al. (2019) tested the enrichment of 20 Mendelian disorder gene sets against 62 complex phenotypes. Though there was no statistically significant overlap of phenotypically non-matched Mendelian genes and GWAS peaks (2% matched), the overlap of matched Mendelian genes and GWAS peaks was significant (54% matched).

      We have included additional evidence and references for this relationship in Supp. Note 1.

      2) The only evidence supporting their hypothesis appears to be the enrichment of the candidate genes in the GWAS regions for seven out of the nine traits. However, significant enrichment of the candidate genes in the GWAS regions does not necessarily mean that a large proportion of the candidate genes are the causal genes responsible for the GWAS signals. Analogously, we cannot use the strong enrichment of eQTLs in GWAS regions as evidence to claim that a large proportion of the GWAS signals are driven by eQTLs.

      Our gene sets were selected by considering two criteria: whether they are relevant to each complex trait, and whether they are biologically interpretable.

      The genes identified in Backman et al. have a strong case for relevance. They are evaluated for association, not with cognate Mendelian phenotypes, but with the exact same complex traits used for GWAS.

      Our genes, selected based on cognate Mendelian traits, are less obviously relevant, but have advantages for interpretation. Many have well-understood biological roles and are part of pathways that have been studied in great detail. Because most of these genes can cause dramatic phenotypic changes with one variant, the direction of effect is easier to understand than genes identified through burden testing. In fact, loss-of-function coding variants that cause autosomal dominant traits can be thought of as large-effect, context-independent eQTLs—they cause phenotypic change by decreasing gene expression roughly 50% across cell types, developmental stages, etc.

      Ideal genes for our analysis would combine the advantages of both sets. They would have individual coding variants that could be tied to complex traits using exome sequences. However, natural selection creates tradeoffs between variant frequencies and variant effect sizes. Large-effect variants (such as those responsible for Mendelian traits) are generally too rare to be detected in population sequencing. Coding variants that reach frequencies detectable in databases such as UK Biobank typically have smaller effect sizes, requiring them to be aggregated in order to implicate genes.

      We believe that our original gene set is plausible both because of its collective enrichment in GWAS signal and because each gene is individually known to cause cognate phenotypes. Enrichment is not proof, but can serve as strong evidence when backed up by known biology. Though selection precludes a perfect gene set, the enrichment in both our Mendelian gene set and the set from Backman et al. addresses each criterion—interpretability and relevance—individually, and, taken together, provides an argument for the relevance of genes selected based on coding variants.

      3) Considering the large numbers of GWAS signals, we would expect a substantial number of genes in the GWAS regions by chance. It would be interesting to quantify the number of genes in the GWAS regions if the 143 genes are randomly selected. Correcting the observed number of genes for that expected by chance (e.g., subtracting the observed number by that expected by chance), the proportion of the candidate genes in the GWAS regions would be small.

      The proportion of the candidate genes whose eQTL signals were colocalized with the GWAS signals or in close physical proximity with the fine-mapped GWAS hits was small. However, I would not be surprised if they are significantly enriched, compared with that expected by chance (e.g., quantified by repeated sampling of the 143 genes at random).

      Sampling random gene sets (or using the entire set of non-putatively-causative genes) shows that, given the size of our gene set, we would expect 43 randomly selected genes to fall within 1 Mb of a peak (95% confidence interval: 31.5-54.5). Instead, we find 147 peak-adjacent genes. At shorter distances, the enrichment increases: within 100 kb, we find 104 putatively causative genes, whereas the null model predicts only 11 (95% CI: 4.5-17.0), a roughly ten-fold difference.
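      The random-sampling null described above can be sketched as follows. This is a minimal illustration, not our actual pipeline: it assumes each gene and each GWAS peak is reduced to a single (chromosome, position) pair (a real analysis might use TSS or gene-body coordinates instead), and the helper names `index_peaks`, `count_near` and `sampling_null` are hypothetical. It draws background gene sets of the same size as the candidate set and reports the mean and an empirical 2.5%-97.5% interval of the number of genes falling within a chosen window (e.g. 1 Mb or 100 kb) of a peak.

```python
import bisect
import random
import statistics
from collections import defaultdict

def index_peaks(peaks):
    """Group peak positions by chromosome and sort them for binary search."""
    by_chrom = defaultdict(list)
    for chrom, pos in peaks:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    return by_chrom

def is_near_peak(gene, peak_index, window):
    """True if the gene lies within `window` bp of a peak on the same chromosome."""
    chrom, pos = gene
    positions = peak_index.get(chrom, [])
    i = bisect.bisect_left(positions, pos - window)  # first peak at or after pos - window
    return i < len(positions) and positions[i] <= pos + window

def count_near(genes, peak_index, window):
    """Number of genes within `window` bp of any peak."""
    return sum(is_near_peak(g, peak_index, window) for g in genes)

def sampling_null(background, set_size, peak_index, window, n_draws=1000, seed=0):
    """Draw same-size random gene sets from `background`; return the mean
    near-peak count and an empirical 2.5%-97.5% interval."""
    rng = random.Random(seed)
    counts = [count_near(rng.sample(background, set_size), peak_index, window)
              for _ in range(n_draws)]
    cuts = statistics.quantiles(counts, n=40)  # 39 cut points in 2.5% steps
    return statistics.mean(counts), (cuts[0], cuts[-1])

# Toy usage with made-up coordinates:
peaks = index_peaks([("1", 1_000_000), ("2", 5_000_000)])
toy_genes = [("1", 1_050_000), ("1", 3_000_000), ("2", 4_950_000)]
print(count_near(toy_genes, peaks, window=100_000))  # → 2
```

      Comparing an observed count (such as the 104 genes within 100 kb) against the interval returned by `sampling_null` would give a fold enrichment analogous to the one reported above; sorting peaks per chromosome keeps each lookup logarithmic in the number of peaks.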

      Enrichment remains significant even under a more conservative null. Genes like ours, which are important to some phenotype, may be more likely than random genes to fall near GWAS peaks even when their phenotype does not correspond to the GWAS phenotype. In that case, we might see enrichment even in the absence of a relationship between our Mendelian and complex traits. To account for this, we also tested significance by testing gene sets against mismatched phenotypes (e.g. testing our LDL genes with a UC GWAS, and our height genes with a T2D GWAS). The results of this permutation, shown in Supp. Fig. 1, further confirm the enrichment.
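      The phenotype-swapping control can be illustrated in the same hedged way. In this sketch, each trait's gene set is tested against every trait's GWAS peaks, and matched pairs (same trait) are separated from mismatched pairs (e.g. LDL genes against a T2D GWAS). The function names, the toy coordinates, and the use of a simple near-peak fraction as the statistic are assumptions for illustration only; the statistic behind Supp. Fig. 1 may differ.

```python
import bisect
from collections import defaultdict
from itertools import product

def index_peaks(peaks):
    """Group peak positions by chromosome and sort them for binary search."""
    by_chrom = defaultdict(list)
    for chrom, pos in peaks:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    return by_chrom

def fraction_near(genes, peak_index, window):
    """Fraction of genes within `window` bp of a peak on the same chromosome."""
    hits = 0
    for chrom, pos in genes:
        positions = peak_index.get(chrom, [])
        i = bisect.bisect_left(positions, pos - window)
        if i < len(positions) and positions[i] <= pos + window:
            hits += 1
    return hits / len(genes)

def matched_vs_mismatched(gene_sets, gwas_peaks, window):
    """Test every trait's gene set against every trait's GWAS peaks and split
    the results into matched (same trait) and mismatched (swapped) pairs."""
    indexes = {trait: index_peaks(peaks) for trait, peaks in gwas_peaks.items()}
    matched, mismatched = {}, {}
    for gene_trait, gwas_trait in product(gene_sets, indexes):
        frac = fraction_near(gene_sets[gene_trait], indexes[gwas_trait], window)
        target = matched if gene_trait == gwas_trait else mismatched
        target[(gene_trait, gwas_trait)] = frac
    return matched, mismatched

# Toy usage: one made-up gene and one made-up peak per trait.
gene_sets = {"LDL": [("19", 45_000_000)], "T2D": [("10", 114_000_000)]}
gwas_peaks = {"LDL": [("19", 45_050_000)], "T2D": [("10", 114_020_000)]}
matched, mismatched = matched_vs_mismatched(gene_sets, gwas_peaks, window=100_000)
print(matched)     # {('LDL', 'LDL'): 1.0, ('T2D', 'T2D'): 1.0}
print(mismatched)  # {('LDL', 'T2D'): 0.0, ('T2D', 'LDL'): 0.0}
```

      If genes were enriched near peaks merely because they are phenotypically important, the mismatched fractions would approach the matched ones; a clear gap between the two is what this control looks for.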

      Finally, a non-expression-based analysis found that Mendelian genes had large enrichments in heritability. As in our study, it included Mendelian genes for diabetes and LDL: the Mendelian diabetes genes were enriched 65-fold for common-variant heritability, and the Mendelian LDL genes were enriched 212-fold (Weiner et al. 2022).

      Though it is true that the number of colocalizations and TWAS hits likely represents a statistically significant enrichment over all genes, we feel that this does not affect the conclusions of the paper. The model that noncoding variants identified by GWAS act as eQTLs certainly has some truth—colocalization and TWAS studies have found, in total, many associations. But the model’s success has not lived up to its expectations. This has been suggested, albeit inconclusively, by the failure of most GWAS peaks to colocalize. By evaluating, not the portion of loci that can be tied to a gene, but the portion of already-implicated genes that can be tied to a locus, we believe the model’s deficiencies are both more clear and more puzzling.

      4) It is unclear how the authors selected the breast cancer genes. If the genes were selected based on tumor somatic mutations, it is a problem because there is no evidence supporting that somatic mutation target genes are also cancer germline risk genes.

      Genes for breast cancer were selected using the MutPanning method (Dietlein et al. 2020), which takes somatic mutations found in tumors, and evaluates them in the context of known mutation patterns. The relationship between somatic and germline variants in cancer is little studied. We believe it is meaningful that, as explained in our response to overall comment 2ii, we do now find an enrichment of our breast cancer genes near GWAS peaks. Though these genes are very unlikely to be a perfect set, the conclusions of our paper remain true with or without the inclusion of this phenotype.

      5) The authors observed no enrichment of the candidate genes in height and breast cancer GWAS regions. In this case, should these traits and the corresponding genes be removed from the subsequent analyses?

      The reviewers’ notes about enrichment—and its absence in height and BC—prompted us to review our analysis of it. The enrichment for five of our phenotypes remained significant, and the lack of enrichment for breast cancer genes proved artifactual. After accounting for the artifact, the enrichment of breast cancer genes displays the same pattern as most other phenotypes, displaying highly significant enrichment as compared to the genomic background and a permutation analysis. Supplementary figure 1 has been updated to reflect this change, and to add the enrichments found in Backman et al.

      Because our original analysis of height has nominal, but not corrected, significance for enrichment, the problem may be one of power. The set of height genes identified by Backman et al. is larger than our original set and displays a significant enrichment in proximity to GWAS signal. This enrichment is also present when the two gene sets are combined, as shown in the updated Supp. Figure 1.

      Reviewer 3 (Public Review):

      1) The positive results are substantially reduced when restricting the analyses to a set of selected tissues of relevance to the trait. Doesn't this imply that the selection of relevant tissues in this study is not comprehensive and, further, that tissue specificity is common in mediating genetic effects through gene expression? First, some apparently relevant tissues seem not to be selected (Table 2), such as bone for height (Finucane et al. 2015 NG). One approach to assess the relevant tissues for the predefined set of putatively causative genes is to see if these genes are enriched in the differentially expressed gene sets for those tissues. Second, among 84 putatively causative genes overlapped with GWAS signals, they identified 39 genes by TWAS, 11 genes by fine mapping with linear distance to chromatin modification features, and 41 genes by fine mapping with ChromHMM enhancer annotations, but these numbers reduced substantially to 9, 5 and 27 when restricting the same analysis to the selected tissues for each trait. If genes function only in the relevant tissues, I think using bulk expression data would lose power but is unlikely to give false positives. Thus, it is possible that for the traits analysed, not all relevant tissues are selected, so that only a fraction of genes identified in the bulk expression analysis can be replicated in the tissue-specific analysis. This appears to me a notable piece of evidence supporting the biological-context hypothesis, about which the authors express reservations in the Discussion.

      Testing for colocalizations or TWAS hits in all tissues may increase power for several reasons. First, it is possible that some GTEx tissues have unrealized relevance to our phenotypes. Secondly, in the event that a tissue is not present in GTEx, we may still detect relevant eQTLs in a tissue that is not itself involved in the trait, but which has similar patterns of expression. Finally, some tissues may be correct, but underpowered due to their small sample size. In this case, we may better detect the colocalization in tissues that are “irrelevant,” but are well-powered and have correlated expression.

      However, this creates problems of interpretation. Say we find, for example, a colocalization of an APOE eQTL with an LDL GWAS peak in skin tissue. Does this mean that skin tissue contributes to LDL levels? Is it simply because skin tissue has more samples than liver? Are we uncovering a strange, unexpected pleiotropy?

      We believe we can achieve both objectives—power and interpretability—with our use of MASH (Urbut et al. 2019) as described in response 3 of the first section. Briefly, MASH is a Bayesian tool that we use to update the estimates of eQTLs in GTEx data. Each tissue is adjusted to incorporate signals detected in other tissues with similar expression. This mitigates the danger of ignoring the correct tissue, and increases the power of tissues with small sample sizes. Its benefit is demonstrated by the substantial increase in the number of expression-GWAS colocalizations identified by coloc—however, the number of genes identified that fall within our putatively causative gene sets remains strikingly small.

      2) How much do both LD differences between GWAS and eQTL samples and the presence of allelic heterogeneity contribute to the observed low colocalization rate? One of their main findings is the low colocalization between trait-associated variants and eQTL in non-coding regions, which accounts for only 7% of the putatively causative genes. In discussion, the authors believe that this finding cannot be explained by lack of statistical power and is directly supported by a Bayesian analysis which reported high posterior probabilities of distinct signals for GWAS and eQTL. I agree that power is probably not a big issue. However, my concern is that given the large difference in sample size between GWAS and GTEx datasets, any small differences in LD between the two samples might cause a statistical separation of the signals even when trait phenotype and gene expression truly share a causal variant. Moreover, the presence of more than one causal variant with allelic heterogeneity in the locus may also play a part in the failure of colocalization. Consider two causal variants for the complex trait, one regulating the target gene and the other regulating another gene in co-expression. Potentially, the presence of the second causal variant would diminish the colocalization probability at the target gene.

      The ability of our statistical tools to detect true colocalizations is critical to this project. Small sample size increases the variance of the estimated LD matrix, but it is only one of many factors that influence power; others include LD differences between study populations and eQTL effect sizes.

      Though we restricted both GWAS and GTEx samples to subjects with European ancestry and used PCs as covariates, the reviewers are correct that there are likely to be LD differences between samples, due both to slight variations in populations and to the smaller sample sizes of GTEx. Analyses of colocalization tools under mismatched LD have shown that the resulting decreases in power are small. Chun et al. (2017) tested JLIM under simulated conditions of modest population mismatch, using CEU haplotypes to create the GWAS and haplotypes from all non-Finnish Europeans for the eQTL associations. They then attempted to distinguish shared vs. distinct causative variants for GWAS and eQTL, finding no decrease in sensitivity or specificity (Supp. Fig. 6 of Chun et al. 2017).

      The case in which two genes are co-regulated by nearby variants, both causative for the GWAS trait, creates a condition of allelic heterogeneity for the GWAS trait (as opposed to the expression trait). Chun et al. evaluated JLIM’s loss of power as a result of AH, and found that the power loss is small, except in cases in which the two variants have equal effects (Supp. Fig. 10). Testing cases in which the AH occurs for the expression trait returned a similar result (Supp. Fig. 9).

      Hukku et al. (2021) performed similar analyses on coloc, eCAVIAR, and fastENLOC. Allelic heterogeneity was found to reduce the power of coloc by about a factor of 2. Testing on different pairs of populations, they concluded that extreme LD mismatches (e.g. Finnish vs. Yoruba samples) can lead to substantial power loss, but moderate LD mismatches (e.g. Finnish vs. British samples) do not. Though a factor of two is substantial, it would not change the qualitative conclusions of this paper. Overall, given the variety of methods we employ (including those, such as JLIM, that are more robust to AH), we are confident that, taken together, they have been shown to be robust to the concerns raised.
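      To make the allelic-heterogeneity point concrete, the sketch below reimplements the core of a coloc-style calculation: Wakefield approximate Bayes factors per SNP, combined into posterior probabilities for hypotheses H0 to H4. It is a minimal illustration, not the coloc package itself. The prior effect standard deviation (0.15) and the priors p1 = p2 = 1e-4 and p12 = 1e-5 are assumed defaults, the SNPs are treated as independent, and the z-scores are made up. Because H4 is built on a single shared causal SNP, a locus with a second causal variant can shift posterior mass toward H3, which is the source of the power loss discussed above.

```python
import math

def log_abf(z, v, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for one SNP.

    z: association z-score; v: squared standard error of the effect estimate.
    prior_sd is an assumed prior standard deviation of the true effect.
    """
    w = prior_sd ** 2
    r = w / (v + w)
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z * z

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def logdiff(a, b):
    """log(exp(a) - exp(b)) for a > b."""
    return a + math.log1p(-math.exp(b - a))

def coloc_posteriors(z1, z2, v1, v2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for two traits tested at the same SNPs."""
    l1 = [log_abf(z, v) for z, v in zip(z1, v1)]
    l2 = [log_abf(z, v) for z, v in zip(z2, v2)]
    s1, s2 = logsumexp(l1), logsumexp(l2)
    s12 = logsumexp([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,                                                  # H0: no association
        math.log(p1) + s1,                                    # H1: trait 1 only
        math.log(p2) + s2,                                    # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiff(s1 + s2, s12),  # H3: distinct causal SNPs
        math.log(p12) + s12,                                  # H4: one shared causal SNP
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]

v = [0.01, 0.01, 0.01]  # hypothetical squared standard errors
shared = coloc_posteriors([1.0, 8.0, 2.0], [0.5, 8.0, 1.5], v, v)
distinct = coloc_posteriors([8.0, 1.0, 0.5], [0.5, 1.0, 8.0], v, v)
print(f"shared:   PP.H4 = {shared[4]:.3f}")
print(f"distinct: PP.H3 = {distinct[3]:.3f}")
```

      With the peak at the same SNP for both traits, nearly all posterior mass lands on H4; with peaks at different SNPs, it lands on H3.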

      Finally, TWAS should, by design, be less vulnerable to LD differences and allelic heterogeneity. This robustness comes at a cost: TWAS can produce false positives when genes with correlated expression are identified together, despite only one being causative. It can also prioritize non-causative genes over causative ones; generally, however, both genes will be identified (Wainberg et al. 2019).
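      The false-positive mechanism can be seen in the standard summary-statistic form of the TWAS test, z_TWAS = wᵀz / sqrt(wᵀRw), where w are expression weights, z are GWAS z-scores, and R is the SNP LD matrix. The sketch below applies that formula to two hypothetical genes, each predicted by a single eQTL SNP. The two SNPs are assumed to be in strong LD (r = 0.9), and only SNP 0 is causal for the trait; all numbers are illustrative.

```python
import math

def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score: w'z / sqrt(w' R w)."""
    k = len(weights)
    num = sum(w * z for w, z in zip(weights, gwas_z))
    var = sum(weights[i] * ld[i][j] * weights[j] for i in range(k) for j in range(k))
    return num / math.sqrt(var)

ld = [[1.0, 0.9], [0.9, 1.0]]  # assumed LD between the two eQTL SNPs
gwas_z = [6.0, 5.4]            # SNP 0 is causal; SNP 1 is only tagged through LD (0.9 * 6.0)
gene_a = [1.0, 0.0]            # hypothetical gene regulated by the causal SNP
gene_b = [0.0, 1.0]            # hypothetical gene regulated by the non-causal SNP
print(twas_z(gene_a, gwas_z, ld), twas_z(gene_b, gwas_z, ld))
```

      Both genes would pass a Bonferroni threshold for roughly 20,000 genes (|z| of about 4.7), even though only gene A's regulatory SNP drives the trait.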

      3) Perhaps the authors can perform some simulations to quantify the influence of tissue-specific expression effects, LD differences between eQTL and well-powered GWAS, and allelic heterogeneity, as discussed above, on their analyses. I understand that the authors may not be willing to do as it would involve a lot of work. But I'd like to see at least some discussion on how these questions can be better addressed in the future research.

      These are nuanced technical questions, and to address them by simulation in our paper would, as noted, involve a lot of work. We have summarized previous work that evaluated the effects of LD differences and AH in our response to essential revision 4. We discuss our concerns about the possibility of an overly broad tissue search in essential revisions 3 and 5, and our decision to address this question using MASH in essential revision 3.

      4) It looks quite striking that only 6% of the putatively causative genes are identified by TWAS with the correct effect direction. But I think this number is slightly misleading as one may interpret it as only 6% of the functionally relevant genes are regulated by trait-associated variants. In fact, 46% of the genes are detected by TWAS but only 11% are confirmed in their selected tissues, among which about half (5/9) have correct effect direction. First, the result could be limited by the selection of relevant tissues, as discussed above. Second, the fact that half of the genes do not show correct effect direction may reflect a nonlinear relationship between expression and trait, or the presence of cell-type heterogeneity within a tissue. These may not necessarily overturn the assumption that these genes are regulated by trait-associated variants in the causal tissues or cell types.

      In our initial submission, we had been reluctant to expand the list of tissues for two reasons. First, increasing from the small number of tissues with known biological relevance to all tissues (or all non-brain tissues) increases the multiple-testing correction burden. Second, and more important in our eyes, colocalizations in tissues without clear biological relevance are not biologically interpretable. Such hits can be results of complicated genetic architecture (e.g. shared eQTLs), power differences in tissues with correlated expression, or biology not directly related to the trait in question.

      That said, the tissue data we have access to are incomplete, and we are without question missing some relevant tissues. Additionally, some relevant tissues have lower sample sizes, and thus lower power, than tissues that are not relevant but may still share eQTLs. To overcome these problems, we applied Multivariate Adaptive Shrinkage (MASH), a Bayesian method that detects correlations between different conditions (in this case, tissues) and uses them to produce posterior estimates of summary statistics in each tissue (Urbut et al. 2019). Unlike meta-analysis, which produces a single result, MASH gives a distinct effect size estimate for each tissue, though each is informed by the others.
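      The "borrowing of strength" between tissues can be illustrated with a deliberately simplified, single-component version of the idea: assume true effects b ~ N(0, U) across two tissues and observed estimates bhat ~ N(b, S) with S diagonal, so that E[b | bhat] = U (U + S)⁻¹ bhat. Real MASH fits a mixture of many covariance matrices with weights learned from the data; this sketch is not the mashr API, and the effect sizes, standard errors, and prior covariance (correlation 0.9) are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matvec2(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def posterior_mean(bhat, se, prior_cov):
    """E[b | bhat] = U (U + S)^-1 bhat for prior b ~ N(0, U) and bhat ~ N(b, S), S diagonal."""
    total = [[prior_cov[i][j] + (se[i] ** 2 if i == j else 0.0) for j in range(2)]
             for i in range(2)]
    return matvec2(matmul2(prior_cov, inv2(total)), bhat)

# Tissue 0: large sample (small SE); tissue 1: small sample (large SE).
bhat = [0.5, 0.2]
se = [0.05, 0.3]
prior_cov = [[0.25, 0.225], [0.225, 0.25]]  # assumed between-tissue effect correlation of 0.9
post = posterior_mean(bhat, se, prior_cov)
print([round(x, 3) for x in post])
```

      The well-measured tissue barely moves (about 0.49), while the noisy tissue's estimate is pulled from 0.2 toward it (about 0.36), which is why small tissues gain power.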

      Using MASH has a pronounced effect on colocalization results. The number of non-putatively causative genes colocalizing increases from 389 to 489, while the number of putatively causative genes in our Mendelian set is unchanged, remaining at 2. The number of genes from the Backman et al. set increases from 2 to 5. Though this is a proportionally large increase, it still represents a small fraction of genes. We have updated our paper to use these results—which should be less dependent on the tissues we selected—but the message has not changed.

      5) While they highlight the roles of alternative regulatory mechanisms, few testable hypotheses are put forward for the field, which is somewhat disappointing but understandable given how little we know about the human genome at the mechanistic level.

      We have added a set of models that may explain the “missing heritability” to Table 4 in the discussion. Though we do not propose experiments, we have included citations for research relevant to confirming or disproving these models.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Kowalczyk and colleagues report on identifying coding and non-coding genetic determinants of hairlessness in mammals using an approach they developed called RERconverge. The approach has been employed to examine several different traits in previous publications from this group. The authors determine that hairlessness is associated with relaxed evolutionary constraint at genetic loci and identify both coding genes and non-coding sequences associated with this phenotype. Several known hair-associated and novel genes and microRNAs are observed.

      This is a strong manuscript with interesting results. It is remarkable how robust this method is. There are a few places where I was not fully convinced of the choice to highlight a gene as "significant" however.

      In Figure 4 and the associated text and figure legend the claim is made that non-coding regions exhibit accelerated evolution of matrix and dermal papilla elements. However, the enrichment, even prior to multiple testing correction is not significant. Should this be reported on?

      We agree that some of the results that we displayed had borderline significance and we have clarified this in the text so the reader is aware. Our rationale for highlighting tissue annotations from borderline-significant enrichment results from noncoding analyses (matrix p=0.078 adj.p=0.18, dermal sheath p=0.059 adj.p=0.16, dermal papilla p=0.049 adj.p=0.16) is because we believe that these are an honest depiction of the trends we see in this scan (with alpha=0.2 for adjusted p-values), particularly when supported by effect sizes as reported through AUC. We prefer to set a generous threshold to avoid missing any meaningful results rather than setting a more stringent threshold. A more generous alpha is also more forgiving of the noise related to identifying noncoding regions and assigning them to genes.
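      The response reports adjusted p-values judged against alpha = 0.2 but does not name the correction method. Assuming the common Benjamini-Hochberg (BH) false-discovery-rate adjustment, the sketch below shows how adjusted values are obtained and then thresholded at the generous alpha. The input p-values are hypothetical and are not the study's.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.2]  # hypothetical enrichment p-values
adj = bh_adjust(pvals)
print([round(p, 4) for p in adj])
print([p <= 0.2 for p in adj])    # generous alpha of 0.2
```

      The running minimum enforces monotonicity, so a p-value can never receive a smaller adjusted value than a more significant one.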

      Related to the above, Table 1 includes just one 'significant gene,' with the remainder of the genes highlighted because they have a Bayes Factor ratio >5. Should a gene with a BF HvM be highlighted as a gene "whose evolutionary rates are significantly associated with the hairless phenotype?" Perhaps I am incorrect, but the hypothesis that is being tested by this approach seems distinct from "is the gene associated with hair loss."

      Similar to pathway enrichment analyses, we also used generous significance thresholds for gene-specific results to show our top, most significant results from protein-coding analyses. Significance of noncoding enrichment was not a criterion for inclusion/exclusion of genes in Table 1. Generally, some genes with significant convergent evolutionary rate shifts in protein coding sequence also have significant enrichment of convergent rate shifts in nearby noncoding regions (like PTPRM in Table 1), but many do not, which is also shown in Figure 6A. We have clarified column titles in Table 1 by adding (Gene) or (Noncoding) to indicate which sequences the values refer to.

      Bayes factors (BF) are a complementary Bayesian approach to analyze statistical associations that we use here to supplement information we get from our more traditional Kruskal-Wallis test. BF are easy to interpret because they directly describe the amount of support for our alternative hypothesis rather than indirectly describing support as p-values do. For example, in Table 1, the hypothesis that the evolution of FGF11 is associated with the evolution of mammalian hairlessness has 6,354.7 times more support than the null hypothesis that phenotype and gene evolution are not related. These large values are interpreted as supporting the alternative, which is equivalent to what we want to be able to interpret from p-values (i.e. low p-values allow us to reject the null and implicitly support the alternative).

      BF values in Table 1 are calculated using evolutionary rates in protein coding sequence and so are not expected to match values in the “Noncoding” columns. “BF Hairless” is directly related to the “Statistic” and “p-adj” columns, which is why the “BF Hairless” values are all quite large, indicating a large amount of support for an association between gene and phenotype evolution.

      The hypotheses that are tested with the “Statistic” and “p-adj” columns and the “BF HvM” column are colloquially the same: they both test to determine if the evolutionary rate of the gene is different in hairy mammals compared to hairless mammals. Only the details are different. The traditional statistics test for an association without accounting for marine mammals as a potential confounder. The BF tests check for a significant association that is driven more strongly by hairlessness than by marine habitat.

      Slightly more description of the Bayes factor calculation would benefit the supplement, e.g., is the BayesFactor R package being used here, or something else?

      We agree that a clearer description of Bayes factors is appropriate and have modified the methods description as follows:

      “In addition to calculating element-specific association statistics, Bayes factors were calculated for each gene for the marine and hairless phenotypes using the BayesFactor R package (Morey & Rouder, 2021). These values were calculated to disentangle the two phenotypes, which are heavily confounded since nearly all marine mammals in the genome alignment used for this work are hairless. Briefly, Bayes factors are a Bayesian approach complementary to more standard statistical tests. Instead of returning statistics and p-values, Bayes factors directly quantify the amount of support for an alternative hypothesis. For example, a Bayes factor value of 5 for a particular statistical test would indicate 5 times more support for the alternative hypothesis than the null hypothesis. Bayes factors can also be used to compare different alternative hypotheses by calculating the ratio of two Bayes factors. When considering the hairless phenotype, we use Bayes factors to quantify the support for a linear model predicting phenotype using evolutionary rate information from each gene, with a higher Bayes factor indicating greater support. We perform this calculation for two alternative hypotheses: 1) a gene shows different evolutionary rates in hairless versus hairy species, and 2) a gene shows different evolutionary rates in marine species versus non-marine species. The ratio of Bayes factors between the hairless and marine phenotypes quantifies the level of support of one phenotype over the other and thus can be used to tease apart intricacies of the two heavily confounded phenotypes. When the Bayes factor for the hairless phenotype is much larger than the Bayes factor for the marine phenotype, that indicates stronger support for signal driven by hairlessness.”
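      The ratio logic described above can be sketched without the BayesFactor package. That package computes default Bayes factors under specific priors; the sketch below swaps in the simpler BIC approximation, log BF10 ≈ (BIC_null − BIC_alt) / 2, for a linear model predicting phenotype from one gene's evolutionary rates. The numbers would therefore differ from the package's. The rates and phenotype codes are made up, with one hairless non-marine species included so that the two phenotypes are not identical.

```python
import math

def bic_linear(y, x=None):
    """BIC of an ordinary least-squares fit of y on x (intercept-only if x is None)."""
    n = len(y)
    my = sum(y) / n
    if x is None:
        rss, k = sum((yi - my) ** 2 for yi in y), 1
    else:
        mx = sum(x) / n
        slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
                 / sum((xi - mx) ** 2 for xi in x))
        intercept = my - slope * mx
        rss, k = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)), 2
    return n * math.log(rss / n) + k * math.log(n)

def log_bf_bic(y, x):
    """BIC approximation: log BF10 = (BIC_null - BIC_alt) / 2."""
    return (bic_linear(y) - bic_linear(y, x)) / 2.0

rates = [0.1, -0.2, 0.0, 0.3, 1.2, 0.9, 1.1, 0.4]  # hypothetical relative rates for one gene
hairless = [0, 0, 0, 0, 1, 1, 1, 1]                 # hypothetical phenotype codes
marine = [0, 0, 0, 0, 1, 1, 0, 0]
log_ratio = log_bf_bic(hairless, rates) - log_bf_bic(marine, rates)
print(f"BF hairless / BF marine = {math.exp(log_ratio):.1f}")
```

      Working on the log scale keeps the ratio numerically stable; a ratio well above 1 indicates that the rate signal tracks hairlessness more closely than marine habitat.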

      Why are the qq-plot distributions of non-coding elements so distinct compared to coding? Some comment on this would be appreciated in the main text, even if briefly.

      We have added the following text as a tentative speculation about why noncoding elements seem to show more signal than coding elements:

      “Interestingly, noncoding regions appeared to show even stronger deviation from uniformity than coding regions, perhaps because regulatory changes more strongly underlie the convergent evolution of hairlessness.”

      Reviewer #3 (Public Review):

      The authors present a phylogenetic analysis of evolutionary rates as they correlate with independently derived "hairlessness" across mammals. This is a very good paper, well written and very carefully analyzed. This paper makes a number of interesting biological insights, including the identification of protein coding as well as noncoding regions that appear to evolve in correlated fashion with hairlessness.

      I have several recommendations:

      1) The main assumption behind this experiment is that species "use" the same genes to accomplish hairlessness. Only then would one predict correlated rate shifts along hairless lineages. If, on the other hand, each hairless species used a unique gene to accomplish hairlessness, then one might only see a rate shift on that species' lineage. Therefore, a complementary approach might be to i) define all genes with known involvement in hair morphology (i.e., genes in the categories listed in Fig. 1C). ii) test how many of those genes show a significant rate shift in at least one hairless lineage. iii) test whether hair genes are more likely to show at least one rate shift compared to genomic background. This complementary analysis would relax the assumption that all hairless species show similar rate shifts compared to haired species.

      We focused our analyses on convergently evolving genomic elements associated with hairlessness for two reasons. First, species-specific analyses may detect genomic changes associated with any unique phenotype in a particular species, and it is difficult to distinguish which of those genomic changes are associated with hairlessness. Second, we are seeking genomic elements associated with hair growth in all mammals, and species-specific adaptations will not be shared across all mammals.

      Nevertheless, we conducted a complementary analysis to test for rate shifts specific to each hairless species compared to all of the non-hairless species. We then tested for enrichment of hair follicle genes among genes with significant rate shifts in different numbers of hairless species. For example, among all genes with significant rate shifts in at least one hairless species, is there an enrichment of hair follicle genes? Then, among all genes with significant rate shifts in at least two hairless species, is there an enrichment of hair follicle genes? And so on, until we test for enrichment only in genes with rate shifts in all ten hairless species. As expected, the signal of enrichment gets stronger as more species share the rate shift (the “convergent signal”). This happens because the genes with shared rate shifts are more hair-specific than the genes with unshared rate shifts.
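      Each step of that threshold series is an over-representation test. The response does not name the test, so the sketch below assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) together with a fold-enrichment summary. The gene counts are invented purely to show the shape of the calculation and are not the study's results.

```python
from math import comb

def enrichment_pvalue(overlap, n_genes, n_hair, n_hits):
    """One-sided hypergeometric P(X >= overlap): hair-follicle genes among n_hits genes."""
    upper = min(n_hair, n_hits)
    tail = sum(comb(n_hair, i) * comb(n_genes - n_hair, n_hits - i)
               for i in range(overlap, upper + 1))
    return tail / comb(n_genes, n_hits)

def fold_enrichment(overlap, n_genes, n_hair, n_hits):
    """Observed hair-gene fraction among hits divided by the genome-wide fraction."""
    return (overlap / n_hits) / (n_hair / n_genes)

# Made-up counts: (min. species sharing the shift, genes with shifts, hair genes among them).
n_genes, n_hair = 19000, 600
for k, n_hits, overlap in [(1, 3000, 120), (5, 500, 30), (10, 40, 6)]:
    p = enrichment_pvalue(overlap, n_genes, n_hair, n_hits)
    fe = fold_enrichment(overlap, n_genes, n_hair, n_hits)
    print(k, round(fe, 2), f"{p:.2e}")
```

      Exact integer binomial coefficients avoid the overflow that factorials in floating point would cause at genome-scale counts.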

      We also performed another analysis to test for enrichment of hair follicle genes among genes with significant rate shifts per hairless species. For example, in orca, are the genes with significant rate shifts enriched for hair follicle genes? For comparison, we also repeated the procedure for non-hairless species. Only two of the ten hairless species show species-specific hair follicle enrichments, which indicates that most of the hairless species alone are insufficient to detect hair signal at all. Even among the two species with significant enrichment, thousands of genes are identified in total, many of which are likely related to unique characteristics of those species other than hairlessness, and it is impossible to distinguish the hair-related genes from the other genes without additional information.

      All of these results are reported in the manuscript in the text and figures shown below:

      Species-Specific Analyses

      In addition to conducting convergent evolution analyses to identify genetic elements evolving at different rates across all hairless species, we also conducted complementary analyses to detect elements evolving at different rates in individual hairless species to demonstrate the importance of convergent evolution in our analyses. Indeed, the strength of enrichment for hair follicle-related genes among top hits steadily increases as more hairless species share rate shifts in those genes, an indicator of the power of the convergent signal (Figure 2). Further, analyses on single species alone only show enrichment for hair follicle-related genes among top hits in two hairless species out of ten – armadillo and pig (Figure 2 Supplement 1). Together, these results demonstrate the importance of testing for convergent evolutionary rate shifts across all hairless mammals to best detect hair-related elements.

      Also of important note is that every individual hairless species has thousands of genes with significant rate shifts in that species (Supp. File 10). It is impossible to tell which of those rate shifts is associated with hairlessness specifically because the species have many unique phenotypes other than hairlessness that could be responsible for rate shifts in their respective genes. Convergent analyses allow for more concrete identification of hair-related elements by weeding out rate shifts that are not shared across species with the convergent hairless phenotype.

      2) It would be interesting to break up noncoding into additional strata. For example, one might predict that rate shifts in predicted transcription factor binding sites would have a larger functional impact than rate shifts in noncoding regions with no function. Or... that rate shifts in highly conserved noncoding regions vs. less conserved noncoding regions.

      We have performed extensive analyses to investigate the roles of TFBSs in the convergent evolution of hairlessness and found little enrichment of specific TFBS in our top noncoding regions from RERconverge. Perhaps because the noncoding regions are highly conserved, they contain many potential locations for TF binding and so it may be more reasonable to consider their full stretch of sequence as functional than it would be if they were less conserved.

      We have calculated conservation scores for noncoding regions and found no global association between RERconverge results and sequence conservation score.

      3) Why is aardvark considered a haired species? Aardvarks have as much (or as little) hair as pigs.

      Body hair is a difficult phenotype to categorize in mammals because all mammals do have hair. In order to create a binary distinction between hairy and hairless mammals, we needed to make a choice about where to draw that line. We were particularly concerned about the impact of assigning some of the hairier mammals, like pig, armadillo, and human, as hairless, so we performed the drop-out tests shown in Figure 4 to demonstrate that removing individual hairless species from our analyses does not change the overall signal. Indeed, removing pig impacts detection of genes in the two hair-related pathways shown less than removing clearly hairless species like killer whale or dolphin. We believe that these results are sufficient to demonstrate that subtle differences in phenotyping decisions will not substantially change the findings stated in our manuscript.

      4) The primary goal of the paper is to identify coding/noncoding regions that show shifts in evolutionary rate that are correlated with hairless vs. haired lineages. I was left wondering: when these correlations are found, how often is it due to the same mutations hitting the regions vs. mutations randomly hitting the same regions? If the former, this would suggest a limited number of ways that species can achieve "hairlessness".

      In general, we do not expect amino acid convergence (for genes) or nucleotide convergence (for noncoding regions) to drive much of the signal we detect using RERconverge. For species separated by millions of years of evolutionary time, it is highly unlikely that a change in a single amino acid (or nucleotide) would drive exactly the same phenotypic change for a highly complex phenotype like hairlessness. However, we argue that there do appear to be some limited ways that species become hairless, albeit at the scale of evolutionary rates across a length of sequence rather than individual bases.

      Related to this point is the distinction between positively selected regions compared to regions under reduced constraint, which we would expect to accumulate mutations randomly.

      For genes, we believe that accelerated evolution of specific genomic regions in hairless species is caused by an accumulation of random mutations, not positive selection or specific targeted mutations. As stated in the manuscript, we performed branch-site tests for positive selection on our top genes, all KRTs, and all KRTAPs, and we found little indication that quickly evolving genes are undergoing positive selection specific to hairless species. This conclusion is also consistent with the hypothesis that genes under relaxation of evolutionary constraint will have rate shifts that are easier to detect over long periods of evolutionary time compared to genes under more subtle and short-lived periods of positive selection in association with the establishment of a new phenotype.
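      Branch-site tests compare the likelihood of a model that allows positive selection on foreground (here, hairless) branches with a null model that fixes ω = 1 on those branches. The response does not say which software was used; the sketch below shows only the final likelihood-ratio step, assuming the usual reference distributions: χ² with 1 degree of freedom, or the 50:50 mixture of a point mass at zero and χ²₁. The log-likelihoods in the example are hypothetical stand-ins for the output of a tool such as PAML's codeml.

```python
import math

def branch_site_lrt_pvalue(lnl_null, lnl_alt, mixture=False):
    """P-value for 2*(lnL_alt - lnL_null) against chi-square with 1 df.

    With mixture=True, uses a 50:50 mixture of a point mass at 0 and chi-square(1),
    which halves the chi-square(1) tail probability for a positive statistic.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if stat == 0.0:
        return 1.0
    p = math.erfc(math.sqrt(stat / 2.0))  # survival function of chi-square(1)
    return p / 2.0 if mixture else p

lnl_null, lnl_alt = -2345.60, -2343.68    # hypothetical codeml log-likelihoods
print(round(branch_site_lrt_pvalue(lnl_null, lnl_alt), 4))
print(round(branch_site_lrt_pvalue(lnl_null, lnl_alt, mixture=True), 4))
```

      The χ²₁ survival function equals erfc(sqrt(x/2)), so no statistics library is needed; using χ²₁ instead of the mixture is the more conservative choice.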

      For noncoding regions, it is much more difficult to distinguish positive selection from relaxation of evolutionary constraint because it is difficult to establish an estimate of neutral evolution for those sequences. Models of positive selection in regulatory sequences are an emerging area of research in the field and are not yet reliable enough to distinguish positive selection from the accumulation of random mutations.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors took advantage of an existing protein-trap resource in zebrafish to identify genes important for normal pacemaker function in adults. They generated a collection of lines with mutations in genes expressed at reasonably high levels in the heart and assessed their ECG. They identified 3 candidates with increased incidence of sinus arrest and focused on validation of dnajb6b. The dnajb6b mutant fish display other defects, including enhanced responses to atropine and carbachol, and bradycardia. They show that dnajb6b is expressed in a subset of cells in the sinus node in zebrafish. In mouse sinus node, DNAJB6-expressing cells have low expression of TBX3 and its target HCN4. In addition, Dnajb6+/- mice also display similar phenotypes. Analysis of pacemaker function in ex vivo mouse hearts by high-resolution fluorescent optical mapping of action potentials revealed that the number of leading pacemakers in Dnajb6+/- hearts is decreased in the sinus node, with a concomitant increase in the auxiliary pacemakers. RNAseq analysis of the right atrial tissues detected expression changes in ion channels and genes involved in Ca2+ handling and Wnt signaling. Overall, the results support the conclusion that DNAJB6 is important for proper sinus node function, thus adding it to the short list of sick sinus syndrome genes. However, the manuscript has several weaknesses.

      Weakness:

      The manuscript does not address the mechanism by which decreased DNAJB6B causes sick sinus syndrome. For example, it is unknown whether DNAJB6B functions cell autonomously or non-cell autonomously in the sinus node. The RNAseq analysis identified changes in ion channels in the right atrial tissues of 1-year-old mice, but the cellular electrophysiology of the sinus node cells was not assessed.

      The main goal of this research is to prove the feasibility of discovering novel SSS genes in adults via a forward genetic approach in zebrafish. Thus, the major hallmark would be to prove causality and specificity of the candidate genes identified from this screen, such as Dnajb6. Comprehensive mechanistic study would be a focus for future studies.

      Nevertheless, we carried out the following experiments to address the mechanisms. Based on these data, a new section was added to the discussion section (Lines 424-465).

      (1) In mice, we did more antibody immunostaining and confirmed a negative correlation in terms of expression intensity between the Dnajb6 and Tbx3 proteins. We further detected a significantly increased Tbx3 immunostaining signal in the SAN tissues of Dnajb6 heterozygous mice compared to WT controls (new Figure 3D-F).

      (2) In zebrafish, we compared expression patterns of the sqET33-mi59B conduction system reporter line between the GBT411/dnajb6b heterozygous and homozygous mutants. We found that the atrio-ventricular canal (AVC) signal became diffuse in GBT411/dnajb6b homozygous adult hearts. In addition, the ring-like structure usually seen in the SAN region of WT controls and GBT411/dnajb6b heterozygous mutants was largely lost in 3 out of 9 GBT411/dnajb6b homozygous adult hearts examined (new Figure 2).

      Together with the ectopic pacemaker activity detected in the Dnajb6 heterozygous mice (new Figure 5A and 5B), we speculate that Dnajb6 might act as a suppressor of Tbx3 transcription factor in defining cell fate specification into SAN pacemaker myocytes. Since Tbx3 was reported to suppress chamber myocardial differentiation (Mommersteeg et al., Circ Res. 2007;100(3):354-62), upregulation of Tbx3 may thus contribute to enhanced atrial ectopic activity in Dnajb6 heterozygous mice.

      Furthermore, TBX3 has been recently identified as a component of the Wnt/β-catenin-dependent transcriptional complex (Zimmerli et al., eLife. 2020;9:e58123), which is significantly affected in Dnajb6 heterozygous mice (see new Figure 7B-C). This further supports a possible role of TBX3 in both SAN and atrial remodeling.

      (3) Finally, in collaboration with Drs. Grandi, Morotti, and Ni from the University of California, Davis, we utilized a population-based computational modeling approach to determine the cellular/ionic mechanisms that could underlie the SSS phenotype observed ex vivo in the Dnajb6 heterozygous mice (new Figure 6). We used our previously published model of the mouse SAN myocyte (Morotti et al. Int J Mol Sci. 2021; 22(11):5645) and enhanced it with the addition of both sympathetic and parasympathetic stimulation to model the effects of isoproterenol- and carbachol-induced changes in pacemaker activity (i.e., firing rate), respectively. We generated a population of 10,000 mouse SAN myocyte models by random modification of selected model parameters describing maximum ion channel conductances and ion transport rates from the baseline model and assessed isoproterenol- and carbachol-induced effects on each model variant. We then separated this population of models into two subpopulations representing the WT and Dnajb6+/- mouse phenotypes: namely, we extracted the model variants that recapitulate changes observed in Dnajb6+/- vs. WT mice, including a reduced firing rate at baseline, an increased response to isoproterenol, and a decreased response to carbachol administration (new Figure 6). This filtering process resulted in n=438 models that correspond to the Dnajb6+/- mouse phenotype and n=6,995 models that correspond to the WT phenotype. We analyzed the parameter value differences between these two subgroups to reveal several crucial parameters that are significantly correlated with the observed electrophysiological changes. The analysis revealed a significant decrease in the maximal conductances of the fast (Nav1.5) sodium current, the L-type Ca2+ current (ICa,L), the transient outward, sustained, and acetylcholine-activated K+ currents, and the background Na+ and Ca2+ currents, as well as in the ryanodine receptor maximal release flux, in the Dnajb6+/- vs. WT model variants. We also found a significant increase in the Na+/Ca2+ exchanger (NCX) maximal transport rate and in the conductances of the T-type Ca2+ current and the slowly activating delayed rectifier K+ current. These new studies provide some novel mechanistic insights into the observed SSS phenotype in Dnajb6+/- mice. Importantly, these new in silico experiments add another conceptual level to the phenotype-based screening approach introduced in the current study to identify new genetic factors associated with SAN dysfunction. Direct testing of these mechanisms would require a substantial number of single SAN cell patch clamp and confocal microscopy experiments, which are beyond the scope of the current manuscript and will be pursued in a follow-up study.
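      The bookkeeping of the population-of-models workflow described above can be sketched as follows: scale baseline parameters by random factors, classify each variant as WT-like, Dnajb6+/--like, or neither, and compare parameter distributions between the two retained groups. In the actual study each variant is simulated with the SAN myocyte model and classified by its baseline firing rate and its isoproterenol and carbachol responses. Here that step is replaced by a hypothetical toy rule. The parameter names, the log-normal scaling with sigma = 0.2, and the classifier are all assumptions for illustration only.

```python
import math
import random

# Hypothetical stand-ins for maximal conductances and transport rates.
PARAMS = ["g_Na", "g_CaL", "g_CaT", "k_NCX"]

def sample_population(n_models, sigma=0.2, seed=1):
    """Scale each baseline parameter by an independent log-normal factor."""
    rng = random.Random(seed)
    return [{p: math.exp(rng.gauss(0.0, sigma)) for p in PARAMS} for _ in range(n_models)]

def split_population(population, classify):
    """classify(model) returns 'wt', 'mutant', or None for variants matching neither phenotype."""
    groups = {"wt": [], "mutant": []}
    for model in population:
        label = classify(model)
        if label in groups:
            groups[label].append(model)
    return groups

def mean_log_scale(models, param):
    """Average log scaling factor of one parameter within a group."""
    return sum(math.log(m[param]) for m in models) / len(models)

def toy_classify(model):
    # Placeholder rule only; the real workflow simulates each variant and checks its
    # baseline firing rate and isoproterenol/carbachol responses.
    if model["g_CaL"] < 0.85 and model["k_NCX"] > 1.0:
        return "mutant"
    if 0.9 < model["g_CaL"] < 1.1:
        return "wt"
    return None

groups = split_population(sample_population(10000), toy_classify)
for p in PARAMS:
    diff = mean_log_scale(groups["mutant"], p) - mean_log_scale(groups["wt"], p)
    print(p, round(diff, 3))
```

      Comparing log scaling factors treats a halving and a doubling of a parameter symmetrically. Some variants fall into neither group, mirroring how the 438 + 6,995 retained models do not add up to the full 10,000.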

      The manuscript does not address why the zebrafish homozygous mutants are viable as adults while the mouse homozygotes are embryonic lethal. The GBT411 insertion disrupts dnajb6b(L) but not dnajb6b(S), while the mouse mutation deletes the entire gene. Could this difference partially explain the discrepancy?

      Indeed, the difference between zebrafish and mouse can be partially explained by the fact that only the long isoform of the dnajb6b gene, dnajb6b(L), was disrupted in the GBT411 mutant, while both the long (Dnajb6(L)) and short (Dnajb6(S)) isoforms of the Dnajb6 gene were largely deleted in the Dnajb6 knockout mice. However, we think the main reason is probably functional redundancy, which is present in zebrafish but not in mouse: zebrafish has two dnajb6 homologues, dnajb6b and dnajb6a, while mouse has only one Dnajb6 homologue. We added these points to the paper (Lines 377-379).

      Reviewer #2 (Public Review):

      In this manuscript, the authors expand upon previous work describing the development of a protein trap library made with the gene-break transposon. This library was screened to identify lines displaying gene trap expression in the heart (zebrafish insertional cardiac mutant collection). A pilot screen of these lines using adult ECG phenotypes identifies dnajb6b as a new gene important for cardiac rhythm. Using the GBT/dnajb6b zebrafish line, Ding et al. find that a proportion of aged homozygous mutant fish (1.5-2 years) present sinus arrest episodes and reduced heart rate. Treating GBT411/dnajb6b mutant adults with compounds revealed aberrant responses to autonomic stimuli, and sinus arrest episodes were induced following verapamil exposure, providing evidence that GBT411/dnajb6b is an arrhythmia mutant. This conclusion could be better supported by presenting specific ECG parameters to characterize the conduction defect more thoroughly. The authors then report that Dnajb6+/- adult mice recapitulate some of the phenotypes observed in zebrafish, including sinus arrest and AV blocks, as well as impaired (although different) responses to autonomic stimuli. The authors describe these as features of sick sinus syndrome in the absence of cardiomyopathy phenotypes in either the zebrafish or mouse lines. However, overall cardiac morphology is not well described for either the GBT411/dnajb6b or Dnajb6+/- models.

      We carried out more experiments to examine left ventricular (LV) structure in Dnajb6 heterozygous mice at 1 year of age, using H&E staining, Masson’s trichrome staining, and transmission electron microscopy (TEM) analysis. We now show clearly that there are no significant myocardial structural changes in the LV, or in the atrial and SAN tissues, of Dnajb6 heterozygous mice (new Supplemental Figures 3 and 5) at an age when the SSS phenotype is already noticeable. However, in the GBT411/dnajb6b heterozygous mutant at ~2 years of age, we detected a severe sarcomeric structural abnormality in 1 out of 3 fish hearts examined (see Response-only Figure 1). In addition, in a previous publication (Ding et al., Circ Res, 2013;112(4):606-17), we reported evident cardiac remodeling phenotypes in the GBT411/dnajb6b homozygous fish at 12 months of age.

      Together, we have obtained more experimental evidence to strengthen the claim that arrhythmia is not due to cardiomyopathy/structural remodeling in the Dnajb6+/- mice. However, the evidence from fish remains weak. Therefore, we removed the claim that “when structural remodeling/cardiac dysfunction have not yet occurred” in fish and modified our statement in mice accordingly (Lines 372-377, 385-386).

      To further support a role for Dnajb6 in sinoatrial node dysfunction, the authors performed optical mapping of action potentials from isolated mouse atrial tissue. These data reveal that Dnajb6+/- preparations exhibit ectopic pacemakers outside of the sinoatrial node, including within the atrial wall and inter-atrial septum. These data also show prolongation of SAN recovery time at baseline and following autonomic stimulation, further suggesting SAN dysfunction. RNA-sequencing experiments of Dnajb6+/- adult right atrial tissue showed differentially expressed genes encoding Ca2+ handling-related proteins, ion channels, and WNT pathway-related proteins. As these genes are involved in the cardiac conduction system, the authors suggest these pathways as molecular mechanisms underlying SSS phenotypes in Dnajb6 models.

      Sick sinus syndrome is a relatively rare arrhythmia most commonly found in older populations. Therefore, it has been challenging to establish clinically relevant models and there is a limited understanding of mechanisms of SSS pathogenesis. One particular strength of this manuscript is the ECG phenotype-based forward screen of the gene-breaking transposon (GBT)-based gene trap library in aged animals. This pilot study provides proof-of-concept that this screening approach is well suited to identify regulators of cardiac function in adults and genes linked to adult diseases like SSS.

      Thank you very much for recognizing the major strength of our manuscript!

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated the role of a long noncoding RNA, VPS9D1-AS1 (VPS), in colorectal cancer (CRC). They found that a high level of VPS was negatively associated with T cell infiltration in CRC patients; in cell line-derived xenograft models or a conditional knock-in mouse model, VPS overexpression enhanced tumor growth and suppressed the infiltration of CD8+ T cells, which was reverted by VPS antisense oligonucleotide (ASO) treatment. They also investigated the molecular mechanisms underlying VPS function and revealed a VPS/TGF-β/ISG signaling cascade in tumor cells and crosstalk between tumors and T cells depending on IFNAR1 level.

      The authors have performed extensive analyses on the functions of VPS using patient samples, CRC cell lines, xenograft tumors, and drug-induced tumors, and the data are of relatively good quality. They targeted VPS overexpression in cell line-derived xenografts or mouse tumors by ASO treatment as a potential therapeutic, although the overexpression level may not be physiologically relevant. The authors also made great efforts to explore the mechanisms in vitro and proposed a very interesting model of a ribosome/VPS/TGF-β/ISG signaling axis in tumor cells and opposing regulation of IFNAR1 in tumor and T cells; however, the mechanistic model was tested only in vitro, not in the cell line-derived xenografts or mouse tumors used in the study, which undermines the authors' claims.

      We thank Reviewer #1 for these positive comments, and we attach great importance to the critical comments about the in vivo data.

      Reviewer #2 (Public Review):

      In this paper, Yang et al. seek to show the importance of the lncRNA VPS9D1-AS1 in the biology and pathology of colorectal cancer (CRC). They start with the analysis of patient data and proceed to cellular and animal cancer models.

      Specifically, the authors report higher VPS9D1-AS1 levels in tumor tissues in two independent cohorts of CRC patients. There was a positive association between VPS9D1-AS1 levels and molecules involved in TGFb signaling, yet a negative association between VPS9D1-AS1 levels and levels of tumor-infiltrating CD8+ T cells (and a negative correlation of these levels of tumor-infiltrating CD8+ T cells and protein expression of molecules involved in TGFb signaling). Cell line studies revealed a positive feedback loop between VPS9D1-AS1 and TGFb signaling molecules, with a cell-intrinsic, pro-proliferative, and pro-survival effect of VPS9D1-AS1 on CRC cancer cells. VPS9D1-AS1 also controls the expression of several genes in the IFN pathway, in particular the ISGs IFI27 and OAS1. In addition, IFI27 and OAS1 expression are controlled by TGFb, TGFBR1, and SMAD1, and the promoter of OAS1 is targeted by SMAD4 (but also TGFb), which binds to it. VPS9D1-AS1 expression in tumor cells promotes PD1 expression and negatively affects IFNAR1 on T cells to reduce their effector functions. In vivo, MC38 CRC cells overexpressing VPS9D1-AS1 show increased tumor growth in mice, and animals with transgenic VPS9D1-AS1 expression in the intestine develop larger CRC lesions upon AOM/DSS treatment. Finally, in vivo targeting of VPS9D1-AS1 using anti-sense oligo reduced tumor size. The data indicate a series of intricate molecular and cellular interactions and suggest that VPS9D1-AS1 can help with patient stratification, improving prognostic prediction and allowing for personalized treatment.

      Taken together, there is a multitude of datasets and several complementary experiments using patient-derived samples, genetically engineered cell lines, and mouse models. Definitely, the paper includes many avenues of inquiry that cover the broad field of cancer molecular biology, biochemistry, and pathogenesis. However, this broad approach renders the paper difficult to follow at times and also leads to numerous typographical and interpretive (but largely not methodological) mistakes. In addition, the quality of some of the figures needs to be improved before they can be properly evaluated.

      In methodology, the authors are largely successful, and I would not recommend major changes to the work, other than to recommend a "focusing" of the manuscript objectives, or a paring of the data to better convey the desired story.

      The experiments presented herein, particularly those that test the efficacy of the lncRNA as cancer therapeutics are important for the field, and should be of high import to other cancer biologists.

      We thank you very much for your constructive comments. We have replied to all your concerns.

      Reviewer #3 (Public Review):

      The authors have accomplished a large amount of work to prove the role of VPS9D1-AS1 in promoting immune escape from cytotoxic T cells; the mechanistic exploration is sufficient to support the conclusions, and the in vivo experiments support the translational significance of this target. However, the logic of the diagram requires improvement, and several revisions are warranted.

      We thank the reviewer for the positive comments. We have revised our manuscript according to your suggestions.

    1. Author Response

      Reviewer #2 (Public Review):

      This study has investigated the pathway for degradation of the inner nuclear membrane protein SUN2. Based on earlier studies that had searched for bTrCP substrates and interactors, the authors postulated that SUN2 might be a target of this ligase. They found two potential bTrCP recognition sites and showed that the second of these, Site2, is important for SUN2 turnover. A phospho-mimetic mutant is turned over faster, and a phospho-resistant mutant is turned over slower. The degradation is slowed by inhibitors of Neddylation (and therefore, of Cullin Ring Ligases), inhibitors of p97, and proteasome inhibitors. Using a genetic screen, they find bTrCP, components of the Cullin ring ligase, p97, the proteasome, and subunits of CK2. They use inhibitors to show that CK2 is needed for maximal SUN2 degradation, and a phosphatase called CTDNEP1 antagonises CK2-mediated SUN2 degradation. Using a non-degraded variant of SUN2, the authors show that its overexpression can influence nuclear morphology and various nuclear functions. In sum, the authors outline a pathway for regulated degradation of the inner nuclear membrane SUN2.

      The study is generally sound in its logic, well written, and appropriately interpreted for the most part. The data are of high quality. The findings are new and will provide a foundation to now examine how LINC complex abundance is regulated. I have a number of suggestions for improvement, listed in order of importance. Only the first two should require any experimental work, and the second item is potentially optional depending on the authors' response. The remaining items can be handled with adjustments to the manuscript.

      1) It is surprising that nowhere in the paper is an experiment directly and rigorously establishing that bTrCP is required for SUN2 degradation. I realise this is quite plausible from the shown experiments, but it seems to be a rather glaring oversight (apologies if I have missed it somewhere). At present, the current evidence for its role is the similarity of Site2 to a bTrCP recognition motif, the physical interaction of SUN2 with bTrCP, and the modulation of this interaction by mutants intended to mimic or eliminate phosphorylation. The inhibitor experiment is not strong evidence because it inhibits all CRLs. I would therefore recommend, at the least, to present an experiment knocking down or out bTrCP2 (i.e., FBXW11, which nicely showed up in the genetic screen). This simple experiment could be included in the validation experiments in Fig. S4b. It would be worth also including FBXW1A for comparison, and if needed, the double-knockdown. This seems essential to complete the study.

      We thank the reviewer for suggestion. These experiments have now been included in the manuscript. For further information, see response to Editor’s point 1.

      2) The experiments with TBCA are not complemented with knockdown experiments of CK2 subunits. I realise CK2 is essential, but cells can evidently tolerate acute knockdown sufficiently well to do experiments given that this came up in the CRISPR screen. I would think such knockdown experiments would strengthen the argument and mitigate any concern about the off-target effects of TBCA. Kinase inhibitors are often only partially specific, so arguments about the involvement of any kinase are stronger if inhibitor studies are complemented with genetic perturbations.

      We thank the reviewer for the suggestion. We have now included knockdown experiments with independent sgRNAs that validate our conclusion on the role of CK2 in SUN2 degradation (Figure Supplement 4C). In addition, we would like to point out that besides the essential CK2 regulatory subunit CSNK2B, our genome-wide screen also identified the catalytic subunit CSNK2A2 (non-essential as it is redundant with CSNK2A1) (see Figure S3A). Considering that our library contains 4 sgRNAs per gene, this makes a total of 8 sgRNAs targeting subunits of CK2. Importantly, no other kinase was identified in our screen. Moreover, TBCA is well established as a specific CK2 inhibitor. Altogether, these observations make us quite confident that CK2 is the prime kinase controlling SUN2 stability.

      3) Lines 173-183: MLN4924 is used interchangeably with inhibition of SCFbTrCP. But MLN4924 is an NAE inhibitor that indirectly inhibits all CRLs. It seems premature to invoke SCFbTrCP as being involved because the experiments have not yet established a role for this specific CRL (see point 1 above). Instead, the conclusion should be that the data indicate a role for one or more CRLs. At this point in the narrative, the only evidence that bTrCP is involved is the sequence similarity of site1 and site2 to canonical bTrCP recognition sites. However, this is not enough evidence as no experiments knocking down or knocking out bTrCP, or experiments showing a physical interaction, have been presented yet. That comes in the subsequent section.

      We thank the reviewer for pointing this out. The text was modified, and additional data on the depletion of βTrCP have been included in the revised manuscript to support our conclusions.

      4) Line 195 - At this point in the narrative, there is no evidence that SUN2 is ubiquitinated by SCFbTrCP. This needs to be rephrased. I would think one can conclude at this point that SUN2 is degraded by a pathway that relies on a CRL, p97, and the proteasome. The degradation is controlled by Site2, potentially by phosphorylation (again, this has not really been established at this point in the story, even if it seems plausible based on the mutagenesis).

      The sentence has been modified.

      5) I think the discussion needs to include some thoughts on what the authors believe happens to the rest of the SUN2 trimer or more broadly, the LINC complexes. In other words, what is the consequence of degrading a single protein of a much larger complex? In this vein, the model shows monomeric SUN2. Is it worth showing that it is part of a trimer and part of the LINC complexes? Regardless of how the authors depict the model, discussing this issue seems worthwhile.

      We thank the reviewer for the suggestion. We observe that the turnover rate of endogenous SUN2 is affected by the exogenous expression of SUN2 and primarily of its derivatives Site 2A and Site 2D (Figure 1E). The effects are likely due to the assembly of trimers containing both endogenous and exogenous SUN2. This observation also suggests that degradation of one of the subunits in the trimer leads to the degradation of the other two. However, in the current manuscript we do not directly test these models or analyse SUN2 complexes.

      6) Lines 225-226 - again, MLN4924 is not an inhibitor of SCFbTrCP, but rather a CRL inhibitor. The evidence for bTrCP being the key ligase is still missing at this point in the narrative.

      We now present evidence in an earlier figure that βTrCP is the F-Box involved in SUN2 degradation. In this context, the sentence appears correct.

      7) Fig. 5G is not especially convincing - to my eye, the effect on endogenous SUN2 is very similar to the effect on the transgene SUN2-site2A mutant, but simply a fainter exposure. Can the authors provide some numbers to allay this concern? It might well be that there is little difference between the behaviour of the endogenous and exogenous SUN2 in this experiment because they engage in heterotrimeric complexes. Also, why is the transgenic SUN2 not detected on the SUN2 blot? Would it not be evident at ~100 kD?

      We have consistently seen that SUN2 Site 2A is refractory to CTDNEP1 regulation. The blot has been replaced to better convey this result.

      The transgenic SUN2 is not detected in this blot because, while the same cell lines were used for this experiment, doxycycline was not added to these cells in order to visualise the endogenous SUN2. Thus, two sets of lysates were collected, one from cells treated with doxycycline (transgene) and one from cells without doxycycline (endogenous). This is explained in the figure legend.

      8) In panel 1E, the heterologously expressed SUN2 protein has two bands, with the upper band being more readily degraded than the lower band in some cases. Is the upper band the phosphorylated product? Might be worth a comment if anything is known about what the two bands represent.

      We believe that the two bands do not correspond to different phosphorylated SUN2 forms. This is based on the analysis of SUN2 by SDS-PAGE in the presence of Phos-tag reagent and on the fact that two bands are also seen for both the non-phosphorylatable and phospho-mimetic SUN2 derivatives. The appearance of two bands has been observed for other ERAD substrates characterized in our lab (for example, Weijer et al. 2020) and appears to depend on the lysis conditions (see for example Figures 2 and 3).

      9) Worth mentioning in the main text that FBXW11 is bTRCP2. Also, it is worth noting whether bTRCP1 (FBXW1A) was a hit on the screen or not.

      Thanks for the suggestion. We have now included this information.

      Reviewer #3 (Public Review):

      The manuscript by Krshnan et al. reports a cellular mechanism akin to endoplasmic reticulum-associated degradation (ERAD) that degrades SUN2, an inner nuclear membrane protein. The authors previously identified the Asi ubiquitin ligase complex that mediates the degradation of inner nuclear membrane proteins in budding yeast. In this manuscript, they identify SCFβTrCP as another ligase that regulates the ubiquitination and degradation of SUN2 in mammalian cells. The key findings include the identification of a substrate recognition motif that appears to undergo casein kinase 2 (CK2)-dependent phosphorylation. Mutagenesis studies show that mutants defective in phosphorylation are stabilized while a phospho-mimetic mutant is more unstable. They further show that the degradation of SUN2 requires the AAA ATPase p97, which allows them to draw an analogy between SUN2 degradation and Vpu-induced degradation of CD4, which occurs on the ER membrane via the ERAD pathway. Lastly, they show that the stability of endogenous SUN2 is regulated by a phosphatase and that overexpression of a non-degradable SUN2 variant disrupts nuclear envelope morphology, cell cycle kinetics, and DNA repair efficiency. Overall, the study dissects another example of inner nuclear envelope protein turnover and the involvement of a kinase/phosphatase pair in this regulation. The data are of extremely high quality and the manuscript is clearly written. That being said, the following questions should be addressed to improve the robustness of the conclusions and to avoid potential misinterpretation of the data.

      1) Since SUN2 is normally incorporated into a SUN2-SYNE2-KASH2 LINC heterohexamer complex, the authors should be cautious with the use of over-expressed SUN2 in this study. Over-expressed SUN2 is expected to stay mostly as unassembled molecules and thus is likely degraded by a protein quality control mechanism that targets unassembled proteins. Consistent with this possibility, CK2 has been implicated in the regulated turnover of aggregation-prone proteins (Watabe, M. et al., JCS 2011). This mechanism would be potentially distinct from the one proposed for endogenous SUN2 degradation.

      We thank the reviewers for the suggestion to provide further genetic evidence of the involvement of the βTrCP1 and βTrCP2 F-box proteins in the degradation of SUN2. We now show that maximum stabilization of endogenous (Figure Supplement S4D) and transgenic (Figure S2) SUN2 is observed upon simultaneous depletion of βTrCP1 and βTrCP2, indicating that these F-box proteins are redundant. Depletion of βTrCP1 alone did not impact SUN2 levels, while depletion of βTrCP2 increased SUN2 steady-state levels, with the effect being more pronounced for overexpressed SUN2. Depletion of other F-box proteins did not affect SUN2 levels, indicating that the effect observed upon βTrCP depletion is specific (Figure S2B). These results are in line with the results of our genome-wide screen (Figure 4 and S3) and with the literature. The differences in the effects of βTrCP1 and βTrCP2 depletion likely result from the relative abundance of the two F-box proteins in the HEK cells used in this study.

      2) Certain conclusions appear to be an overstatement. This is particularly the case for the title, which implies that SUN2 is a protein that undergoes regulated turnover (under certain physiological conditions). Given that CK2 is a constitutive kinase and that the authors have not identified the conditions under which the activity of CTDNEP1 is regulated, it is premature to make such a conclusion.

      We disagree with the reviewer on this point. We present clear evidence that the turnover rate of SUN2 (both overexpressed and endogenous) is regulated by opposing kinase/phosphatase activities. This per se implies a mode of regulation. Similar kinase/phosphatase balances regulate a plethora of physiological processes (from cell cycle progression to DNA repair), and the term “regulation” is commonly used in these contexts. We agree with the reviewer that the upstream events controlling SUN2 remain elusive; however, we do present evidence that the balance of CK2 and CTDNEP1 activities regulates SUN2 degradation.

      3) Likewise, the demonstration of the impact of SUN2 accumulation on different cellular pathways mainly relies on the over-expression of a non-degradable SUN2 mutant. Whether similar defects could be seen when the degradation of endogenous SUN2 is blocked remains an open question.

      It would be ideal to gene-edit the SUN2 locus to introduce the desired mutations. But, as pointed out, this is not trivial, in particular considering that the desired mutations would need to be introduced in both chromosomal copies.

    1. Author Response

      Reviewer #1 (Public Review):

      “Overall this is an interesting study of the function of ATP6AP2 in the osteoblastic lineage. This gene is unstudied in the osteoblast, despite its known role in WNT signaling. In this study, the authors first show that loss of this gene in mature osteoblasts results in a strong cortical bone phenotype, with reduced osteocyte numbers and disorganized collagen. This phenotype is not present at birth but progressively worsens as the animals reach weaning age. In the compact bone, they show that loss of ATP6AP2 results in osteocytes largely devoid of dendritic processes. Loss of this gene starting at the osteocyte stage results in a milder phenotype. They then show that the osteocytes present have reduced MMP14 and that partial restoration of MMP14 attenuates the severity of the cortical phenotype.”

      Strengths

      “This study uses cutting-edge microscopy to thoroughly characterize how and where the loss of ATP6AP2 in either the mature osteoblast or the osteocyte results in disorganized bone. Innovative proteomics techniques are used to identify cell surface proteins, including MMP14, that may mediate this phenotype. Two cre-drivers are used to determine when in the osteoblast-osteocyte lineage this gene has the maximum effect. Lastly, in vivo lentivirus replacement is used to test if the replacement of MMP14 can rescue the phenotype. This latter experiment solidifies the importance of MMP14 as a major player in the downstream sequelae of ATP6AP2 action.”

      Weaknesses

      “Unfortunately, all of the histology is conducted on demineralized bone, and counts of osteoblasts and osteoclasts on the bone surface are not presented. This reduces the ability to interpret all downstream work. As such, the extent of the mineralization defects is difficult to interpret. Much of this paper is focused on the osteocyte, which is curious as the phenotype of the mature osteoblasts ATP6AP2 knockout mice is so much more severe than that of the osteocyte ATP6AP2 knockout mice. While it is clear how MMP14 was identified as being deficient in the mature osteoblasts ATP6AP2 knockout cells, it is not obvious how this gene became the sole focus of the remainder of this paper. This phenotype progresses as the mice become ambulatory and therefore weight bearing on their limbs. This could partially explain the presentation of the mouse phenotype, but this is not discussed.”

      Good suggestions! We have performed the suggested experiments on mineralized bone sections, and quantified both osteoblasts and osteoclasts on the bone surface.

      The results, shown in Fig. 1A-B above, demonstrated increased osteoclast numbers on both trabecular and endocortical bone surfaces in ATP6AP2 mutant mice, which were accompanied by elevated bone resorption (see Fig. 1C, measured by serum levels of PYD). However, treatment with the bisphosphonate alendronate, an inhibitor of osteoclastic activity, restored trabecular bone mass but had little effect on the cortical bone phenotype in the mutant mice (Fig. 2A-F). These results thus suggest that the cortical bone phenotype in the mutant mice is independent of osteoclast activity.

      We thus further investigated the cortical bone phenotypes and the underlying osteoclast-independent mechanisms. Whereas no significant change in the number of osteoblasts was detected in the metaphysis region of the femur in ATP6AP2Ocn-cre mice (see Fig. 6A-B below), we did detect a mineralization deficit in the mutant mice in both in vivo and in vitro experiments (see Fig. 5 and Fig. 7). These results suggest that the increased cortical woven bone in the mutant mice is likely due to an impairment in the replacement of woven bone with mineralized cortical bone matrix.

      Additionally, the expression levels of ATP6AP2 in osteocytes appeared to be similar to those in osteoblasts and BMSCs (Fig. 8). That the phenotype of osteocyte ATP6AP2 knockout mice (ATP6AP2DMP-Cre) appeared weaker than that of BMSC/osteoblast ATP6AP2 knockout mice (ATP6AP2OCN-Cre) led us to speculate that ATP6AP2 in Ocn-Cre+ osteoblastic cells (e.g., immature osteocytes) may play a more critical role than that in DMP1-Cre+ mature osteocytes in regulating cortical bone matrix remodeling and osteocyte development.

      These results and points, which are in line with our model, will be included in a revised manuscript.

      Reviewer #3 (Public Review):

      “In this work, the authors have assessed the bone phenotype of a mouse with targeted ablation of the vacuolar ATPase accessory protein ATP6AP2 in the osteoblast lineage. They observe a clear increase in cortical thickness, but the cortex is highly porous and contains remnant cartilage as well as extensive woven bone. They then follow this by suggesting that one cause of this phenotype may be a change in the surface expression of the protein MMP14, a matrix metalloproteinase, known to be involved in bone matrix degradation, at least in osteoclasts. They provide evidence that this protein may also regulate matrix degradation surrounding osteocytes and an increase in this protein in osteocytes lacking ATP6AP2 may be a cause of the initial phenotype described.”

      While the phenotype described is very dramatic, the interpretation that it reflects a defect in osteoblast to osteocyte transition is questioned by this reviewer. The phenotype appears to be an osteopetrosis, including a lack of remodelling of the cortex. Cartilage and woven bone are not replaced effectively by lamellar bone. The bone contains ample osteocytes, but they are the osteocytes typical of woven bone, with rounded cell bodies, disordered organisation, low sclerostin expression, and short dendritic processes. The defect in the ATP6AP2 mice is a lack of cortical remodelling during cortical consolidation (for review see PMID: 34196732). Cartilage and woven bone remnants, which are normally remodelled as cortical bone matures, remain in the cortex until adulthood. It is not clear whether this results from reduced or increased remodelling of the cortex, but it is not because the osteoblasts cannot form osteocytes.

      Some of the data is very challenging to interpret because of low sample numbers (n=4 for much of the analysis), and lack of detail as to the sex of the animals. Regions used for imaging, histomorphometry, and dynamic histomorphometry all need to be defined throughout the work. Since the cortex differs dramatically by site, and by distance from the growth plate (due to the different stages of maturation) this is critical. Some methods are not defined, although they could be of great use to the field (e.g. the method for assessing bone degradation by MMP14).

      Good suggestions! We will describe the results more precisely in a revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, Bentley et al. describe the development and use of a novel microfluidic platform to study motility of green algae. By confining algae to circular corrals of various diameters (and with a height that renders the system quasi-two-dimensional), the authors gather extremely long time series of the swimming trajectories under various degrees of lateral confinement, in the presence of several different kinds of perturbations.

      The data is presented in a number of ways, most importantly by means of transitions between the three characteristic states of motion for these algae. This allows contact to be made with ideas from nonequilibrium dynamical systems by examining the transition probabilities between those states and identifying nonequilibrium characteristics of the fluxes between them.

      Overall the work is extremely impressive in terms of the data acquisition and careful time series analysis. The work falls short, though, in not following through on the many interesting observations that can be deduced from the data to reach precise conclusions about the biology and physics. For example, we see in Figs. 2 and 3 the effects of confinement on the trajectories, leading to clearly chiral motion at the strongest confinement. I would have expected the next step of the analysis to be a study of this problem in the context of, say, a Fokker-Planck equation for the probability distribution function of orientations, complete with boundary conditions that encode the scattering laws that we know from prior work by Kantsler et al. and others. Similar comments can be made about the other observations, which are not followed up with any clear mechanistic analysis or comparison with theory.

      The example above suggests that this paper, in its current form, is more akin to a "Methodology" paper than one that discovers new phenomena and explains them.

      We thank the reviewer for their summary of our work and for these pertinent comments. As discussed above, we performed new experiments and modelling to answer the main question of why chiral circling appears in the smallest traps (highest confinement), and also why the chirality depends on light. As demonstrated in the prior Cammann et al. PNAS study (and to some extent also the Ostapenko et al. PRL study), encoding the scattering laws measured by Kantsler et al. for a basic swimmer produces chiral motion (circular movement). An analytical treatment in terms of Fokker-Planck equations already appears in these prior studies. The novelty here, however, lies in explaining why this circular movement should remain chiral in the time-averaged sense.

      In the revisions, we restrict ourselves to a conceptual level, where we show how a small internal asymmetry at the cellular level suffices to produce macroscopic chirality and how this depends on the size of the trap. Our new explanation reaches a precise conclusion about how a fundamentally biological phenomenon (slightly asymmetric flagellar beating in a phototactic swimmer) leads to a confinement-induced physical phenomenon (a preferred sense of circular swimming).

      In a separate follow-up study, we will extend our model to incorporate more realistic parameters from the current dataset (e.g., time-dependent speed, stochastic reorientations, shock responses, softness of the potential, etc.) to understand more subtle aspects of the high-resolution data we acquired.

      Reviewer #2 (Public Review):

      The authors use microfluidic devices and machine learning to follow single swimmers for long periods, measuring their movement in detail and enabling statistics at a level that has never been possible before.

      Its strength is the extraordinary detail and the doors opened by the quality of the resulting data. As such, it makes a substantial contribution to a narrow field and contributes, more subtly, to the important field of fully mathematically accessible descriptions of migration phenotypes.

      Its weakness is that these tools are not yet used for any particularly enlightening tests. The directed probability fluxes are interesting, but not surprising. The strength of this paper is in the method, the analysis, and the ability to generate rigorous datasets.

      We thank the reviewer for highlighting the quality and detail of our datasets, and we agree with the criticisms raised. We hope these weaknesses are now rectified in the revision. There is clearly scope to do much more now that we have access to this data, and we demonstrate this in the revision with our new model/interpretation.

      We highlight three innovations of our work that may not have been made clear in the previous draft.

      1. We have suggested a new paradigm for the analysis of microbial motility and behaviour. Extracting state transition probabilities from single-cell trajectories reveals exactly how motility changes at the subcellular level, which is much more informative than whether an organism speeds up or slows down on average. It tells us how a given individual modulates the balance of possible behavioural states in response to its environment and over time. These concepts apply not just to microbes but to any behaving organism.

      2. The emphasis on keeping track of the ‘arrow of time’ in the analysis of movement trajectories is important and can again be applied to any organism. As discussed above, while circling behaviour or symmetry breaking in confinement may not be surprising in itself (though that has not prevented a flurry of experimental and theoretical papers on this topic), we argue that the emergence of chirality in the time-averaged trajectory is surprising and does require more subtle treatment. We now suggest that this is due to a very small amount of internal symmetry breaking; it is interesting that such a small amount of symmetry breaking at the sub-cellular scale can manifest as robust symmetry breaking at the macroscopic scale.

      This kind of insight has broad implications for understanding how (even simple) organisms can dramatically alter how they interact with their physical environment by effecting even minute internal adjustments. This could also motivate the design of novel biomimetic artificial devices or microswimmers.

      3. Our approach of fusing droplets to investigate rapid motility responses to chemicals has considerable potential for drug screening and for investigating cellular transduction pathways (e.g. functional assays of mutants). We demonstrate it here as a proof of concept on one species for one chemical only, but there are clear advantages over traditional approaches that set up chemical gradients or similar, where it is impossible to get a handle on either instantaneous cell reactions or individual-level responses.
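      The first two points above can be made concrete with a small, generic sketch. Assuming a trajectory has already been segmented into discrete behavioural states (the labels and function names here are hypothetical, not the authors' pipeline), transition probabilities are row-normalised counts of consecutive state pairs, and the "arrow of time" shows up as a net flux J(a, b) = (N[a->b] - N[b->a]) / N_total between states. For time-reversible (detailed-balance) dynamics such fluxes vanish on average, and reversing a trajectory flips their sign.

```python
from collections import Counter, defaultdict
from itertools import combinations


def transition_probabilities(states):
    """Estimate P(next state | current state) from one discretised trajectory.

    Each consecutive pair (a, b) in `states` counts as one a -> b transition.
    """
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    # Normalise each row so the outgoing probabilities of a state sum to 1.
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


def net_fluxes(states):
    """Net flux J(a, b) = (N[a->b] - N[b->a]) / N_total for each state pair a < b."""
    pairs = Counter(zip(states, states[1:]))
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {
        (a, b): (pairs[(a, b)] - pairs[(b, a)]) / total
        for a, b in combinations(sorted(set(states)), 2)
    }


# Hypothetical behavioural states sampled at a fixed time step.
print(transition_probabilities(["swim", "swim", "turn", "swim", "stop"]))

# A hypothetical trajectory cycling A -> B -> C -> A carries a directed flux;
# the time-reversed trajectory carries the opposite flux.
cycle = ["A", "B", "C", "A", "B", "C", "A"]
print(net_fluxes(cycle))
print(net_fluxes(cycle[::-1]))
```

      Finite samples will show small non-zero fluxes even for reversible dynamics, so in practice such values would be compared against a null, for example shuffled or time-reversed data.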
    1. Author Response

      Reviewer #1 (Public Review):

      This paper follows several innovative articles from the authors exploring the molecular mechanisms of insulin and IGF1 receptors activation by their ligands using cryo-electron microscopy. Here the authors explore the role of an alpha helical C-terminal segment (called the alpha-CT motif) of a disordered disulfide-linked insert domain in the FnIII-2 module of the insulin and IGF1 receptors (at the end of the alpha subunit), in the mechanism of ligand binding, negative cooperativity and receptor activation.

      Biochemical data gathered over several decades have suggested that insulin and IGF1 use two separate binding sites, site 1 and site 2, to bind to two distinct domains (sites 1 and 2, and 1' and 2') on each protomer of the homodimeric receptors, disposed in an antiparallel symmetry. This disposition was corroborated by the early x-ray crystallographic studies of the unliganded insulin receptor ectodomain (apo-receptor). A subsequent, somewhat surprising finding was that insulin receptor site 1 is in fact a composite, made of the beta surface of the L1 module of one protomer and the alpha-CT motif of the other protomer, which binds perpendicularly to the L1 surface (a "tandem binding element"), with insulin binding more to the alpha-CT motif than to L1.

      Previous work from the authors showed that the subsaturated insulin receptor has an asymmetric configuration, while the receptor saturated with four insulins has a symmetric T-shaped configuration. In contrast, IGF1R shows only one IGF1 bound in an asymmetric configuration, indicating, according to the authors, stronger negative cooperativity. This is attributed to a more rigid and elongated conformation of the alpha-CT motifs that restricts the structural flexibility of the alternate binding site.

      To test this hypothesis, the authors determined the cryo-EM structure of IGF1 bound to IGF1R with a mutated alpha-CT motif elongated by four glycine residues. Strikingly, a portion of these constructs adopt a T-shaped symmetric structure.

      Conversely, they show that in the cryo-EM structure of insulin bound to an insulin receptor whose alpha-CT insert domains are no longer disulfide-linked (cysteines mutated to serines), the receptor adopts asymmetric conformations even at saturating insulin concentrations. They conclude that the alpha-CTs in the disulfide-linked insert domains of the insulin receptor play an important role in the structural transition from asymmetric to symmetric during insulin-induced receptor activation.

      All in all, this is a very interesting and well-designed study that represents an advance in the knowledge of the insulin/IGF1 receptor systems, although the details of the structural interpretations deserve some discussion.

      This is a very clear and succinct summary of our work. We thank Dr. Pierre De Meyts for the positive assessment of our manuscript, and we greatly appreciate the constructive comments, which we have addressed.

      Reviewer #2 (Public Review):

      Li et al. build upon recent observations that the alphaCT peptide is a key element in the IGF-1R and IR that regulates negative cooperativity and receptor activation. The use of IGF-1R and IR mutants builds upon previous observations with these mutants by Li et al. (IGF-1R) and Weis et al. (IR).

      Here they determined the structure of the IGF-1R mutant IGF-1R-P673G4, which has a four-glycine motif inserted at residue P673, at 4 Å resolution. By introducing structural flexibility in the alphaCT, the IGF-1R is able to bind two IGFs and adopts a symmetric T conformation, in contrast to the single-IGF-bound WT IGF-1R, which adopts an asymmetric conformation. The ability to bind two IGFs is taken as a sign that negative cooperativity has been affected and confirms the importance of the alphaCT in constraining the IGF-1R into the asymmetric conformation. The increased flexibility of the alphaCT linkage between the two receptor monomers results in a reduced ability of IGF-I to activate the IGF-1R and Erk, leading to reduced IGF-1R internalisation. This is consistent with previous reports that effective Erk signalling is dependent on endosomal signalling. A second mutant, IGF-1R-3CS, was also used, in which a cysteine triplet in the alphaCT is mutated to serine to perturb the disulfide bonding between the alphaCTs of the two monomers. IGF-1R activation and signalling by this mutant were also reduced.

      In addition, the structure of an equivalent insulin receptor mutant, IR-3CS, was determined in complexes formed with excess insulin. Again, the increased flexibility of the alphaCT altered the structural rearrangement upon ligand binding. Three conformations with four insulins bound were detected, two unique asymmetric conformations (4.5 Å and 4.9 Å) and one symmetric conformation (3.7 Å), whereas the WT IR:insulin complex predominantly forms a symmetric T structure with four insulins bound. This suggests that the IR alphaCT is important in stabilising the active T structure. In contrast to IGF-1R-3CS, IR-3CS has the same affinity for insulin as WT IR and is more potently activated (pY1150/Y1151) by insulin, but has a reduced signalling response. This demonstrates the role of the alphaCT in activation.

      Whilst the symmetric IR-3CS:insulin complex structure is compared with the WT IR:insulin complex, no comparisons were made between the asymmetric conformations described here and those previously reported. Is the ligand binding in the asymmetric conformations different to the asymmetric binding seen when WT IR:insulin complexes were generated at low insulin concentrations? It would be interesting to see these overlaid. How do these asymmetric conformations relate to the existing asymmetric conformations reported by Nielsen (doi:10.1016/j.jmb.2022.167458) and Xiong (doi:10.1038/s41589-022-00981-0)?

      Thank you for the good suggestions. We have now prepared a new Figure 4-supplement 2 that compares the structure of asymmetric IR-3CS/insulin with that of asymmetric IR bound with subsaturated insulin previously published by us and others. All asymmetric structures of IR bound with subsaturated insulin have similar structural features, i.e., in one half of the complex, one insulin bound at site-1 also contacts site-2 from adjacent protomer, or vice versa. However, in the asymmetric structure of IR-3CS/insulin, two insulins were bound at the hybrid site in the middle of the IR-3CS/insulin complex. To accommodate the binding of two insulins, the L1/αCT together with bound site-1 insulin move outward as compared to the asymmetric structure of IR bound with subsaturated insulin. This is the major structural difference between these asymmetric structures. We have discussed this in the revised manuscript.

      What is the distance between the FnIII-3 domains of the IR:insulin asymmetric conformations and in the symmetric structure? Does this correlate to activity as is seen for the IGF-1R-P673G4? It would be good to comment on this, particularly as there is an interesting disconnect between the receptor activation and downstream signalling activity. Why is there greater pY1150/Y1151 activation than for the WT IR and how can the lower downstream signalling activity be explained?

      We thank Dr. Briony Forbes for raising this point. Asymmetric IR-3CS/insulin, asymmetric IR/insulin and symmetric IR/insulin have similar distances between their membrane-proximal regions (approximately 30 – 35 Å). This indicates that the distances between the membrane-proximal regions within these complexes are all short enough to allow the intracellular kinase to undergo efficient autophosphorylation, in contrast to IGF1R.

      As indicated by Dr. Forbes, our cellular functional assays showed that IR-3CS has higher levels of autophosphorylation but lower levels of downstream signaling activity and a defect in endocytosis. Although the distances between the membrane-proximal regions are similar, the relative positions and orientations of the two membrane-proximal regions differ significantly between asymmetric IR-3CS and symmetric IR. Given that the FnIII-3 domain is connected to the transmembrane domain by a short four-residue linker, we speculate that the structural differences in the extracellular domains may affect both the dimeric assembly of the transmembrane and intracellular domains and the stability of the interactions between the intracellular IR domains and downstream adaptors and effectors. This could in part explain why IR-3CS can still undergo robust autophosphorylation while its downstream signaling is defective. A similar hypothesis has been proposed for EGF- and TGF-α-induced activation of EGFR (PMID: 34846302). The endocytosis defect of IR-3CS might be the result of reduced IR signaling, but it is tempting to speculate that, conversely, reduced endocytosis of IR-3CS may cause defective downstream signaling. The structure of the transmembrane and intracellular domains in the context of the entire full-length IR/insulin complex needs further investigation. We have included new analysis and expanded the discussion.

      It would be good to reword the opening statement that "IGF1 only has one type of ligand binding site (site-1)" to acknowledge that two binding sites on IGF-I have been detected through competition binding studies fitted to a two-site sequential model, which detect both high-affinity and low-affinity binding (Kiselyov). Site-directed mutagenesis studies of both IGFs have detected two binding surfaces analogous to insulin's site 1 and site 2 (Gauguin et al. and Alvino et al.). Furthermore, binding assays with mini-IGF-1R (L1, CR, L2 fused to alphaCT, i.e. site 1 only) clearly demonstrated that IGF-II site 2 residues do contribute to overall binding affinity (Alvino et al.). Perhaps site 2 of IGF-1R has yet to be captured because it is not in the same location as IR site 2? It would be good to comment on this.

      Point accepted. Gauguin L. et al. demonstrated that alanine mutations in IGF1, including E9A, D12A, F16A, D53A, L54A and E58A, markedly reduced IGF1R-binding affinity. With the exception of IGF1 E9 (site-1b of IGF1R; the equivalent position in IGF2 is E12), none of IGF1 D12, F16, D53, L54 and E58 is involved in binding to site-1, suggesting that IGF1 has an additional site that maximizes binding to the receptor. Despite saturating IGF1 levels, however, our previous and current structural studies did not reveal the putative site-2 for IGF1 binding on IGF1R. We speculate that IGF1 binds to site-2 transiently, which might be important for IGF1-induced activation of IGF1R. We have revised the manuscript and expanded the discussion.

      Reviewer #3 (Public Review):

      Li et al. present cryo-EM structures of the insulin receptor (IR) and insulin-like growth factor-1 receptor (IGF1R), exploring the functional roles of the disulfide-linked alphaCT regions in ligand binding and receptor activation.

      Cryo-EM structures of mutants of IGF1R and IR designed to increase the flexibility between disulfide-linked alphaCT regions revealed conformational states that were distinct from those of the wild-type (WT) receptors. Mutant (P673G4) IGF1R displayed conformations in which two IGF1 molecules were bound, rather than the 1:1 ligand:receptor state observed previously for WT IGF1R. Mutant (3CS) IR displayed asymmetric conformations with four insulin molecules bound, as well as the symmetric T conformation with four insulin molecules bound observed previously for WT IR. In each case, the mutant receptor was shown in cells to be poorly activated by its respective ligand.

      This study demonstrates the importance of the disulfide-coupled alphaCT regions in the IR and IGF1R for ligand binding and receptor activation. What is not resolved in this study is whether differences in the alphaCT regions of these two highly related receptors contribute to their disparate active states - asymmetric for IGF1R (and 1:1 IGF1:IGF1R) vs. symmetric (T) for IR (and 4:1 insulin:IR).

      We thank Dr. Stevan Hubbard for the positive assessment of our manuscript, and we greatly appreciate the constructive comments which we have addressed.

    1. Author Response

      Reviewer #1 (Public Review):

      In one of the most creative eDNA studies I have had the pleasure to review, the authors have taken advantage of an existing program several decades old to address whether insect declines are indeed occurring - an active area of discussion and debate within ecology. Here, they extracted arthropod environmental DNA (eDNA) from pulverized leaf samples collected from different tree species across different habitats. Their aim was to assess the arthropod community composition within the canopies of these trees during the time of collection to assess whether arthropod richness, diversity, and biomass were declining. By utilizing these leaf samples, the greatest shortcoming of assessing arthropod declines - the lack of historical data to compare to - was overcome, and strong timeseries evidence can now be used to inform the discussion. Through their use of eDNA metabarcoding, they were able to determine that richness was not declining, but there was evidence of beta diversity loss due to biotic homogenization occurring across different habitats. Furthermore, their application of qPCR to assess changes in eDNA copy number temporally and associate those changes with changes to arthropod biomass provided support to the argument that arthropod biomass is indeed declining. Taken together, these data add substantial weight to the current discussion regarding how arthropods are being affected in the Anthropocene.

      Thank you very much for the positive assessment of our work.

      I find the conclusions of the paper to be sound and mostly defensible, though there are some issues to take note of that may undermine these findings.

      Firstly, I saw no explanation of the requisite controls for such an experiment. An experiment of this scale should have detailed explanations of the field/equipment controls, extraction controls, and PCR controls to ensure there are no contamination issues that would otherwise undermine the entirety of the study. The presence of controls is mentioned only once in the manuscript, so I surmise they exist. Until such evidence is clearly outlined, the results should be treated with caution. Furthermore, a plate layout that includes these controls would help assess the extent of tag-jumping, should the plate plan proposed in Taberlet et al., 2018 be adopted.

      Second, without the presence of adequate controls, filtering schemes would be unable to determine whether there were contaminants and also be unable to remove them. This would also prevent samples from being filtered out should there be excessive levels of contamination present. Without such information, it makes it difficult to fully trust the data as presented.

      Finally, there is insufficient detail regarding the decontamination procedures for equipment used to prepare the samples (e.g., the cryomill). Without clear explanations of the steps the authors took to ensure samples were handled and prepared correctly, there is yet more concern that there may be unseen problems with the dataset.

      We are well aware of the potential issues and consequences of contamination in our work. However, we are also confident that our field and laboratory procedures adequately rule out these issues. We agree with the reviewer that we should expand more on our reasoning. Hence, we have now significantly expanded the Methods section outlining controls and sample purity, particularly under “Tree samples of the German Environmental Specimen Bank – Standardized time series samples stored at ultra-low temperatures” (lines 303-304), “Test for DNA carryover in the cryomill” (lines 448-464) and “Statistical analysis” (lines 570-575).

      We ran negative control extractions as well as negative control PCRs with all samples. These controls were sequenced along with all samples and used to explore the effect of experimental contamination. With the exception of a few reads of abundant taxa, these controls were mostly clean. We report this in more detail now in the Methods under “Sequence analysis” (lines 570-575). This suggests that our data are free of experimental contamination or tag jumping issues.

      We have also expanded on the avoidance of contamination in our field sampling protocols. The ESB has been set up for monitoring even the tiniest trace amounts of chemicals. Carryover between samples would render the samples useless. Hence, highly clean and standardized protocols are implemented. All samples are only collected with sterilized equipment under sterile conditions. Each piece of equipment is thoroughly decontaminated before sampling.

      The cryomill is another potential source of cross-contamination. The mill is disassembled after each sample and thoroughly cleaned. Milled samples have already been tested for chemical carryover, and none was found. We have now added an additional analysis to rule out DNA carryover. We received the milling schedule of samples for the past years. Assuming samples get contaminated by carryover between milling runs, two consecutive samples should show signatures of this carryover. We tested this for single-taxon carryover as well as community-wide beta diversity, but did not find any signal of contamination. This gives us confidence that our samples are very pure. The results of this test are now reported in the manuscript (Suppl. Fig 12 & Suppl. Table 3).
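      The logic of such a carryover test can be sketched as a permutation test: if milling transferred DNA between runs, consecutively milled samples should be more similar than expected under a random milling order. The sketch below assumes each sample is a dict of taxon read counts and uses Bray-Curtis dissimilarity as the community-wide measure; it illustrates the idea only, and the function names, statistic and permutation scheme are hypothetical rather than the authors' exact test.

```python
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two {taxon: count} dicts."""
    taxa = set(u) | set(v)
    num = sum(abs(u.get(k, 0) - v.get(k, 0)) for k in taxa)
    den = sum(u.values()) + sum(v.values())
    return num / den if den else 0.0


def mean_consecutive_dissimilarity(samples):
    """Mean dissimilarity between each sample and the one milled right after it."""
    d = [bray_curtis(a, b) for a, b in zip(samples, samples[1:])]
    return sum(d) / len(d)


def carryover_test(samples_in_milling_order, n_perm=999, seed=0):
    """One-sided permutation test: are consecutively milled samples more
    similar than expected under a random milling order?"""
    observed = mean_consecutive_dissimilarity(samples_in_milling_order)
    rng = random.Random(seed)
    shuffled = list(samples_in_milling_order)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_consecutive_dissimilarity(shuffled) <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical example with strong carryover: each sample shares one taxon
# with the sample milled immediately before it.
samples = [{f"t{i}": 10, f"t{i + 1}": 10} for i in range(10)]
print(carryover_test(samples))
```

      A small p-value would indicate carryover; a large one is consistent with no detectable carryover between consecutive runs.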

      Reviewer #2 (Public Review):

      Krehenwinkel et al. investigated the long-term temporal dynamics of arthropod communities using environmental DNA (eDNA) remaining in archived leaf samples. The authors first developed a method to recover arthropod eDNA from archived leaf samples and carefully tested whether the method could reasonably reveal the dynamics of the arthropod communities where the leaf samples originated. Then, using this eDNA method, the authors analyzed 30 years of well-archived tree leaf samples from Germany and reconstructed the long-term temporal dynamics of arthropod communities associated with the tree species. The reconstructed time series includes several thousand arthropod species belonging to 23 orders, and the authors found interesting patterns in the time series. Contrary to some previous studies, the authors did not find widespread temporal α-diversity (OTU richness and haplotype diversity) declines. Instead, β-diversity among study sites gradually decreased, suggesting that the arthropod communities have become more spatially homogenized in recent years. Overall, the authors suggested that the temporal dynamics of arthropod communities may be complex and involve changes in α- and β-diversity, and demonstrated the usefulness of their unique eDNA-based approach.

      Strengths:

      The authors' idea of using eDNA remaining in archived leaf samples is unique and potentially applicable to other systems. For example, different types of specimens archived in museums may be used to reconstruct the long-term community dynamics of other organisms, which would be beneficial for understanding and predicting ecosystem dynamics.

      A great strength of this work is that the authors very carefully tested their method. For example, the authors tested the effects of the input weight of powdered leaves, sampling methods, storage methods, PCR primers, and days from last precipitation to sampling on the eDNA metabarcoding results. The results showed that the tested variables did not significantly impact the eDNA metabarcoding results, which convinced me that the proposed method reasonably recovers arthropod eDNA from the archived leaf samples. Furthermore, the authors developed a method that can separately quantify 18S DNA copy numbers of arthropods and plants, which enables the estimation of relative arthropod eDNA copy numbers. While most eDNA studies provide relative abundance only, the DNA copy numbers measured in this study provide valuable information on arthropod community dynamics.

      Overall, the authors' idea is excellent, and I believe that the developed eDNA methodology reasonably reconstructed the long-term temporal dynamics of the target organisms, which are major strengths of this study.

      Thank you very much for the positive assessment of our work.

      Weaknesses:

      Although this work has major strengths in the eDNA experimental part, there are concerns in DNA sequence processing and statistical analyses.

      Statistical methods to analyze the temporal trend are too simplistic. The methods used in the study did not consider possible autocorrelation and other structures that the eDNA time series might have. It is well known that the applications of simple linear models to time series with autocorrelation structure incorrectly detect a "significant" temporal trend. For example, a linear model can often detect a significant trend even in a random walk time series.
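      The reviewer's point about spurious trends can be checked with a small simulation. The sketch below fits an ordinary least-squares trend to 30-point series that contain no deterministic trend at all and counts how often the slope is "significant" (|t| > 2.048, the two-sided 5% critical value for 28 degrees of freedom). For independent noise the rate should sit near the nominal 5%, whereas for a random walk it is far higher. The series length, number of simulations and seed are arbitrary illustrative choices.

```python
import random
import statistics


def ols_trend_t(y):
    """OLS regression of y on a time index 0..n-1; returns (slope, t-statistic)."""
    n = len(y)
    t = range(n)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(y)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope, slope / se


def false_trend_rate(kind, n_years=30, n_sims=500, t_crit=2.048, seed=1):
    """Fraction of trend-free simulated series with a 'significant' OLS slope."""
    if kind not in ("white_noise", "random_walk"):
        raise ValueError("kind must be 'white_noise' or 'random_walk'")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
        if kind == "random_walk":
            y, level = [], 0.0
            for step in noise:
                level += step  # each value is the previous value plus noise
                y.append(level)
        else:
            y = noise
        _, t_stat = ols_trend_t(y)
        hits += abs(t_stat) > t_crit
    return hits / n_sims


print(false_trend_rate("white_noise"), false_trend_rate("random_walk"))
```

      This is why trend tests on such time series should account for autocorrelation, for example through autoregressive error terms, rather than relying on a plain linear model.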

      We have now reanalyzed our data controlling for autocorrelation and for non-linear changes of abundance and recover no change to our results. We have added this information to the manuscript under “Statistical analysis” (lines 629-644).

      Also, there are some issues regarding the DNA sequence analysis and the subsequent use of the results. For example, read abundance was used in the statistical model, but read abundance cannot be a proxy for species abundance/biomass. Because the total 18S DNA copy numbers of arthropods were quantified in the study, multiplying the sequence-based relative abundance by the total 18S DNA copy numbers may produce a better proxy of arthropod abundance, and using such a proxy would be more appropriate here. In addition, coverage-based rarefaction enables a more rigorous comparison of diversity (OTU diversity or haplotype diversity) than read-based rarefaction does.
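      The two calculations suggested here are simple to express. The sketch below (i) scales each OTU's relative read abundance in a sample by that sample's total arthropod 18S copy number, giving a copy-number-weighted abundance proxy, and (ii) computes the simplest (Good-Turing) estimate of sample coverage, 1 - f1/n, where f1 is the number of singleton OTUs and n the total reads; coverage-based rarefaction standardises samples to equal coverage rather than equal read depth. The function names and example values are hypothetical, and this is not the authors' implementation.

```python
def copy_number_weighted(read_counts, arthropod_copies):
    """Scale relative read abundances in one sample by its total arthropod
    18S copy number to obtain an abundance proxy per OTU."""
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        return {otu: 0.0 for otu in read_counts}
    return {otu: arthropod_copies * n / total_reads for otu, n in read_counts.items()}


def goods_coverage(read_counts):
    """Good-Turing sample coverage estimate: 1 - (singleton OTUs / total reads)."""
    counts = [n for n in read_counts.values() if n > 0]
    total_reads = sum(counts)
    singletons = sum(1 for n in counts if n == 1)
    return 1 - singletons / total_reads if total_reads else 0.0


sample = {"otu1": 600, "otu2": 399, "otu3": 1}  # hypothetical read counts
print(copy_number_weighted(sample, arthropod_copies=2000.0))
print(goods_coverage(sample))
```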

      We did not use read abundance as a proxy for abundance, but used our qPCR approach to measure the relative copy number of arthropods. While there are biases to this (see our explanations above), the assay proved very reliable and robust. We thus believe it should indeed provide a rough estimate of biomass. As biomass is very commonly discussed in the context of insect decline (in fact, the first study on insect decline relies entirely on biomass; Hallmann et al. 2017), we feel it is important to include a proxy for it as well. However, we also discuss the alternative possibility that a turnover of diversity is affecting the measured biomass. A pattern of abundance loss for common species has been described in other works on insect decline.

      We liked the reviewer’s suggestion to use copy number information to perform abundance-informed rarefaction. We have done this now and added an additional analysis rarefying by copy number/biomass. A parallel analysis using this newly rarefied table was done for the total diversity as well as single species abundance change. Details can be found in the Methods and Results section of the manuscript. However, the result essentially remains the same. Even abundance-informed rarefaction does not lead to a pattern of loss of species richness over time (see “Statistical analysis”).

      The overall results support a scenario of no overall loss of species richness over time, but a loss of abundance for common species. We indeed see the pattern of declining abundance for once-common species in our data, for example the loss of the Green Silver-Line moth, once a very common species in the beech canopy (Suppl. Fig. 10). We have added details on this to the Discussion (lines 254-260).

      These points may significantly impact the conclusions of this work.

      Reviewer #3 (Public Review):

      The aim of Weber and colleagues' study was to generate arthropod environmental DNA extracted from a unique 30-year time series of deep-frozen leaf material sampled at 24 German sites that represent four different land use types. Using this dataset, they explore how the arthropod community has changed through time at these sites, using both conventional metabarcoding to reconstruct the OTUs present and a new qPCR assay developed to estimate the overall arthropod diversity on the collected material. Overall, their results show that while no clear changes in alpha diversity are found, β-diversity dropped significantly over time at many sites, most notably in the beech forests. Overall, I believe their data support these findings, and thus their conclusion that diversity is becoming homogenized through time is valid.

      Thank you for the positive assessment.

      While overall I do not doubt the general findings, I have a number of comments. Firstly while I agree this is a very nice study on a unique dataset - other temporal datasets of insects that were used for eDNA studies do exist, and perhaps it would be relevant to put the findings into context (or even the study design) of other work that has been done on such datasets. One example that jumps to my mind is Thomsen et al. 2015 https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2656.12452 but I am sure there are others.

      We have expanded the introduction and discussion on this citing this among other studies now (lines 71-72, 276-278).

      From a technical point of view, the conclusions of course rely on several assumptions, including (1) that the biomass assay is effective and (2) that the reconstructed levels of OTU diversity are accurate.

      With regards to biomass, although it is stated in the manuscript that "Relative eDNA copy number should be a predictor for relative biomass", this is in fact only true under a number of assumptions, e.g. a similar 18S rDNA copy number per species, similar numbers of mtDNA copies per cell, a similar number of cells per individual across species, etc. In this regard, on the positive side, it is gratifying to see that the authors performed a validation assay on 7 mock controls, and these seem to indicate that the assay works well. Given how critical this is, I recommend discussing the details of this a bit more in the main text, including why the authors are convinced the assay is effective, so that readers can fully decide whether they agree. On the negative side, however, I am concerned that the strategy taken to perform the qPCR may not have been ideal. Specifically, the assay is based on nested PCR, where the authors first perform a 15-cycle amplification, purify this product, and then put it into a subsequent qPCR. Given that PCR is notorious for introducing amplification biases in general (especially when performed on low levels of DNA), and that nested PCRs are notoriously contamination prone, this approach seems to be asking for trouble. This raises the question: why not just perform the qPCR directly on the extracts (one can still dilute the plant DNA 100x prior to qPCR if needed)? Further, given that the qPCRs were run in triplicate, I think the full data (Ct values) should be released, as opposed to just stating in the paper that the average values were used. In this way, readers will be able to judge how replicable the assay was, something I think is critical given how noisy the patterns in Fig S10 seem to be.

      We agree with this point, and this is why we do not want to overstate the decline in copy number. This is an additional source of data next to genetic and species diversity. We have added to our discussion of turnover as another potential driver of copy number change (lines 257-260). We have also added text addressing the robustness of the mock community assay (lines 138-141).

      However, we are confident of the reliability and robustness of our qPCR assay for the detection of relative arthropod copy number. We performed several validations and optimizations before using the assay, and have added further details on this to the manuscript (see “Detection of relative arthropod DNA copy number using quantitative PCR”, lines 548-556). The idea for the nested qPCR came from a study (Tran et al.) showing its high accuracy and reproducibility. We show that our assay is highly replicable using triplicates of each qPCR, and we will now include these data in the supplementary data on Dryad. The SD of Ct values is very low (~0.1 on average). No-template controls (NTCs) were run with all qPCRs to rule out contamination in the experiments. We also find a very high efficiency of the assay: even at dilutions far outside the range of copy numbers observed in our actual leaf data, the assay remains accurate. We found very comparable abundance changes across our highly taxonomically diverse mock communities. This also suggests that abundance changes are a more likely explanation than simple turnover for the observed drop in copy number. A biomass loss for common species is well in line with recent reports on insect decline. We can also rely on several other mock community studies (Krehenwinkel et al. 2017 & 2019), in which we used read abundance of 18S and found it to be a relatively good predictor of relative biomass.
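      As a rough illustration of how triplicate Ct values translate into a replicability measure and a relative copy number, the sketch below computes the mean and standard deviation of each triplicate and an arthropod-to-plant template ratio E^-(ΔCt). The ΔCt formulation, the amplification efficiency E = 2 (100%) and the example Ct values are assumptions for illustration; the text above does not state how relative copy number was computed.

```python
import statistics


def summarise_triplicate(cts):
    """Mean and sample standard deviation of replicate Ct values."""
    return statistics.fmean(cts), statistics.stdev(cts)


def relative_copies(ct_arthropod, ct_plant, efficiency=2.0):
    """Arthropod-to-plant template ratio from mean Ct values, assuming both
    assays amplify with the same per-cycle efficiency (2.0 = 100%)."""
    delta_ct = ct_arthropod - ct_plant
    return efficiency ** (-delta_ct)


arth_mean, arth_sd = summarise_triplicate([25.0, 25.1, 24.9])  # hypothetical Cts
plant_mean, _ = summarise_triplicate([20.0, 20.0, 20.0])
print(round(arth_sd, 3), relative_copies(arth_mean, plant_mean))
```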

      The pattern in Fig. S10 is not really noisy. It just reflects typical population fluctuations for arthropods. Most arthropod taxa undergo very pronounced temporal abundance fluctuations between years.

      Next, with regards to the observation that the results reveal an overall decrease in arthropod biomass over time: the authors suggest one alternative to their theory, that the dropping DNA copy number may reflect taxonomic turnover of species with different eDNA shedding rates. Could there be another potential explanation, namely that leaves are simply getting denser or larger? Can this be ruled out in some way, e.g. via data on leaf mass through time for these trees (from this dataset or indeed any other source)?

      This is a very good point. However, we can rule out this hypothesis, as the ESB performs intensive biometric data analysis: the average leaf weight and water content have not significantly changed at our sites. We have addressed this in the Methods section (see “Tree samples of the German Environmental Specimen Bank – Standardized time series samples stored at ultra-low temperatures”, lines 308-311).

      With regards to estimates of OTU/zOTU diversity. The authors state in the manuscript that zOTUs represent individual haplotypes, thus genetic variation within species. This is only true if they do not represent PCR and/or sequencing errors. Perhaps therefore they would be able to elaborate (for the non-computational/eDNA specialist reader) on why their sequence processing methods rule out this possibility? One very good bit of evidence would be that identical haplotypes for the individual species are found in the replicate PCRs. Or even between different extractions at single locations/timepoints.

      We have repeated the analysis of genetic variation with much more stringent filtering criteria (see “Statistical analysis”, lines 611-615). Among other filtering steps, this includes using only those zOTUs that occur in both technical replicates, as suggested by the reviewer. Another reason to believe we are dealing with true haplotypic variation is that haplotypes show geographic variation; e.g., some haplotypes are more abundant at some sites than at others. NUMTs (nuclear copies of mitochondrial sequences), in contrast, would consistently show a simple correlation in their abundance with the most abundant true haplotype.
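      The replicate-based zOTU filter mentioned above can be sketched as follows. This is a hypothetical illustration, assuming per-zOTU read counts for two technical PCR replicates of one sample; the read threshold and the choice to sum reads across replicates are assumptions, not details taken from the manuscript.

```python
def shared_zotus(rep_a, rep_b, min_reads=1):
    """Keep only zOTUs observed with >= min_reads reads in BOTH replicates.

    rep_a, rep_b: dicts mapping zOTU ID -> read count for the two
    technical PCR replicates of a single sample (hypothetical format).
    Returns a dict mapping each retained zOTU ID -> reads summed
    across the two replicates (summing is an illustrative choice).
    """
    kept = {}
    for zotu, count_a in rep_a.items():
        count_b = rep_b.get(zotu, 0)  # absent from replicate B -> 0 reads
        if count_a >= min_reads and count_b >= min_reads:
            kept[zotu] = count_a + count_b
    return kept


# Hypothetical read counts: zOTU3 and zOTU4 each appear in only one replicate.
replicate_1 = {"zOTU1": 120, "zOTU2": 15, "zOTU3": 2}
replicate_2 = {"zOTU1": 98, "zOTU2": 11, "zOTU4": 5}
print(shared_zotus(replicate_1, replicate_2))
```

      Requiring presence in both replicates discards variants that appear only once, which is where stochastic PCR or sequencing errors are most likely to show up; it does not by itself exclude errors that recur in both replicates.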

      With regards to the bigger picture, one thing I found very interesting from a technical point of view is that the authors explored how modifying the mass of plant material used in the extraction affects the overall results, and basically find that using more than 200 mg provides no real advantage. In this regard, I draw the authors' and readers' attention to an excellent paper by Mata et al. (https://onlinelibrary.wiley.com/doi/full/10.1111/mec.14779), in which the authors compare the effect of increasing the amount of bat faeces used in a bat diet metabarcoding study on the OTUs generated. Essentially, Mata and colleagues report that as the amount of faeces increases, the rare taxa (e.g. those found at a low level in a single faecal sample) get lost, as they are simply diluted out by the common taxa (e.g. those in all faecal samples). In contrast, increasing biological replicates (in their case, more individual faecal samples) increased diversity. I think these results are relevant in the context of the experiment described in this new manuscript, as they seem to show similar results: there is no benefit of considerably increasing the amount of leaf tissue used. If so, this seems to point to a general principle of relevance to the design of metabarcoding studies, and is thus likely of wide interest.

      Thank you for this interesting study, which we were not aware of before. The cryomilling is an extremely efficient approach to equally disperse even traces of chemicals in a sample. This has been established for trace chemicals early during the operation of the ESB, but also seems to hold true for eDNA in the samples. We have recently done more replication experiments from different ESB samples (different terrestrial and marine samples for different taxonomic groups) and find that replication of extraction does not provide much more benefit than replication of PCR. Even after 2 replicates, diversity approaches saturation. This can be seen in the plot below, which shows recovered eDNA diversity for different ESB samples and different taxonomic groups from 1-4 replicates. A single extract of a small volume contains DNA from nearly all taxa in the community. Rare taxa can be enriched with more PCR replicates.

    1. Author Response

      Reviewer #1 (Public Review):

      Previous studies have linked several lifestyle-related factors, such as body mass index, smoking, and alcohol use, with accelerated biological aging measured using epigenetic clocks; however, most of them focused on single lifestyle factors based on cross-sectional data from older adults. The current study has several major strengths: it has a decent sample size, lifestyle was measured longitudinally during puberty and adolescence, it looked at the effect of multiple lifestyle measures collectively, it looked at multiple epigenetic clocks, and, owing to the twin data, it could examine the contribution of genetic and environmental influences to the outcomes. I have a couple of comments that are mainly aimed at improving the clarity of the methods (e.g. how multiple testing correction was done, how the association model accounted for the clustering of twin data, how many samples were measured on 450k vs EPIC arrays, and whether raw or pre-QC'd data were supplied to the online epigenetic age calculator) and the interpretation of findings (why 2 measures of Dunedin PACE of aging were used, how much the results are driven by BMI versus the other lifestyle factors, and that the discussion on shared genetic influences should be more nuanced, as it includes both pleiotropic effects and causal effects among lifestyle and biological aging).

      Thank you for the encouraging comments and important suggestions.

      Reviewer #2 (Public Review):

      Kankaanpää and colleagues studied how lifestyle factors in adolescence (e.g., smoking, BMI, alcohol and exercise) associate with advanced epigenetic age in early adulthood.

      Strengths:

      The manuscript is very well written. Although the analyses and results are complex, the authors manage very well to convey the key messages.

      The twin dataset is large and longitudinal, making this an excellent resource to assess the research questions.

      The analyses are advanced, including latent class analysis (LCA), capitalizing on the strength of these data.

      The authors also include a wider range of epigenetic age measures (n=6) as well as a broader range of lifestyle habits. This provides a more comprehensive view that also acknowledges that associations were not uniform across all epigenetic age measures.

      Weaknesses:

      The accuracy of the epigenetic age predictions was moderate with quite large mean absolute errors (e.g., +7 years for Horvath and -9 years for PhenoAge). Also, no correlations with chronological age are presented. With these large errors it is difficult to tease apart meaningful deviations (between chronological and biological age) from prediction error.

      The authors claim that 'the unhealthiest lifestyle class, in which smoking and alcohol use co-occurred, exhibited accelerated biological aging...'. However, this is only partially true. For example, PhenoAge was not accelerated in lifestyle class C5. Similarly, all classes showed some degree of deceleration (not acceleration) with respect to DunedinPACE (Figure 3D). The large degree of heterogeneity across different epigenetic age measures needs to be acknowledged.

      The authors claim that 'Practically all variance of AAPheno and DunedinPACE common with adolescent lifestyle was explained by shared genetic factors'. However, Figure 4 suggests that most of the variation (up to 96%) remained unexplained and genetics only explained around 10-15% of total variation. The large amount of unexplained variation should be acknowledged.

      Thank you for the encouraging comments and important notes.

      We have now acknowledged that the standard deviations of epigenetic age estimates were high (lines 409-418). Due to the narrow age range of this study, the correlations between chronological age and epigenetic age estimates were weak. We aimed to overcome these weaknesses and calculated the epigenetic age estimates using recently developed principal component (PC)-based clocks, which are shown to improve the reliability and validity of epigenetic clocks (Higgins-Chen et al., 2022). In our data, the standard deviations of epigenetic age estimates were similar or even higher compared with those obtained with the original clocks, but the correlations between epigenetic age acceleration measures assessed with different clocks were consistently higher when PC-based epigenetic clocks were used. Importantly, the observed associations with the adolescent lifestyle behavior patterns did not substantially change.

      Moreover, we have now more carefully reported and interpreted the results obtained using different epigenetic aging measures and acknowledged their heterogeneity (lines 459-467).

      Figure 4 presents the genetic and environmental influences on biological aging that are shared with adolescent lifestyle. There are also unique genetic and environmental influences on biological aging that are not shown in the figure. Therefore, the unexplained variation in biological aging was not that large: most of the total variation in biological aging was explained by the genetic factors unique to biological aging. We have now clarified the description of the estimation of genetic and environmental influences (lines 283-300) and the presentation of the results (lines 437-449).

      References:

      Higgins-Chen, A. T., Thrush, K. L., Wang, Y., Minteer, C. J., Kuo, P.-L., Wang, M., Niimi, P., Sturm, G., Lin, J., Moore, A. Z., Bandinelli, S., Vinkers, C. H., Vermetten, E., Rutten, B. P. F., Geuze, E., Okhuijsen-Pfeifer, C., van der Horst, M. Z., Schreiter, S., Gutwinski, S., … Levine, M. E. (2022). A computational solution for bolstering reliability of epigenetic clocks: implications for clinical trials and longitudinal tracking. Nature Aging, 2(7), 644–661. https://doi.org/10.1038/s43587-022-00248-2

    1. Author Response

      Reviewer #1 (Public Review):

      This report describes evidence that the main driving force for stimulation of glycolysis in cultured DGC neurons by electrical activity comes from influx of Na+ including Na+ exchanging into the cell for Ca2+. The findings are presented very clearly and the authors' interpretations seem reasonable. This is important and impactful because it identifies the major energy demand in excited neurons that stimulates glycolysis to supply more ATP.

      Strengths are the highly rigorous use of fluorescent probes to directly monitor the concentrations of NADH/NAD+, Ca2+ and Na+. The strategies directly test the roles of Na+ and Ca2+.

      A weakness is an ambiguity about the effects of ouabain to inhibit the Na+/K+ ATPase directly and the absence of biochemical controls to validate the interpretation of the ouabain experiment.

      We appreciate the reviewer's comments about the work. While we cannot rule out non-specific effects of ouabain at the concentrations needed to block the Na+/K+ ATPase in these experiments, we think that we can rely on prior biochemical work characterizing the multiple components of ouabain binding in fresh mouse brain tissue, which is a close match to the acute mouse brain slice tissue used here.

      Reviewer #2 (Public Review):

      This study seeks to determine how neuronal glycolysis is coupled to electrical activity. Previous studies had found that glycolytic enzymes cluster within nerve terminals (in C. elegans) during activity. Furthermore, the glucose transporter GLUT4 is recruited to synaptic surface during activity. The authors previously showed that Ca2+ does not stimulate glycolysis in active neurons. Here, the authors show that the cytosolic Na+, not Ca2+, and the activity of the Na+/K+ pump drive glycolysis. However, it is important to note that in this study, glycolysis was examined in the soma, not nerve terminals, where some of the previous studies were conducted. A few other caveats in the interpretation of the findings are listed below:

      1) The NADH/NAD+ ratio is used throughout as the only measurement reflecting glycolytic flux.

      In this and previous work, we have validated that increased cytosolic NADH production (whose major sources are related to glycolysis), rather than altered NADH reoxidation, produces the changes in NADH/NAD+ ratio.

      2) It has been hypothesized that the close association of glycolytic enzymes with ion transporters (such as the Na+/K+ pump) is meant to provide localized ATP to power these pumps. How does bulk glycolysis (monitored with NADH/NAD+ ratio) relate to localized/compartmentalized glycolysis?

      Even if glycolysis is indeed localized to the plasma membrane (an interesting and difficult-to-address hypothesis), we believe that because the mitochondrial shuttles are the main pathway for NADH re-oxidation, and most mitochondria are not localized to the plasma membrane, changes in glycolytic NADH production are likely to be reflected in changes of the bulk cytosolic NADH/NAD+.

      3) Related to point 2, most of the Peredox measurements in the paper have been made at baseline, in the absence of electrical activity. Therefore, it is not clear how the findings relate to activity-driven glycolysis.

      The ion exchange experiments and even the faster Ca2+ puff experiments can mimic but indeed cannot match the speed of activity-driven changes in ion concentrations. Unfortunately, it is impossible to induce normal electrical activity in neurons in the absence of extracellular Na+. We believe that the complete inability of Ca2+ elevation alone (without Na+-Ca2+ exchange) to stimulate glycolysis, combined with the substantial Ca2+ contribution to activity-driven glycolysis, makes a good argument that Ca2+ entering during activity is likely to stimulate glycolysis via Na+ entry and the Na+/K+ ATPase.

      4) The finding that inhibition of SERCA during stimulation actually elevates cytosolic NADH level argues against Na+ being the only ion that regulates glycolysis.

      The ability of SERCA inhibition to produce a small increase in activity-driven glycolysis is consistent with the simple argument that reduced SERCA-driven uptake of Ca2+ into the ER results in additional Ca2+ removal via Na+/Ca2+ exchange (which can then affect glycolysis via Na+ levels).

      5) The finding that "SBFI ΔF/F transients were longer in duration than the RCaMP LT transient" does not necessarily mean that Na+ elevation lasts longer than Ca2+ in the cell. This could be an artefact of the SBFI on/off rate relative to RCaMP. In fact, prolonged elevation of cytosolic Na+ would make neurons refractory to depolarization in AP trains.

      The rates of Na+ binding and unbinding to SBFI are likely to occur on the microsecond timescale (based on the known properties of crown ether molecules), much faster than the observed transient duration of approximately one minute. Prolonged elevation of cytosolic Na+ alone (to the levels seen here) should not cause neurons to be refractory to firing; refractoriness typically occurs in the setting of prolonged depolarization and consequent inactivation of NaV channels.

      Reviewer #3 (Public Review):

      Meyer et al. have studied the mechanisms of glycolysis activation in the hippocampus during neuronal activity. The study is logically laid out and uses sophisticated fluorescence lifetime imaging technology and smart experimental designs. The support for an intracellular [Na+] rather than [Ca2+] rise driving glycolysis is strong. The evidence for the direct involvement of the Na+/K+ pump is based only on pharmacology using ouabain, but the Na+/K+ pump is admittedly not an easy subject for specific perturbations. I still think that the authors should strengthen the support for the pathway.

      We are happy that the reviewer feels that the evidence for Na+ rather than Ca2+ as the effector of glycolysis is strong. The tools for investigating the role of the Na+/K+ pump (NKA) are indeed limited to pharmacology, because (as the reviewer says) there are few other options. The requirement for Na+ elevation (which stimulates NKA activity) to trigger glycolysis, and the ability of ouabain, a specific NKA inhibitor, to prevent this, seem to strongly implicate NKA in the mechanism of glycolysis activation. Genetic manipulation of NKA may fail to change the level of pump activity because of compensation by altered expression of other subunits (PMID 17234593); it is also unclear how any chronic manipulation would shed light on the role of NKA in triggering glycolysis. Perhaps future studies in knock-in mice in which the α1 isoform of NKA has been made more sensitive to ouabain (PMIDs 15485817; 34129092) might make the identification of NKA as the target of ouabain in this situation even more secure.

      Also, there is a long list of publications on the connection between the Na+/K+ pump and glycolysis. It might be useful to highlight in the title the role of the coupling between NCX and the Na+/K+ pump in the activation of glycolysis.

    1. Author Response

      Reviewer #1 (Public Review):

      Dotov et al. took joint drumming as a model of human collective dynamics. They tested interpersonal synchronization across progressively larger groups composed of 1, 2, 4 and 8 individuals. They conducted several analyses, generally showing that the stability of group coordination increases with group numerosity. They also propose a model that nicely mirrors some of the results.

      The manuscript is very clear and very well written. The introduction covers a lot of relevant literature, including animal models that are very relevant in this field but often ignored by human studies. The methods cover a wide range of distinct analyses, including modelling, giving a comprehensive overview of the data. There are a few small technical differences across the experiments conducted with small vs. large groups, but I think this is to some extent unavoidable (yet, future studies might attempt to improve this). Furthermore, the currently adopted model accounts well for behaviors where all individuals produce a similar output and therefore are "equally important". However, it might be interesting to test to what extent this can be generalized to situations where each individual produces a distinct sound (as in a small orchestra) and therefore might selectively adapt to (more clearly) distinguishable individuals.

      We agree that this is important. We discuss this in a new section (4.1) at the end of the discussion. We suggest that heterogeneity makes it possible for other modes of organization to compete with the attractive tendency towards the global average. We also point out that factors such as individual skill, task difficulty, delays, and selective attention enable such heterogeneity in the ensemble.

      Similarly, it would be interesting to test to what extent the current results (and model) can be generalized to interactions that more strongly rely on predictive behavior (as there is not much to predict here given that all participants have to drum at a stable, non-changing tempo).

      We can only speculate that the present results are less relevant to interactions that rely strongly on predictive behavior, as behaviour in our simple task could be modeled well by our hybrid single-oscillator Kuramoto model. We inserted the idea that the presence of a group rhythm can diminish the demands on individuals to predict each other’s notes (end of paragraph 1, page 27).

      An important implication of this study is that some well-known behaviors typically studied in dyadic interaction might be less prominent when group numerosity increases. I am specifically referring to "speeding up" (also termed "joint rushing") and "tap-by-tap error correction" (Wolf et al., 2019 and Konvalinka et al., 2010, also cited in the manuscript, are two recent examples). I am not sure whether this depends on how the data is analyzed (e.g. averaging the behavior of multiple drummers), yet this might be an important take-home message.

      Thank you for the suggestion. We edited the text to emphasize that the relevant part of the analysis of the drumming data was performed at the individual level, using the same methods as are typically used in dyadic tapping (first sentences of Section 2.7.2). Speeding up was the only variable for which we used group averages. For consistency, and to avoid confusion, in the present version we re-did the statistics (the changed statistical parameters are highlighted) and figures using the individual data points, and we did not observe major changes.

      I am confident that this study will have a significant impact on the field, bringing more researchers close to the study of large groups, and generally bridging the gap between human and animal studies of collective behavior.

      Reviewer #2 (Public Review):

      In this manuscript Dotov et al. study how individuals in a group adjust their rhythms and maintain synchrony while drumming. The authors recognize correctly that most investigation of rhythm interaction examines pairs (dyads) rather than larger groups despite the ubiquity of group situations and interactions in human as well as non-human animals. Their study is both empirical, using human drummers, and modeling, evaluating how well variations of the Kuramoto coupled-oscillator describe timing of grouped drummers. Based on temporal analyses of drumming in groups of different sizes, it is concluded that this coupled oscillator model provides a 'good fit' to the data and that each individual in a group responds to the collective stimulus generated by all neighbors, the 'mean field'.
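      For readers unfamiliar with the mean-field formulation, here is a minimal Python sketch of the standard Kuramoto model, in which each oscillator is coupled only to the group's order parameter R·e^(iψ) = (1/N) Σ_j e^(iθ_j) through dθ_i/dt = ω_i + K·R·sin(ψ - θ_i). It uses continuous coupling and simple Euler integration; it is not the authors' hybrid continuous-pulsed model, and the parameter values (eight oscillators near 2 Hz, coupling strength, frequency spread) are illustrative assumptions.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Return (R, psi) where R * exp(i*psi) is the mean of exp(i*theta_j)."""
    z = sum(cmath.exp(1j * th) for th in phases) / len(phases)
    return abs(z), cmath.phase(z)


def simulate(n=8, coupling=4.0, freq_mean=2 * math.pi * 2.0, freq_sd=0.5,
             dt=0.001, steps=10000, seed=1):
    """Euler-integrate the mean-field Kuramoto model and return the final R.

    d(theta_i)/dt = omega_i + K * R * sin(psi - theta_i)
    Natural frequencies omega_i (rad/s) are drawn from a Gaussian around
    freq_mean (2 Hz here, an arbitrary choice). R near 1 means the group is
    in sync; R near 0 means phases are spread out.
    """
    rng = random.Random(seed)
    omegas = [rng.gauss(freq_mean, freq_sd) for _ in range(n)]
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    for _ in range(steps):
        r, psi = order_parameter(phases)  # every oscillator sees only the mean field
        phases = [th + dt * (w + coupling * r * math.sin(psi - th))
                  for th, w in zip(phases, omegas)]
    return order_parameter(phases)[0]


if __name__ == "__main__":
    for k in (0.0, 4.0):
        print(k, round(simulate(coupling=k), 2))
```

      Because each oscillator responds to the group average rather than to particular neighbours, swapping the mean field for a heterogeneous coupling matrix would be the natural place to model the selective-attention alternative raised below.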

      I have concerns about 1) the overall analysis and testing in the study and 2) specific aspects of the model and how it relates to human cognition. Because the study is largely empirical, it would be most critical for the authors to propose two (or more) alternative hypotheses for achieving and maintaining synchrony in a group. Ideally, these alternatives would have different predictions, which could be tested by appropriate analyses of drummer timing. For example, in non-human animals, where the problem of rhythm interaction in groups has been examined more thoroughly than in humans, many acoustic species organize their timing by attending largely to a few nearby neighbors and ignoring the rest. Such 'selective attention' is known to occur in species where dyads (and triads) keep time with a Kuramoto oscillator, but the overall timing of the group does not arise from individual responses to the mean field. Can this alternative be evaluated in the drumming data? Would this alternative fit the drumming data as well as, or better than, the mean field, 'wisdom of the crowd' model?

      These are very important points. The present paper is restricted to a simple task where participants are instructed to synchronize with each other. However, we now more explicitly acknowledge the limitations of our study and include a new section, “Beyond the group average”, at the end of the Discussion that is dedicated to this issue and discusses other organizing tendencies that are particularly relevant in larger and more diverse ensembles. In the context of the present task, the relative difference between local and global interactions was likely negligible because of the small differences in timing, from 4 to 16 ms, between the closest and most distant pairs.

      It will be interesting in future studies to introduce acoustic heterogeneity by varying the timbre of the instruments, for example. In the present study, the instruments had the same timbre with narrowly varying fundamental frequencies (117-129 Hz in the duets/quartets and 249-284 Hz in the octets), a situation that encourages integration of all the acoustic information. We do point out that the present approach needs to be expanded to be able to account for competitive pressure and selective attention.

      The well-known Vicsek model (discussed briefly in paragraph 2, page 15), related to the Kuramoto under certain assumptions, can account for a variety of dynamic behaviors in flocking animals. The ability for selective attention in the form of a heterogeneous coupling matrix, combined with the existence of competitive pressure in the form of negative coupling terms can result in spontaneous formation of clusters and spatiotemporal patterns of movement. This is consistent with prior research in chorusing animals (insects and anurans). Large musical ensembles also involve groupings of instruments such as separate sections that change their relative loudness across time. Typically these are not spontaneous but composed and conducted, yet they may satisfy the same constraints.

      We also pointed out that we see these as complementary organizing principles. Even in the Vicsek model, there is a notion of a ‘local order parameter’ whereby individuals are coupled to a group average within a narrow interaction radius. The relative importance of other organizing tendencies depends on the layout of the acoustic environment and the competitive and collaborative aspects of the task. Hence, parameters such as delay and individual heterogeneity could act as symmetry-breaking terms that enable different stabilities from the basic global group synchrony.

      A second concern arises from relying on a hybrid, continuous-pulsed version of the Kuramoto coupled oscillator. If the human drummers in the test could only hear but not see their neighbors, this hybrid model would seem appropriate: each drummer only receives sensory input at the exact moment when a neighbor's drumstick strikes the drum. But the drummers see as well as hear their neighbors, and they may be receiving a considerable amount of information on their neighbors' rhythms throughout the drum cycle. Can this potential problem be addressed? In general, more attention should be paid to the cognitive aspects of the experiment: what exactly do the individual drummers perceive, and how might they perceive the 'mean field'?

      This is all very relevant. We instructed participants to focus on the X’s in the centers of their drums and not to look at their peers (this is now mentioned at the end of Section 2.4, page 9). Additionally, the pattern of results for tempo change, cross-correlations, and variability in the dyadic condition was consistent with previous studies that involved purely auditory tapping tasks (emphasized at the beginning of paragraph 2, page 26). The best way to address this limitation would be to repeat the study while blocking visual contact among participants, and to include a condition emphasizing visual contact.

      It is beyond the scope of the present paper to make model-based predictions of the effects of coupling and information availability, but this should be done in future work. For the present paper, we now include a simulation involving continuous coupling (end of Section 2.9.2, page 16, and Supplementary Figure 8A), which fails to reproduce the results for variability; these results are well captured by the hybrid continuous-pulsed model we developed (see the Supplementary Materials).

      Reviewer #3 (Public Review):

      The contribution provides approaches to understanding group behaviour using drumming as a case of collective dynamics. The experimental design is interestingly complemented with the novel application of several methods established in different disciplines. The key strengths of the contribution seem to be concentrated in 1) the combination of theoretical and methodological elements brought from the application of methods from neurosciences and psychology and 2) the methodological diversity and creative debate brought to the study of musical performance, including here the object of study, which looks at group drumming as a cultural trait in many societies.

      Even though the experimental design and object of study do not represent an original approach, the proposed procedures and the analytical approaches shed light on elements poorly addressed in music studies. The performers' relationships, feedbacks, differences between solo and ensemble performance and interpersonal organization convey novel ideas to the field and most probably new insights to the methodological part.

      It must be mentioned that the authors accepted the challenge of leaving the nauseating no-frills dyadic tests and tapping experiments in favour of more culturally comprehensive (and complex) setups. This represents a very important strength of the paper and greatly improves the communication with performers and music studies, which have been affected by the poor impact of predictable non-musical experimental tasks (which can easily generate statistically significant measurements). More specifically, the originality of the experiment-analysis approach provided a novel framework to observe how the axis from individual to collective unfolds in interaction patterns. In particular, the emergence of mutual prediction in large groups is quite interesting, although similar results might be found elsewhere.

      Thank you for these comments.

      On another side, important issues regarding the literature review, experimental design and assumptions should be addressed.

      I miss an important part of the literature that reports similar experiments under the thematic framework of musical expressivity/expression, groove, microtiming and timing studies. From the participatory discrepancies proposed by Keil (1987) to the work of Benadon et al. (2018), Guy Madison, colleagues and others, this literature presents formidable studies that could help understand how timing and interactions are structured and conceptualized in music studies and by musicians and experts. (I declare that I have no recent collaborations with the authors I mention throughout the text and that I don't feel comfortable suggesting my own contributions to the field.) This is important because there are important ontological concerns in applying methods from the sciences to cultural performances.

      Thank you for the suggestions. We included a brief discussion in the newly added “Beyond the group average” section at the end of the Discussion, specifically the first paragraph, pages 27-8. We think that expressive timing naturally fits in continuation with the other reviewers’ concerns about how much the idea of the group average generalizes to real musical situations. By design and instruction, we stripped individual expression from the present task. Specific cultural contexts and performance styles may want to escape or at least expressively tackle this constraint of our task, and we believe that now that we have established the mean field as one factor affecting group behaviour, further studies can take on the challenge of developing models that make predictions in more complex situations closer to real musical interactions – and testing those models empirically.

      One ontological issue is that cultural phenomena differ from, for example, animal behaviour. In particular, the authors consider timing and synchrony in a way that does not comply with cultural concepts: p.4 "Here we consider a musical task in which timing consistency and synchrony is crucial". A large part of the literature mentioned above and evidence found in ethnographic literature indicate that the ability to modulate timing and synchrony-asynchrony elements is part of explicit cultural processes of meaning formation (see, for example, Lucas, Glaura and Clayton, Martin and Leante, Laura (2011) 'Inter-group entrainment in Afro-Brazilian Congado ritual.', Empirical musicology review., 6 (2). pp. 75-102.). Without these idiosyncrasies, what you listen to can't be considered a musical task in context and lacks basic expressivity elements that represent musical meaning on different levels (see, for example, Swanwick's work on layers/levels of musical discourse formation).

      Indeed, this is an important issue. We often use cultural phenomena merely as motivation but do not dive into the relevant details. Here, in addition to the previous discussion, we now reiterate that the tendency towards the group average is one organizing tendency, but there are additional ones, enabled by individual heterogeneity and context. For example, marching bands and chanting crowds probably impose different constraints than individual artistic expression by skillful musicians.

      Such simplistic ideas about the ontology of musical activities (e.g., that musical practice is oriented toward precision or synchrony) generate superficial constructs such as precision priority, dance synchrony, imaginary internal oscillators, and strict predictive motor planning that are not present in cultural reports, except in some cultures of classical European music based on notation and shaped by industrial models. The lack of proper cultural framing of the drumming task might also have led the authors to instruct the participants to minimize "temporal variability" (musical timing) and maintain the rate of the stimulus (musical tempo), even though such constrained tasks are mostly part of musical training in some societies (examples of social drumming in non-Western societies rarely represent isochronous tempo or timing in any linguistic or conceptual way). The authors should examine how this instruction affects the validity of the results describing variability, since the variability was shaped by imposed conditions and the instruction might have limited the observed behaviour. The graphs reporting the results should also allow readers to assess the effect of timing within such short time windows of action.

      We fully agree. We have made changes to be more specific about the cultural framing, delineating contexts in which the present ideas are more relevant and those in which they are less relevant, or at least incomplete (the bottom of page 3, and pages 27-8).

    1. Author Response

      Reviewer #1 (Public Review):

      Mitotic spindles are macromolecular machines that accurately segregate duplicated chromosomes between two daughter cells during cell division. To perform this task, spindles exert forces that are orchestrated in space and time. Conversely, malfunctioning spindles can generate chromosome segregation errors, which are present in cancers, miscarriages, and Down syndrome. Therefore, understanding spindle mechanics is a major biological challenge. In this elegant study, the authors explore the mechanical properties of the mitotic spindle. They combine a variety of experimental biophysical approaches, including microneedle manipulation and quantitative imaging, with theoretical modeling. By systematically exploring the shape of kinetochore fibers that are not manipulated, they infer the forces and moments that exist in native spindles. Analyzing previously published microneedle manipulation data, in which kinetochore fibers were mechanically perturbed, the authors observe a dramatic change in the shape of the kinetochore fibers. By comparing this observation with theoretical predictions, they discover a lateral anchorage near the chromosome. Taken together, this paper nicely demonstrates the existence of lateral anchorage near chromosomes, offering exciting ideas about the balance of forces across the entire mitotic spindle.

      We appreciate the reviewer’s enthusiasm about the work and their thoughtful questions and suggestions to improve the manuscript.

      Major points:

      (1) To describe the shape of unmanipulated kinetochore fibers, the authors use a simple physical model in which each fiber is a single elastic rod. They find that the observed shape is a consequence of compressive forces, or of a combination of bending moments and perpendicular forces. However, it is well known that kinetochores are under tension. For this reason, the plus end of kinetochore fibers should be under tension rather than compression. To describe the forces that shape unmanipulated kinetochore fibers, the authors should revise the model by imposing a tensile force at the plus end of the kinetochore fiber.

      We thank the reviewer for their comment on this important point.

      (2) The authors compare the shapes of inner and outer kinetochore fibers. Using the model, they find that the forces and moments are similar for inner and outer kinetochore fibers, and that the difference in shape arises because these fibers have different lengths. In classical beam theory, we distinguish between buckling (caused by a compressive force) and bending (caused by a bending moment). In the case of buckling, the same critical force can produce different curvatures, whereas in the case of bending the curvature is proportional to the bending moment. Based on the data presented by the authors, their model seems to operate in the buckling regime. It would be important to elaborate on this more systematically. The reader should also be warned that, in the case of bending, the inner and outer kinetochore fibers would be characterized by different bending moments.

      We thank the reviewer for raising this nuanced point on the shape generation mechanisms in inner and outer k-fibers. We believe that the mechanisms that the reviewer suggested are valid ways to generate varying k-fiber deflections in the scenario where the k-fiber end-to-end length is held fixed. However, we argue that the natural variability in the lengths of inner vs. outer k-fibers is alone sufficient to give rise to diverse k-fiber shapes without requiring the end-forces to change.

      We added a new Appendix section 1.4 (pages S4-S5 in the revised appendix) in our revised submission where we provide the details of our argument. We demonstrate analytically that when only a moment at the pole is present and held at a fixed value, then the normalized maximum deflection scales linearly with the k-fiber’s end-to-end length (Appendix 1 – figure 3a,b). And in the case where both a moment at the pole and an axial force are present and held at fixed values, the dependence on k-fiber length is stronger (faster than linear), thereby allowing for a wide range of k-fiber deflections created with identical end-forces (Appendix 1 – figure 3c,d).
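      To make the linear length scaling concrete, here is a worked example under an assumed geometry: the k-fiber is treated as a uniform Euler-Bernoulli beam of length L and flexural rigidity EI, pinned at both ends, with a fixed moment M_0 applied at the pole end (x = 0), and "normalized" is taken to mean divided by L. These boundary conditions are an illustrative assumption and are not necessarily those of the appendix model. The internal moment falls linearly from M_0 at the pole to zero at the kinetochore end:

```latex
\begin{align}
  EI\,\frac{d^{2}y}{dx^{2}} &= M_0\left(1 - \frac{x}{L}\right), \qquad y(0) = y(L) = 0 \\
  \lvert y(x)\rvert &= \frac{M_0\,x\,(L - x)(2L - x)}{6\,EI\,L} \\
  y_{\max} &= \lvert y(x^{*})\rvert = \frac{M_0 L^{2}}{9\sqrt{3}\,EI},
      \qquad x^{*} = L\left(1 - \frac{1}{\sqrt{3}}\right) \\
  \frac{y_{\max}}{L} &= \frac{M_0}{9\sqrt{3}\,EI}\,L
\end{align}
```

      For fixed M_0 and EI, the normalized deflection grows in proportion to L, so in this sketch a longer outer k-fiber deflects more than a shorter inner one under an identical end moment. Other end conditions change the numerical prefactor but, by the same dimensional argument (deflection proportional to M_0 L^2 / EI), not the linear scaling. The faster-than-linear dependence once an axial force is added is not derived here.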

      Reviewer #2 (Public Review):

      Suresh and co-workers apply classical beam bending theory to analyze shapes of the microtubule bundles that push and pull on mitotic chromosomes and drive chromosome separation in dividing cells. The bundles attach at one end to chromosomes via specialized protein assemblies called kinetochores, and at the other end they are associated with spindle poles. The shapes of these k-fiber bundles are analyzed in unperturbed control cells and in cells where the bundles have been forcibly deformed using microneedles. From their analysis, the authors infer the extent and nature of mechanical anchorage at each end of the bundles, finding that anchorage is more extensive and more restrictive at the kinetochore-attached ends compared to the pole-proximal ends. Anchorage at the pole-proximal ends is apparently limited to the bundle tips, allowing some swiveling of the bundles around the poles. In contrast, the kinetochore-attached ends appear to have "lateral anchorage", i.e. force-bearing connections to the sides of the bundles, that extend several micrometers away from the kinetochores. This lateral anchorage resists swiveling of the bundles around their kinetochore-attached ends.

      A major strength of this study is its high degree of novelty. The microneedle data on which the analyses are based have been published previously, but are entirely unique - based on classic, groundbreaking experiments performed nearly half a century ago on cells from grasshoppers and mantids, and now being done only in the Dumont lab, in mammalian cells for the first time, and with the benefit of modern fluorescence and molecular perturbation techniques. Such a unique and interesting dataset certainly deserves careful analytical scrutiny, which is the focus of this new paper.

      The application here of classical beam theory to analyze k-fiber shapes is also clever, apparently well done, and well described. The unique approach provides a direct way to assess the extent to which k-fiber bundles are mechanically linked to surrounding material, including to non-k-fiber microtubules and potentially to neighboring k-fibers. The main conclusion that lateral anchorage of the k-fibers in the local vicinity (within a few micrometers) of kinetochores is needed to explain the shapes that the k-fibers adopt during manipulations seems well justified by the data and analyses - particularly by the negative curvatures measured near the kinetochore-attached ends, and the tendency for the orientations of the kinetochore-proximal portions to be maintained even 1 to 3 micrometers away from the kinetochore-attached ends. The assumptions of the analysis also seem mostly reasonable and are clearly explained. Under these assumptions, the analysis shows convincingly that forces and moments applied only at kinetochore-attached ends would be insufficient to explain the observed shapes.

      We appreciate the reviewer’s enthusiasm about the work and their thoughtful questions and suggestions to improve the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper primarily assessed host/phage interactions for bacteria in the order Corynebacteriales to identify novel host factors necessary for phage infection, focusing on genes responsible for bacterial envelope assembly. Bacteria in this order, such as Mycobacterium tuberculosis and Corynebacterium diphtheriae, have unique, complex envelopes composed of peptidoglycan, arabinogalactan, and mycolic acids. This barrier potently protects against the therapeutic effects of antibiotics. Phages can be used to discover novel aspects of envelope assembly because they engage with cell surface receptors. To uncover new factors, the researchers challenged a high-density transposon library of Corynebacterium glutamicum (called Cglu in the paper) with the phages Cog and CL31. Transposon sequencing identified loci whose disruption led to phage resistance. This study implicated the Cglu genes ppgS, cgp_0658, cgp_0391, and cgp_0393 in phage infection. The authors also identified a new gene, cgp_0396, necessary for arabinogalactan modification, and recognized a conserved host factor, AhfA (Cgp_0475), that plays a crucial role in Cglu mycolic acid synthesis. Ultimately, this work highlighted the importance of mycomembrane porins, arabinogalactan, and mycolic acid synthesis pathways in the assembly of the Corynebacteriales envelope.

      Strengths of the research:

      • Language choice: A major strength of the paper is that this could easily be given to an undergraduate student with introductory knowledge of biology and they would still be able to get the gist of this paper. The language is written in a clear, concise fashion with explanations of terms not everyone would immediately know unless they worked in the field specifically.

      • The figures are generally explained in a direct manner, clearly stating the major conclusions the reader should draw after carefully analyzing the presented data.

      We thank the reviewer for the enthusiasm for our work and our description of it.

      How the research could be strengthened:

      • It could be worthwhile to describe some of your results quantitatively, for example the log-scale differences you observe in phage infection. Bar graphs should also be described in quantitative terms: when "something is lower compared to the WT," how much lower is it?

      To keep the text streamlined, we refrained from adding descriptions of the results mathematically in the text. The reader can refer to the figures to get the magnitudes of any changes observed.

      • No p values were reported for the statistical significance of any of the data presented; this should be addressed in the final manuscript to support the importance of this work.

      We added the p-values as requested.

      • The conclusions of Figure 8 were not entirely supported by the data, especially Figure 8A, which could be improved with images that better support the authors' claims.

      We do not understand why the reviewer believes that Figure 8A does not support our conclusions. The mutant cells do not label with the 6-TMR-Tre dye whereas the WT control does. The dye labels mycolic acid such that our conclusion that AhfA is involved in mycolic acid synthesis is valid. In any case, we have included an additional supplementary source data file of the uncropped image of the 6-TMR-Tre treated cells to show a larger number of mutant cells that fail to stain, further supporting our conclusion.

      Reviewer #2 (Public Review):

      In this manuscript, McKitterick and Bernhardt use genetic approaches to investigate genes in Corynebacterium glutamicum that are required for efficient phage infection. They make use of a high-density transposon library that was generated in the Bernhardt lab recently. They challenged the library with two phages, CL31 and Cog. Importantly, they had previously adapted the phages, elegantly, to the laboratory strain MB001. The MB001 strain is ideal for genetic experiments since all prophage elements were removed from this strain. The evolved phages are likely a very useful tool for further investigations aiming to understand host/virus interactions in this model. The phage-infected libraries were plated and the collected colonies were sequenced. Genes involved in efficient phage infection had multiple transposon insertions. Using this method, the authors identified specific genes required for infection with Cog and CL31. The Cog phage apparently requires the porin proteins in the mycolic acid membrane for efficient infection, and the authors speculate that the porins may act as auxiliary receptors for phage adsorption. Furthermore, genes involved in putative arabinogalactan modification were found to be important. Mutations in these genes did not abolish phage adsorption, suggesting that they act at the stage of viral genome injection. For phage CL31, the authors show that genes involved in mycolic acid synthesis in particular are essential. The genes identified include one coding for a protein involved in protein mycoloylation. A candidate for such a lipidation is the porin protein complex PorAH. The trehalose-6-phosphate synthase OtsA was also identified as important for phage infection. Although otsA is strictly required for establishment of the mycomembrane, otsA deletions are viable in C. glutamicum. As part of their analysis, the authors also identified an unknown factor in mycolic acid synthesis in C. glutamicum. Analysis of a spontaneous CL31-resistant mutant revealed a mutation in cg_0475 (renamed ahfA). Deletion of ahfA drastically reduced mycolic acid production, as demonstrated by thin-layer chromatography and fluorescent staining. Interestingly, deletion of ahfA also results in a cell morphology defect, indicating the importance of a correct mycolic acid layer for cell shape.

      In summary, the authors provide an excellent paper that is clearly written, with well-conducted experiments.

      We thank the reviewer for their kind words and enthusiasm for the work.

      Reviewer #3 (Public Review):

      In their manuscript, McKitterick and Bernhardt perform a screen to identify host factors, such as receptors, that are important for bacterial viruses (phages) to infect Corynebacterium glutamicum, an organism that shares the unique membrane of mycobacteria (the mycomembrane) with M. tuberculosis. To do so, they challenge a previously described Tn-seq library with a high MOI of two phages, CL31 and Cog. The surviving strains are those in which genes important for phage infection (such as receptors) are disrupted. The screen is successful: the authors identify and validate several factors important for infection by each phage, providing the first such screen in Corynebacterium. Moreover, the authors perform a suppressor screen to identify additional factors and experimentally follow up on several genes of interest. Finally, the authors use the newly determined host specificity of the phages to implicate new genes in mycolic acid synthesis. As a whole, this is strong work that paves the way to a deeper understanding of Corynebacterial and (by extension) Mycobacterial phages and should be of broad interest.

      Below, we suggest additional analyses, context, and elaboration that will help the manuscript fully realize its impact.

      Major points:

      1. Although the authors' experimental design is fundamentally sound, I am worried about the possibility of "jackpotting" shaping their results, particularly in the uninfected control experiment. If the authors' Tn-seq library comprises ~200,000 strains and they do not plate at least 10-100 times that many colonies, then any given strain (regardless of its phenotype) may or may not be represented in the output of the experiment, causing false phenotypes to be ascribed to genes by chance. This is particularly a problem for the uninfected control, where the authors dilute the culture 1000-fold to mimic the number of colonies that survive infection. They may be better served by plating the whole culture, to ensure adequate representation of the library. Part of the reason for this concern is that an overwhelming majority of statistically significant hits (something like 80-90%) appear to confer susceptibility rather than resistance (source data Fig 2), something the authors' experimental design should not be able to measure. The lack of accurate representation of strain distributions in the starting culture also calls into question the quantitative differences presented in the results.

      We thank the reviewer for their thorough analysis of our experimental design. The Tn-Seq experiments were repeated with the uninfected controls plated at a density that maintains the representation of the original library. The overall results are largely unchanged because we maintain our focus on hits that become greatly enriched following phage infection not those that become depleted. The vast majority of these hits were validated for their involvement by constructing mutant strains, indicating the robustness of the current and previous analyses. With respect to the depletion of insertion mutants, we mentioned in the original submission that they are unlikely to be biologically meaningful.
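      The coverage concern in this exchange can be quantified with a standard back-of-the-envelope calculation. The sketch below assumes an idealized library in which all ~200,000 mutants (the reviewer's approximate figure) are equally abundant and each plated colony is an independent random draw; real libraries are unevenly represented, so this likely underestimates dropout. The function names are hypothetical and do not come from the authors' pipeline.

```python
import math


def expected_missing_fraction(library_size: int, colonies: int) -> float:
    """Expected fraction of mutants that yield zero colonies.

    Assumes (hypothetically) that every mutant is equally abundant and that
    each plated colony is an independent, uniform draw from the library.
    """
    # Probability that one given mutant is missed by every one of the draws.
    return (1.0 - 1.0 / library_size) ** colonies


def poisson_missing_fraction(library_size: int, colonies: int) -> float:
    """Poisson approximation exp(-coverage), with coverage = colonies / library_size."""
    return math.exp(-colonies / library_size)


if __name__ == "__main__":
    N = 200_000  # approximate library size quoted by the reviewer (assumption)
    for coverage in (0.1, 1, 10, 100):
        n = int(coverage * N)
        exact = expected_missing_fraction(N, n)
        approx = poisson_missing_fraction(N, n)
        print(f"{coverage:>5}x coverage ({n} colonies): "
              f"{exact:.3g} missing (Poisson approx. {approx:.3g})")
```

      Under these assumptions, plating about as many colonies as there are mutants (1x coverage) leaves roughly e^-1, or about 37%, of the library unsampled by chance alone, whereas at 10x the expected dropout falls to about e^-10. This is the intuition behind the reviewer's 10-100x recommendation.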

      a. L138. Where the authors describe their initial experimental design it would be helpful to add more details. What is the size of the Tn library? What is the coverage in their experiment? Approximately how many colonies are recovered on the plates after phage infection and in the uninfected control?

      This information has been added (Fig. 2 table supplement 1).

      b. It is important to know how the number of colonies on the plates compares to the number of reads in the experiment. In the analysis of most high-throughput (HT) screens, one implicitly assumes that each read corresponds to one cell, so each read can be treated as statistically independent. This assumption is critical to the statistical methods used to analyze these data. By scraping a plate of colonies (which may be required for efficient phage infection), the authors potentially violate this assumption, since the colonies, rather than the cells, are then the actual statistically independent entities in the experiment. Does this assumption hold (or approximately hold) for the screen? If not, a different statistical method should be used to determine p-values.

      We respectfully disagree with the reviewer on this point. In our view, a slurry of colonies from a plate is no different than a culture. Both contain a mixture of cells containing an array of different transposon mutants each represented multiple times in the population due to replication of the original mutant. We do not think there is any meaningful difference to the analysis whether this replication occurs in liquid or on a plate. In both cases, a read corresponds to a single cell/molecule of purified genomic DNA from the population.

      2. The authors' Tn-seq methodology differs from previously published HT phage screens (e.g., Mutalik et al., 2020 and Rousset et al., 2018). Based on my knowledge of classical phage biology, I agree that plating the infected cells has advantages. However, the rationale will not be clear to most people performing such experiments. Please explain the rationale for the experimental protocol.

      Although the authors of the Mutalik et al. paper did perform competition experiments in liquid over several infection cycles, they also used a solid plate-based assay in which they adsorbed their phages to the library cells for 15 minutes before plating. These plates were incubated overnight, and resistant colonies were scraped, pelleted, and DNA-prepped in a similar manner to our approach.

      We prefer plating over liquid growth because colony formation is an easy way to ensure that the mutant population has undergone numerous rounds of doubling under a given condition before the analysis is performed.

      a. Why did the authors plate the cultures after initial phage absorption instead of remaining in liquid?

      We were concerned that some potential receptor-related mutants would be less fit and would therefore be lost in a competition experiment. As such, plating after phage adsorption would decrease the competition between phage survivors. Furthermore, we thought that plating would additionally ensure that the bacteria that are sequenced are true survivors and not just reflect remnant DNA in the culture.

      b. How reproducible are the authors' Tn-seq results? The SRA accession shows multiple replicates, but this is not described in the manuscript or reflected in the supplementary data. Given the potential for bottleneck and jackpotting effects in this assay, some measure of reproducibility is important for interpreting the results (see point 1).

      We performed completely new Tn-seq experiments for each phage in duplicate. The hit lists remained largely unchanged from our initial analysis and those that were investigated further were enriched for insertions in both new data sets. Thus, the results are highly reproducible.

      c. L587: "Significant hits with fewer than 10 insertions on each strand were manually removed." Why did the authors choose this criterion? Almost all of the genes they removed have very asymmetric distributions (e.g., in the Cog experiment, cgp3051 has 47853 forward reads and 6 reverse reads). An asymmetric distribution of insertions suggests that overexpression of downstream genes has an important (positive or negative) effect. This is worth pursuing, and many automated analysis pipelines can disambiguate these effects, including those developed in the Walker Lab (e.g., doi: 10.1038/s41589-018-0041-4). These genes should not be discarded when they are arguably some of the most informative hits!

      We have updated the criteria we used for selecting the most impactful insertion enrichments. Our concern in this report was to investigate mutants that affect phage infection when inactivated. We will pursue genes that affect phage infection when overexpressed (as indicated by asymmetric insertion orientation distributions) in a follow-on study. We think such a study would best be carried out with a different transposon containing a strong outward facing promoter.
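      The filtering criterion debated in this exchange can be illustrated with a short sketch. It assumes the reading implied by the reviewer's example, namely that a hit is kept only when both transposon orientations reach 10 counts. The `Hit` type, the function names, and the balanced example gene are hypothetical, and the cgp3051 numbers are the read counts quoted by the reviewer rather than insertion counts.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Per-gene counts split by transposon orientation (hypothetical type)."""
    gene: str
    fwd: int  # counts with the transposon in the forward orientation
    rev: int  # counts with the transposon in the reverse orientation


def passes_strand_filter(hit: Hit, min_per_strand: int = 10) -> bool:
    """Keep a hit only if BOTH orientations reach the threshold (assumed reading)."""
    return hit.fwd >= min_per_strand and hit.rev >= min_per_strand


def forward_fraction(hit: Hit) -> float:
    """Fraction of counts in the forward orientation; 0.5 means no strand bias."""
    total = hit.fwd + hit.rev
    return hit.fwd / total if total else 0.5


if __name__ == "__main__":
    hits = [Hit("cgp3051", 47853, 6), Hit("balanced_example", 40, 35)]
    for h in hits:
        print(h.gene, passes_strand_filter(h), round(forward_fraction(h), 4))
```

      Under this reading, the quoted cgp3051 counts fail the filter even though the gene is heavily enriched in one orientation. That is the reviewer's point: a forward fraction close to 1 may itself carry information about effects on downstream genes, rather than being noise to discard.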

      3. There is a fairly extensive phylogeny of M. smegmatis phages (phagesdb.org). Are the phages the authors work on related to any of these phages? If so, which cluster do they map to, and what is the host range of other phages in that cluster? If not, it may be worthwhile to mention that these phages are quite distinct from other studied phages.

      We agree that the phylogenetic history of corynephages is quite interesting. Very few phages that infect Cglu have been isolated and sequenced, let alone studied. Neither Cog nor CL31 share significant nucleotide identity with other sequenced phages, thus they do not have assigned clusters at the moment.

      4. Given that cgp_0475 was a strong hit in the Tn-seq, why was it not identified in the previous chemical genomics experiments from the lab (https://doi.org/10.7554/eLife.54761)?

      We appreciate the reviewer’s interest in previous work from the lab. In the prior phenotypic analysis, cgp_0475 was identified as having severe fitness defects across many conditions. However, it was not possible to correlate its phenotype with other genes involved in mycolic acid synthesis like pks and fadD2 because they were found to be so sick in the phenotypic outgrowth that they were classified as essential.

      5. Is there any relationship between the growth rate of the mutants and their phage susceptibility? This can be analyzed using the authors' previous studies of this library.

      While some of the phage-resistant mutants are associated with poor fitness (namely those involved in mycolic acid synthesis), not all are associated with decreased growth. For example, deletions of porAH or of the genes involved in GalN decoration were associated with minimal fitness defects. However, loss of these genes greatly inhibited the ability of Cog to infect.

    1. Author Response

      Reviewer #1 (Public Review):

      Main concerns:

      1) Validation of the MCS reporters is not shown. This is particularly important for pCLIP and GoPo, which have not been reported before. Fluorescence complementation between two proteins that normally localize to different organelles is far from demonstrating the existence of an MCS between those organelles. It would be important to demonstrate the existence of the mentioned MCSs, and the suitability of the fluorescent reporters, using marker proteins and ideally electron microscopy/CLEM.

      We thank the reviewer for pointing this out and have now added supplementary characterization of the pCLIP and GoPo contact sites. The pCLIP has been previously described by us (Shai et al. 2018 Nat Commun 9, 1761. doi:10.1038/s41467-018-03957-8) and so we have only added one new figure (Figure 2 S1A) which shows the co-localization of the contact site reporter with a LD marker (MDH) and a cell periphery marker (TRITC-ConA). For the GoPo, since this is the first demonstration of a reporter for this contact site, we have rigorously characterized it by looking at the frequency of co-localization between peroxisomes and the Golgi in the absence of the reporter (Figure 1 S1B), the co-localization of the contact site reporter with a peroxisome marker (CFP-SKL) and a Golgi marker (Sec7-mCherry) (Figure 1 S1C), and by identifying a condition where this contact site is increased (Figure 1 S1D).

      Since all of these results support their function as bona fide reporters, and since electron microscopy experiments on these reporters were not possible for us at this time and have not been the standard in the field for other reporters, we hope that this is satisfactory.

      2) As pointed out above, the identification of a phenotype in ergosterol distribution for Ypr097W/Lec1 is very interesting. However, it is unclear how this observation relates to the localization of Lec1 to LDs, which is observed only upon overexpression.

      We would like to clarify that at endogenous levels Lec1 also localizes to LDs. However, this localization is less pronounced. To clarify this in the text and show this experimentally we have now added an example of the endogenous GFP-tagged protein with the LD marker Faa4-mCherry (Figure 3 S1B), and added a section in the text.

      Instead, further characterization of the Ypr097w ergosterol distribution phenotype (e.g., via mutagenesis, modulation of the ergosterol biosynthetic pathway, or testing its ability to bind ergosterol) would be a plus.

      To further characterize the Lec1 phenotype, we examined changes in ADHpr-GFP-Lec1 localization in cells treated with 40 µg/ml fluconazole for 3 h (Figure 5 S2B-C). Fluconazole is a known inhibitor of Erg11, and treatment with this drug strongly reduces the overall levels of cellular ergosterol, which can be clearly observed by the loss of mCherry-D4H binding to the plasma membrane (cytosolic signal) (Figure 5 S2B, lower right panels). After 3 h of treatment with fluconazole, there is a small increase in the number of cells with bud/bud neck localization of GFP-Lec1. The GFP-Lec1 signal in these cells generally appears brighter than in untreated cells, suggesting that loss of ergosterol potentiates Lec1 accumulation at the bud/bud neck. This result suggests that Lec1 cellular localization is affected by ergosterol levels. However, since treatment with high concentrations of fluconazole leads to growth arrest (Zhang et al. 2010. PLOS Pathogens 6:e1000939. doi:10.1371/journal.ppat.1000939), it is also possible that this increased signal results from Lec1 accumulation at the bud due to stalled budding. We now discuss this in the text.

      We have also extensively mutagenized Lec1 as requested in an attempt to find a mutant that is still localized to LDs and stable yet not causing sterol redistribution. However, despite great efforts this has proven to be challenging (See below in detailed response to this request from reviewer #2).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have employed a variety of techniques (single-molecule fluorescence kinetic and steady-state measurements, cryo-EM structure determination, and in vivo measurements of protein synthesis and cell proliferation) to investigate the mechanism of action of two small molecules: Didemnin B and Ternatin-4. Both molecules have previously been shown to target eEF1A and have potential as cancer therapeutics. In addition, structures of Didemnin B bound to eEF1A and to an elongation complex have previously been solved.

      The authors show here that both compounds disrupt the dynamic accommodation of tRNA driven by eEF1A and its activation by the GTPase activation center of the ribosomal large subunit, relying on previous assignments of the FRET intensities observed in pre-steady-state single-molecule fluorescence experiments in which peptidyl-tRNA and incoming aminoacylated tRNA are labeled with donor and acceptor dyes, respectively. They further show that this inhibition is dose dependent for both compounds and sensitive to the A399V eEF1A mutant, which creates a steric clash with Didemnin B in its usual binding site. Subsequent analysis of steady-state single-molecule FRET experiments shows that Didemnin B more strongly inhibits transitions between the intermediate (0.45) and high (0.8) FRET states (though the authors choose to focus only on the effect on transitions from 0.45 to 0.8), previously assigned to the GTPase-activated and fully accommodated conformations of the ternary complex, respectively. Further single-molecule experiments provide initial evidence that Didemnin B remains more stably bound to elongation complexes than does Ternatin-4.

      The authors then turn to cryo-EM structures of each compound bound to elongation complexes purified either from lysate or assembled from purified components. The structure of the Ternatin-4 complex shows additional density in the same binding cleft observed for Didemnin B in a prior structure reported elsewhere, with which the Didemnin B structures reported here also agree. This binding location provides structural evidence for both compounds' effects on ternary complex dynamics, as well as for their previously described effects on tRNA accommodation and elongation. Further comparison of the Didemnin B and Ternatin-4 structures reveals decreased electron density in the Ternatin-4 structure for elements of eEF1A (switch loops 1 and 2 and helix alpha2), as compared to the Didemnin B structures. The authors interpret this as evidence for greater mobility of these elements, which might explain the more modest restriction of A-site tRNA dynamics they observe in the presence of Ternatin-4 (as opposed to Didemnin B). This decreased density (which might be more convincingly demonstrated using difference maps of the two structures) is certainly consistent with that interpretation. That said, it is certainly not a smoking gun.

      We have worked to soften the language pertaining to this point and have updated Fig. 4 to more accurately highlight the observed differences between the didemnin and ternatin-4 structures.

      Finally, the authors turn to in vivo measurements of protein synthesis and effects on cellular proliferation or survival in the presence of both compounds. Consistent with their single-molecule experiments, they observe more severe and durable inhibition of protein synthesis in the presence of Didemnin B, whereas Ternatin-4 exhibits more modest effects that are more rapidly restored upon removal of the drug in solution. Interestingly, Ternatin-4 appears to elicit similar, and perhaps more rapid, effects on cellular survival, increasing apoptosis more rapidly than Didemnin B, though these effects (like those on protein synthesis rates) are once again more sensitive to removal of the drug. The authors describe these results as evidence that Didemnin-B "irreversibly inhibits" protein synthesis in cells. I find this assertion strange, given that the authors have previously measured a dissociation rate for this molecule from elongation complexes and they have not performed measurements to ensure that activity is not simply restored at timescales longer than their initial measurements. That said, I concede that this might be a semantic distinction if the vast majority of cells perish prior to dissociation of the drug. In either case, I would suggest the authors apply a somewhat more nuanced interpretation of these results lest they be misunderstood.

      We thank Reviewer 1 for bringing this point to our attention. We have changed the title of this section to “Protein synthesis inhibition by ternatin-4, but not didemnin, can be reversed in cells,” and have softened the interpretation.

      Overall, this is a rigorous and well reasoned study that employs multiple complementary techniques to investigate the mechanism of action of compounds of potential therapeutic interest. In places, the higher order interpretation of the experimental data leaks into the results section (as opposed to being fully explored in the discussion) and is at times somewhat aggressive. Nonetheless, the results presented here illuminate important questions at the intersection of translational mechanism, cell proliferation, and cancer.

      We are grateful to Reviewer 1 for their assessment of this work as rigorous and well-reasoned. We have made significant updates to the text and figures, and we hope they find that we have addressed all concerns.

      Reviewer #2 (Public Review):

      The manuscript of Juette et al presents a combined structural and dynamic view of how a class of inhibitors (didemnins) block human ribosomal elongation. Prior work had shown that these cyclic peptide drugs bind to eEF1A in the ternary complex on the ribosome, between Domains I and III of the factor, blocking the dissociation of the elongation factor from elongator tRNA and ribosome during decoding. Here the authors use beautiful single-molecule and structural approaches to probe the mechanisms of two related drugs: Didemnin B and Ternatin-4. Their results expand on prior observations of drug mechanism, and provide clarity on the similarities and differences in how the two drugs work both in vitro and in vivo. Using single-molecule tRNA-tRNA FRET, the authors show that the drugs (at saturating concentrations) block progression of the tRNA from a mid-FRET (GTPase activating) state to the fully accommodated (high FRET) state; they observe slightly more transitions to high FRET in the presence of Ternatin-4 than of Didemnin B (more below on this). These results are consistent with the idea that the drugs trap the ternary complex on the ribosome after GTP hydrolysis. Using the fraction of ribosomes that lead to accommodation, the authors performed a titration to determine the apparent Ki for the drugs (which were similar, in the range of 5–10 nM). They also performed clever washout experiments (always in the presence of cycloheximide to block further conformational dynamics once a tRNA accommodates). These experiments probed the drug dissociation rate and showed marked differences between Didemnin B (slow rate) and Ternatin-4 (faster rate). The authors then recapitulate the prior structural work (at lower resolution in RRL) using a reconstituted system. Their results show a similar structure to that solved previously, but with more disordered loops in the presence of Ternatin-4, although the resolution here is moderate (3.2 and 3.8 Å for the two drug complexes).

      Finally, the authors perform in vivo analyses of drug action on protein synthesis using clickable amino acid incorporation. They show that the two drugs block protein synthesis in a dose-dependent manner, and that the effect of Ternatin-4 can be reversed by washout of the drug, whereas that of Didemnin B is poorly reversed, explaining differences in drug action despite the similar binding site.

      Overall, this is a rigorous and well performed study probing the mechanisms of drug action in human translation elongation. The combination of dynamics measurements and structure are particularly novel, and will complement ongoing investigations (and publications) by the Blanchard lab on human elongation in general.

      We thank Reviewer 2 for their assessment of this work as rigorous and novel.

      Reviewer #3 (Public Review):

      In this article, Juette et al employed single-molecule FRET, cryo-EM, and Hpg incorporation (in-cell translation assays) to compare the mechanisms by which Didemnin B and Ternatin-4 inhibit translation elongation. They found that, while binding to the same pocket of eEF1A and blocking accommodation after GTP hydrolysis, Didemnin B had an irreversible effect on protein synthesis, but Ternatin-4, while still a potent inhibitor, allowed more flexibility in complexes (increased disorder of regions in cryo-EM structures) that allowed increased sampling of on-pathway accommodated states (observed by smFRET), and reversibility of effects on protein synthesis in cultured cells (by Hpg incorporation). This is a straightforward study and the conclusions are well-supported by the data using appropriate techniques. The work will be of impact to the ribosome field, which may use these drugs in other mechanistic studies, and to researchers wanting to employ the drugs to combat cancer and other diseases.

      We are thankful to Reviewer 3 for their assessment of this work as well-supported and impactful.

    1. Author Response

      Reviewer #2 (Public Review):

      I have only one concern with the study. I am not fully convinced that the disruption of behavioral updating is specifically due to NA signaling within OFC. In the first two studies, they observed non-specific anatomical effects, likely due to the ablation of fibers of passage through OFC. The DREADD experiment is claimed to allay this concern. However, the DCZ was injected systemically. This means that any collaterals of LC NA neurons outside OFC will also be suppressed. While the lack of effect with the mPFC projection is interesting, this does not preclude an effect mediated in other target regions. Overall, I believe that none of the experiments truly demonstrate a specific effect of NA in OFC. A few experimental options that could be considered are injection of DCZ directly in OFC, optogenetic inhibition of fibers in OFC, or pharmacological disruption of NA signaling in OFC.

      The other options are to measure the effect of the toxin ablations from experiments 1 and 2 not just in mPFC but in other regions. If the non-specific effect is truly only in mPFC outside of OFC, that would lead to more confidence that mPFC projection is the only other viable pathway mediating the effect.

      As requested, we have quantified the effect of toxin ablations in neighbouring cortical regions known to be involved in goal-directed behavior, namely the insular cortex (IC, e.g., Balleine & Dickinson, 2000; Parkes & Balleine, 2013), the medial orbitofrontal cortex (MO, e.g., Bradfield et al., 2015; Gourley et al., 2016) and the secondary motor cortex (M2, Gremel et al., 2016). Briefly, we found that injection of the saporin toxin in the VO and LO (Experiment 1) led to a significant decrease in NA fiber density in all examined regions. Injection of 6-OHDA also produced significant loss of NA fibres in MO and M2 but not in the insular cortex. These results are presented in Suppl. Figures 1 and 3 (pages 28 and 30), and the statistics are reported in the main text (pages 6 and 11).

      We have also added the following to our discussion on the reason for the off-target depletions that we observed and acknowledged the potential role of collateral LC neurons:

      Page 21, line starting 374: “The use of the saporin toxin led to a dramatic decrease of NA fiber density in all analysed cortical areas (Suppl Fig 1). This may be due to diffusion of the toxin from the injection site, the existence of collateral LC neurons and/or fibers passing through the ventral portion of the OFC but targeting other cortical areas (Cerpa et al 2019). However, injection of 6-OHDA led to much less offsite NA depletion, suggesting that a large part of the previous observation is toxin-specific. Indeed, no significant loss of NA fibers was visible in the insular cortex, which has been previously implicated in goal-directed behaviour (Balleine & Dickinson, 2000; Parkes et al., 2013; 2015; 2017). We did nevertheless observe an offsite depletion in more proximal prefrontal areas (prelimbic and medial orbitofrontal cortices), albeit a more modest depletion than what was observed using the saporin toxin. Several studies have described the projection pattern of LC cells. These studies, using various techniques, indicate that LC cells mainly target a single region, and that only a small proportion of LC neurons collateralize to minor targets (Plummer et al., 2020, Kebschull et al 2016, Uematsu et al 2017, Chandler et al 2014). Therefore, even if the OFC noradrenergic innervation is presumably specific (Chandler et al 2013), we cannot rule out a possible collateralization of some neurons toward neighbouring prefrontal areas (PL and MO). We have previously discussed that the posterior ventral portion of the OFC is an entry point for LC fibers en passant, which ultimately target other prefrontal areas (Cerpa et al 2019).

      To achieve greater anatomical selectivity, we used a CAV-2 vector carrying the noradrenergic promoter PRS to target either the LC:A32 or the LC:OFC pathways (Hayat et al., 2020; Hirschberg et al., 2017). It has been shown that the CAV-2 vector can infect axons-of-passage; however, the vector does not spread more than 200 µm from the injection site (Schwarz et al 2015). Therefore, when targeting the OFC, we injected anteriorly to the level where the highest density of fibers of passage is expected (Cerpa et al 2019) in order to minimize infection of such fibers and restrict inhibition to our pathway of interest.

      Overall, the current behavioural results are in line with our previous work showing that the ability to associate new outcomes to previously acquired actions is impaired following chemogenetic inhibition of the VO and LO (Parkes et al., 2018) or disconnection of the VO and LO from the submedius thalamic nucleus (Fresno et al 2019). These results point to a necessary role of the ventral and lateral parts of the OFC and its noradrenergic innervation for updating A-O associations. However, it is worth mentioning that different subregions of the OFC, both along the medio-lateral and antero-posterior axes of OFC, display clear functional heterogeneities (Dalton et al 2016, Izquierdo 2017, Panayi & Killcross, 2018, Bradfield et al 2018, Barreiros et al 2021). Therefore, while we have previously focused on the anatomical heterogeneity of the noradrenergic innervation in these prefrontal subregions (Cerpa et al 2019), a thorough characterization of its functional role in each of these subregions still needs to be addressed.”

      One last concern is that the lack of effect following disruption of the mPFC projection is not guaranteed to reflect a true null result rather than experimental issues. If the authors have some evidence that the mPFC projection disruption produced some other behavioral effect, that would make the lack of effect in this case more convincing.

      Unfortunately, we do not provide evidence in the current paper that disrupting the LC:mPFC (now termed LC:A32 in the current study, based on the recommendation of reviewer 1) projection produces some other behavioural effect. However, in an ongoing series of experiments using the same tools as the current study, we found that inhibiting the LC:A32, but not the LC:OFC, pathway impairs Pavlovian contingency degradation, as shown in the figure below. We therefore believe that the failure of LC:mPFC pathway inhibition to affect outcome identity reversal in the present study is not due to experimental issues. Please note that in the figure below mPFC is referred to as area 32 (A32), as requested by reviewer 1.

      Figure 1. A) Experimental timeline for the Pavlovian contingency degradation procedure. Prior to behavioural training, rats were injected with CAV2-PRS-hM4D-mCherry into either the vlOFC or area 32 (A32). Number of food port entries during the non-degraded CS and degraded CS for rats injected with vehicle and rats injected with DCZ during degradation training (B, D) and the test in extinction (C, E). Inhibition of the LC:vlOFC had no effect on Pavlovian contingency degradation, whereas inhibition of LC:A32 during degradation training rendered rats insensitive to the change in the causal relationship between the CS and the US.

      Reviewer #3 (Public Review):

      I would be curious about the authors' thoughts regarding the recent Duan ... Robbins Neuron paper (https://pubmed.ncbi.nlm.nih.gov/34171290/), in which marmosets displayed paradoxical responses to VLO inactivation and stimulation in contingency degradation tasks. Are there ways to reconcile these reports?

      We previously argued that the updating processes underlying changes in causal contingency versus outcome identity may be supported by different prefrontal regions (Cerpa et al., 2021, Behav Neurosci). Unfortunately, the tasks used in the current study do not allow us to test if our rats are sensitive to changes in the action-outcome contingency. In fact, the effect of inactivation (or overactivation) of the ventral and lateral regions of OFC on an instrumental contingency degradation task similar to that used in Duan et al (2022) has not yet been examined in rats.

      Indeed, while it is stated in Duan et al (2022) that rats with lesions of lateral OFC are insensitive to contingency degradation, none of the citations provided support this conclusion (Balleine & Dickinson, 1998; Corbit & Balleine, 2003; Ostlund and Balleine, 2007; Yin et al., 2005). Balleine and Dickinson (1998) assessed the effect of prelimbic and insular cortex lesions (insular anteroposterior coordinate +1.2), with only the former affecting instrumental contingency degradation. Ostlund and Balleine (2007) assessed the effect of orbitofrontal lesions on Pavlovian contingency degradation (degradation of the S-O contingency) not instrumental contingency degradation. Finally, Corbit and Balleine (2003) and Yin et al (2005) assessed the effect of prelimbic and dorsomedial striatum lesions, respectively. Nevertheless, there are some reports on the effect of chemogenetic inhibition of VO/LO on degradation in a nose-poke response task but the results are conflicting (e.g., Whyte et al., 2019; Zimmerman et al., 2017; 2018). It would be very interesting to study the impact of both inactivation and overactivation of VO and LO in rats to compare with the results found in marmosets, using comparable tasks.

      We have added the following to our discussion, which cites Duan et al (2022) and the need to better understand the role of VO and LO in contingency degradation.

      Page 24, line starting 450: “However, it is not yet clear if the NA-OFC system is also involved in detecting the causal relationship between an action and its outcome (see Cerpa et al., 2021 for a discussion). Some have reported impaired adaptation to contingency changes following inhibition of VO and LO or BDNF-knockdown in these regions (Whyte et al., 2019; Zimmerman et al., 2017), while another study shows that inhibition of VO/LO leaves sensitivity to degradation intact, at least during an initial test (Zimmerman et al., 2018). Interestingly, a recent paper in marmosets demonstrates that inactivation of anterior OFC (area 11) improves instrumental contingency degradation, whereas overactivation impairs degradation (Duan et al., 2022). The potential role of the rodent ventral and lateral regions of OFC, and the NA innervation of OFC, in adapting to degradation of instrumental contingencies requires further investigation.”

  4. Sep 2022
    1. Author Response

      Reviewer #2 (Public Review):

      The role of cMAF in the formation of iNKT10 is only suggested by the transcriptional signatures analyzed here. There is no direct evidence that cMAF is indeed needed to generate iNKT10. This should be investigated.

      We thank the reviewer for their comments on the link between IL-10, NKT10 cells, and cMAF. We agree that our study provides evidence that cMAF is a promising candidate regulator of IL-10 production by iNKT cells, and we attempted to address this using gene-specific knockout mice. Since mice lacking expression of cMAF exhibit post-natal mortality and severe developmental defects [13], we attempted to breed Maf fl/fl Cd4-cre mice, which have previously been used to study the role of cMAF in T cell function [14]. However, we were not able to breed enough of these mice to assay whether cMAF is required for the production of IL-10 by iNKT cells. Therefore, our study can only suggest that cMAF is a promising candidate regulator of NKT10 cells, based on our transcriptomic data and on flow cytometry data showing that production of IL-10 is associated with expression of cMAF. We do, however, present further correlative, indirect evidence to this effect. It has previously been demonstrated that restimulation of activated iNKT cells at 72 hours post-αGalCer results in increased production of IL-10 compared to the stimulation of iNKT cells at steady state [15]. We found that the frequency of splenic cMAF+ iNKT cells was greatly increased at 72 hours post-αGalCer compared to steady state (Figure S3B, Figure S3D), and this increase in expression of cMAF correlated with increased production of IL-10 (Figure S3E-S3F). Therefore, we believe that cMAF is a promising candidate for future work examining the functional landscape of NKT10 cells, and we anticipate that our study will be a useful transcriptomic reference for such studies.

      The Kronenberg group recently published a similar analysis, using RNAseq and ATACseq. Although I don't believe the cMAF signature was highlighted at the time, one could argue that this previously published study dampens the originality of this manuscript. Although this study (Murray et al.) is clearly acknowledged, the similarities and differences in both the methodology and findings should be clearly discussed.

      As the reviewer stated, the excellent study by Murray et al. (2021) did not identify or highlight a population of cMAF+ iNKT cells expressing a regulatory gene signature, as presented in our study, and, as the reviewer mentions, we cite and discuss the Murray et al. study in our manuscript. We believe that both studies together provide a comprehensive transcriptomic analysis of iNKT cells after activation, and that ours provides unique insight not found in Murray et al. Our study uses scRNA-Seq rather than bulk RNA-Seq or bulk ATAC-Seq methods, enabling us to study transcriptomic characteristics of activation among heterogeneous iNKT cell subsets without needing to sort pre-identified iNKT cell populations or subsets. It is the use of unbiased scRNA-Seq that allowed us to identify cMAF+ iNKT cells, since this population has not been previously described in the literature. Notably, to the best of our knowledge, we also sequenced the largest number of iNKT cells to date (48,813 cells), which provides deeper insight. We also performed transcriptomic characterization of activated iNKT cells at stages of activation different from those characterized by Murray et al. Importantly, we profiled the phenotype of iNKT cells at 4 hours post-αGalCer and 72 hours post-αGalCer, when iNKT cells engage in rapid cytokine production or undergo proliferation and expansion, respectively. This revealed several novel transcriptional insights, including rapid metabolic gene reprogramming that occurs twice during this activation timeline. By contrast, Murray et al. focused on analysis of iNKT cells at steady state and 6 days post-αGalCer. Finally, we performed transcriptional characterization of adipose iNKT cells in our study, which are known to represent an unusual regulatory population of iNKT cells at steady state [16], whereas the study by Murray et al. (2021) did not study adipose iNKT cells. Therefore, we propose that our study complements the excellent work performed by Murray et al. (2021) but provides novel insight in terms of focus, discovery, and scope.

      The authors should clearly describe the genes that were used to define iNKT1/2/17 identity in their study. This is important in order to track that identity over time following activation, as it is well known that the expression of some of the markers typically used changes following activation. This would bring clarity to the manuscript.

      We agree with the reviewer. We had originally removed two clarifying figures for iNKT cell subset identification due to space, but we have now included these two supplemental figures (Figure S4, Figure S5) to illustrate how we identified NKT1, NKT2, and NKT17 cell subsets in our scRNA-Seq data. We have also added further details to the Methods section (please see the “Downstream scRNA-Seq data analysis” section) and we have changed the title of the activated iNKT cell data in Figure 2A-2D and Figure 4C from “4 hours post-αGalCer” to “Activated (4 hours post-αGalCer)” to reflect our subset identification protocol as accurately as possible (please see below).

      Steady state and activated splenic NKT1, NKT2 and NKT17 cell subset identification was performed as follows. We identified spatial separation and graph-based clustering of five main populations of cells at steady state (Figure S4A). We then used the expression of the published marker genes Tbx21, Zbtb16, Rorc and Mki67 to identify NKT1, NKT2, NKT17 and Cycling cells (Figure S4B). We identified spatial separation and graph-based clustering of three main populations at 4 hours post-αGalCer (Figure S4D). However, we found that Tbx21 and Zbtb16 expression was increased across multiple clusters and did not effectively demarcate NKT1 and NKT2 cells at the RNA level (Figure 2B), and so we instead used the flagship cytokines Ifng, Il4, Il13, Il17a and Il17f to demarcate NKT1, NKT2 and NKT17 cells. We then combined the identified NKT1, NKT2 and NKT17 cell populations from steady state and 4 hours post-αGalCer (i.e. cells from the same subset at the two different time points were combined) and performed reclustering of the cells within each subset (Figure S5). It has previously been shown that there can be differences in the activation kinetics of different splenic iNKT cell subsets, for example NKT1 versus NKT2 cells, which may be in part due to physical localization, for example in the red pulp versus the white pulp of the spleen (see Lee et al. 2015) [17]. We observed a similar phenotype for NKT2 cells in our data, whereby a proportion of NKT2 cells at 4 hours post-αGalCer clustered with NKT2 cells from mice that received no αGalCer (Figure S5A). To prevent differences in activation kinetics from biasing our analysis of transcriptional signatures of iNKT cell subset activation, we performed low-level graph-based reclustering within each iNKT cell subset to accurately segregate activated and steady state iNKT cells (Figure S5B). We validated our reclustering using the expression of activation markers and flagship cytokines (Figure S5C-S5D). Finally, these reclustered subset data were recombined and renormalized to generate the final analysis as shown in Figure 2 of the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Champer et al. evaluate two homing drives that have been developed in the Anopheles mosquito. Variants of one of these (zpg) are possibly being further investigated for an eventual release. Work with the other has seemingly been discontinued because of unintended fitness costs. The authors argue that this second drive may be in fact better if the experimental results are interpreted more favourably. An important point if true, but somewhat separate from the findings in the paper. To a large extent, this point could be made without any of the results in the paper. However, the authors do show through modelling that this difference may in fact be relevant.

      This careful justification of the model parameters increases its relevance to the evaluation of those specific gene drives. The zpg drive will likely be extensively investigated, and the specific relevance of this work is a valuable contribution. While a range of parameters is tested for each expression pattern, there are no step-by-step investigations of how the drive outcomes are affected by changes to the underlying DNA-repair/deposition/fitness parameters. So while a reader may learn that one drive is better than the other, the ability to gain a deeper understanding of the underlying relationship is limited. This means this work has a more limited scope and relies on the relevance of the chosen parameters. In that regard, there may be room for improvement. The chosen parameters for zpg and nos may not be completely fair in regard to the target site, and I believe this needs to be addressed.

      The second aspect of this paper is the comparison between the commonly used panmictic modelling approach and spatial models. This also somewhat relies on the drive parameters being chosen well, as a more comprehensive evaluation of the spatial approach has been done in prior work by this group. However, showing that these particular extremely efficient drives may still struggle when additional spatial factors are considered is useful and relevant. That a second Anopheles-specific spatial model further reduces the drive performance is a relevant finding. This is helped by a specific analysis of the effect of changes to the migration rates and the low-density growth rate. This spatial modelling also has relevant findings for the homing X-shredder design.

      In our previous study (Champer, Kim, et al, 2021 in Molecular Ecology) that we reference, we varied some of these drive performance parameters, which may address some of the reviewer’s concerns. We view this study as building off that one, but with a more specific focus (mosquitoes and existing drives). We also now discuss how using parameters for a different target site may have affected our results (see below - nos may actually have been shortchanged since zpg performs better at dsx than at nudel).

      Reviewer #2 (Public Review):

      Champer and colleagues present forward simulations of several gene drive systems that have been designed to suppress the malaria mosquito, Anopheles gambiae. These gene drives have all been validated in laboratory cage experiments but have not yet progressed to field trials. The authors are particularly concerned with the phenomenon of "chasing," in which local success of the drive will lead to continuous cycles of recolonization by wildtype mosquitoes, preventing complete suppression of the population. In addition to their spatially-explicit model, they additionally present results from a model in which the parameters are tuned to the ecology of the mosquito.

      Though there are a few additions that would improve the manuscript, the authors achieved these aims and their conclusions are supported by their modeling, which appears to be technically sound and well executed.

      Strengths:

      The work represents a useful, model-based comparison of the various Anopheles gene drives that have had success in laboratory conditions. With the incorporation of spatial dynamics, the authors are thereby able to focus on the problem of chasing, or a fluctuating equilibrium state that is impossible to study in laboratory colonies. Through a comparative framework, the authors additionally provide key information on the differences in predicted success between the various gene drive systems. For these reasons, the work will be a useful addition to the other published forecasts of gene drive success. Given the importance of the topic to diverse stakeholders that vary in their familiarity with gene drives and ecological modeling, I was glad to see the authors summarize their findings cogently and accessibly.

      We thank the reviewer for these kind comments.

      Weaknesses:

      The main area in which the manuscript could be strengthened is the description of the Anopheles-specific model. Based in part on the differences observed between their discrete generation model and Anopheles-specific model, the authors correctly note that "the outcome of a drive release could be very sensitive to the precise ecological characteristics of the targeted population." It was sometimes unclear which model parameter choices were informed by literature and how much confidence was had in each. A more explicit summary or perhaps a table of parameters, references, and estimates with confidence ranges informed by the authors' knowledge of literature would strengthen this section.

      We now have added Table S1 showing all model parameters. The parameters themselves often require more text for justification than just references, but we have improved our methods section throughout to increase the visibility and clarity of these sections. Note that mosquito ecology is a fairly understudied field, resulting in widely varying parameter ranges throughout different studies, so it is difficult to provide confidence ranges for our parameters, other than that they are designed to fall within estimates from different studies (see “Anopheles-specific spatial model” methods subsection).

      In the framing of the work, the authors imply that their modeling study "suggest(s) an alternative interpretation of [the] performance [of homing gene drives]" from recent studies (e.g., Simoni et al. 2020 and Kyrou et al. 2018). I am not certain this framing is justified, given the original authors' circumspection in correctly noting their drives had success in the cage experiments, without claiming they would be successful in the wild. I would prefer this study be presented as building on those previous studies and extending their work.

      In crafting this sentence, we had in mind very specific technical interpretations of drive performance mechanisms (paternal deposition vs. more somatic fitness cost, existence of somatic fitness cost in nos males) rather than general performance in any given environment. To more clearly convey our intended meaning here, we have adjusted our wording. The sentence now reads: “Here, we analyze data associated with each of these gene drives and consider both the original and alternative interpretations of these drives’ characteristics and performance parameters.”

      Likely impact:

      This work will be of interest to research scientists whose interests range from transgenic mosquitoes to ecological modelers to post-release assessment. The authors correctly note that additional refinement of the ecological parameters will increase the utility of the model, but the framework as it stands will be an important contribution to the literature. Given the timeliness of this topic, the subject is of interest to other stakeholders in the regulatory or policymaking realm, as well as governmental and funding agencies deciding between gene drive systems.

      Reviewer #3 (Public Review):

      This is a computational modeling study to evaluate the merits (likely success) of different 'suppression' gene drive systems. Gene drives offer a possible simple and low-effort means of suppressing or even extinguishing pest populations. Using CRISPR technology, several gene drive systems have been developed in the last decade for key mosquito vector species. As no gene drive has been approved for release in the wild, efforts to evaluate their likely success are limited to cage trials and modeling, the latter as done here. In contrast to some modeling studies, the effort here is to develop and analyze models that match the gene drive and mosquito biology closely. The models are thus parameterized with values representative of what is known about mosquito biology and of the various gene drive constructs that have been developed for lab studies.

      In these models, gene drive success or failure in population suppression largely depends on (i) how well the drive spreads throughout the population, and (ii) whether the population persists because of a type of ongoing spatial 'group selection' in which local pockets invaded by the drive die out and are then repopulated by migrants lacking the drive. Formal evolution of functional resistance is not allowed. The numerical results show striking differences in suppression success with different gene drive constructions, and these differences are likely to be of use when designing drives for actual releases.

      The basic group selection outcome that allows population persistence amid a suppression gene drive has been shown before, as cited in the ms. The novelty provided by the present study is to tie the models to the biology of known gene drive constructions. Given the high specificity of the models, the audience for this work is likely to be somewhat narrow, confined to those involved in gene drive design. The work is nonetheless significant in view of the strong potential of gene drives in global public health efforts.

      The software used to generate the trials is freely available from one of the authors for anyone wishing to repeat the simulations. There is an extensive supplement of results referenced (but not otherwise included) in the main text.

      We thank the reviewer for these comments, and we note that an analysis of functional resistance was performed in our previous study. Because its results are likely to apply fully to our new results, we did not repeat it with our mosquito model and our parameterized drives. However, it is certainly an important topic, and we explicitly mention in the discussion that the possibility of chasing must be taken into account when considering the acceptable rate of functional resistance allele formation. The text reads, “We did not consider the possibility of.... functional resistance, which can evolve more readily during lengthy chases [11].”

    1. Author Response

      Reviewer #1 (Public Review):

      Overall, it is an interesting work exploring stochastic and deterministic aspects of embryonic cell division in plants. The power of the authors' approach lies in the quantitative analysis of 3D cell geometries combined with quantitative computer modelling.

      I am a bit confused about how the authors relate stochasticity, as an emergent property, to a deterministic process. Typically, stochasticity is the low-level process resulting in variation of subcellular components, including those related to the positioning of the cell division plane. A more elaborate and clearer connection between stochasticity at the subcellular level and phenotypic variability should perhaps be provided.

      Actually, our interpretation does not directly relate variability in division patterns to a deterministic selection of division plane orientation. A key intermediate between these two scales is the variability in cell geometry. Based on our results, we propose that the selection of division plane orientation would obey a deterministic principle based on geometrical constraints. Variability in cell division patterns would ensue from the expression of this deterministic rule in the variable context of cell geometries. Variability in cell geometry would itself result from noise in the precise positioning of the division plane along the optimal, deterministic orientation. We have added a new summary Figure 10 to illustrate and clarify this interpretation.

      I have a number of specific questions/concerns that I would like the authors to address as listed below:

      1) Major variability of cell shapes is observed in the apical domain as opposed to the basal domain. What principle could underlie this asymmetry between the apical and basal domains? The authors describe the observations beautifully but give relatively little discussion on this matter, leaving the reader guessing.

      We added a new paragraph (before the last) in the Discussion on this point. The larger variability in the apical than in the basal domain can be explained in part by the different cell shapes present in the two domains at stage 16C. Along the path tetrahedron -> triangular prism -> cuboid, the apical domain indeed appears farther from the final absorbing state represented by the cuboid shape, hence more time will be spent in the intermediate shapes in this domain. In addition, the tetrahedral and triangular prism shapes are closer to rotational symmetry and thus represent a larger source of variability in division plane orientation. Lastly, apical and basal cells have distinct environments, the basal cells being constrained between the suspensor on one side and the apical cells on the other. We can only speculate about the functional significance, if any, of the larger variability in the apical domain. For example, we can relate it to the future morphological transition that will characterize the apical domain with the emergence of the cotyledons at the heart stage. A variable pattern of cell walls could be required to establish a specific mechanical pattern at the tissue scale to favor this shape transition.

      2) The authors used graph theory to explain variability in cell division for the same topological feature. Given the considerable discrepancy between predictions and observations (e.g., Figure 4C), the question arises of how this prediction would be affected by ongoing cell expansion, which I believe is neglected in their graph-theoretical approach.

      It is indeed the case that cell edge lengths are ignored in the graph-theoretical approach described in Section 2.4. In this part of the paper, our objective was to test objectively whether non-topological factors were implicated in the determination of division orientations. The rationale was thus to compare observed patterns to predictions obtained using topological information only. The strong discrepancy between observations and predictions (Figure 4C) confirmed that topology alone was not sufficient to predict observed patterns. The integration of cell geometry (including edge lengths) into the predictions is considered in the next sections (Sections 2.5 and 2.6).

      We believe the modifications we made in Section 2.4 to answer Reviewer 3’s comment on this part (see below) should make this point clearer.

      3) The tetrahedron shape occurs in only 4% of embryos at the 16-cell stage. How critical could this shape be for patterning of the entire embryo?

      At the 16C stage, there are four domains (apical/basal x inner/outer) represented each by exactly one cell shape. The triangular prismatic shape is observed in two domains (outer apical and inner basal). The two other cell shapes, cuboid and tetrahedron, are specific to the two other domains, respectively outer basal and inner apical. Hence, the tetrahedron shape represents 25%, not 4% of cell shapes at the 16-cell stage, a proportion large enough to potentially impinge on embryo patterning.

      Our graph analysis shows that, under a topologically random regime of cell division, the tetrahedron shape should progressively vanish, because, due to 4-way junction avoidance, a tetrahedron cannot divide into two tetrahedra, and because divisions of triangular prisms and cuboids generate only a small proportion of tetrahedra compared with the other cell shapes (Figure 4C). In addition, our analysis also shows that the triangular prismatic shape is a necessary intermediate in the transition from a tetrahedron to a cuboid shape.

      Altogether, the presence of the tetrahedron shape in the inner apical domain at 16-cell stage could be responsible for the large variability in cell shape subsequently observed in this domain. (See also our answer to Comment 1 of the Reviewer).

      4) It is not clear to me whether the stochastic cell division model takes into account the mechanical influence of adjacent cells. In any case, the authors should discuss how this could potentially affect their analysis.

      Indeed, our stochastic cell division model only takes into account the geometry of the mother cell, ignoring the possible influence of the environment of the cell within the tissue (through mechanical or other signals, such as hormonal signals), except of course for indirect effects that the environment could exert on cell shape. The possible mechanical influence of the cell environment was already discussed in the original version of our manuscript (last paragraph of the Discussion). In particular, we mentioned that the specific localization of the inner basal cells deep inside the embryo could mechanistically influence the positioning of the division plane, thus explaining the strong discrepancy between observations and predictions in this domain. This negative result illustrates how the cell-autonomous model can be useful in pointing to possible environmental influences (or alternative geometrical rules yet to be identified) and in suggesting future directions of investigation.

      To remove any ambiguity, we have made more explicit the cell-autonomous nature of the model (Section 2.5 and Material and Methods).

      5) The authors should perform a model parameter sensitivity analysis (e.g., position of surfaces) to confirm the convergence and robustness of their approach.

      In the first version, we already reported in Supplementary Figures S8 to S15, for each domain and each orientation of division, the distribution plots (surface area x distance to cell center) of simulations performed in different cells. In each of these figures, the cells shared the same shape (cuboid, triangular prism or tetrahedron) but differed in their exact geometry. As can be seen from these graphs, similar point distributions were obtained for different cells and, more importantly, the simulations matching best with observed patterns shared the same relative localization within these distributions (except of course when the geometrical rule was not valid, as in the basal inner domain). Therefore, these results already provide a sensitivity analysis to shape fluctuations. To make this point more explicit, we now show in a new Supplementary Figure S12 the 3D shapes associated with the first set of graphs (basal outer domain; Figure S11) and added reference to this figure in Section 2.5.

      In addition to biological variability, possible minor errors and uncertainties at the image processing and segmentation step may also affect mother cell geometry. To illustrate the robustness of our approach to this potential source of geometrical variability, we have added a new Supplementary Figure S10 showing the distribution plots of simulations performed within a raw mother cell mask and within the same mask after filtering by a mathematical morphological opening with a radius of 1, 2, or 3 voxels. Morphological opening is an image processing operation that smooths binary objects by removing extrusions with a radius smaller than the prescribed radius. These results demonstrate that our conclusions are robust to such alterations of mother cell geometry. In all four conditions (R=0,1,2,3), the simulated patterns matching best with the observed pattern are located at the same bottom left position of the plot, corresponding to the geometrical rule.
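      To make the filtering step concrete, here is a minimal sketch of a 3D morphological opening (erosion followed by dilation). It assumes a Euclidean-ball structuring element and a mask stored as a set of (z, y, x) voxel coordinates; the function names are hypothetical and this is not the authors' image-processing implementation. R=0 leaves the mask unchanged, and larger radii strip progressively thicker protrusions.

```python
from itertools import product

Voxel = tuple[int, int, int]


def ball_offsets(radius: int) -> list[Voxel]:
    """Integer offsets inside a Euclidean ball of the given radius (in voxels)."""
    rng = range(-radius, radius + 1)
    return [(dz, dy, dx) for dz, dy, dx in product(rng, rng, rng)
            if dz * dz + dy * dy + dx * dx <= radius * radius]


def erode(mask: set[Voxel], offsets: list[Voxel]) -> set[Voxel]:
    # Keep a voxel only if the whole ball centred on it lies inside the mask.
    return {(z, y, x) for (z, y, x) in mask
            if all((z + dz, y + dy, x + dx) in mask for dz, dy, dx in offsets)}


def dilate(mask: set[Voxel], offsets: list[Voxel]) -> set[Voxel]:
    # Stamp the ball back around every voxel that survived erosion.
    return {(z + dz, y + dy, x + dx) for (z, y, x) in mask for dz, dy, dx in offsets}


def opening(mask: set[Voxel], radius: int) -> set[Voxel]:
    """Erosion then dilation: removes parts of the mask thinner than the ball."""
    offsets = ball_offsets(radius)
    return dilate(erode(mask, offsets), offsets)


if __name__ == "__main__":
    # Toy "cell": a 10-voxel cube with a 1-voxel-thick spike sticking out of one face.
    cube = set(product(range(5, 15), repeat=3))
    spike = {(9, 9, x) for x in range(15, 18)}
    cell = cube | spike
    for r in (0, 1, 2, 3):
        print(f"R={r}: {len(cell)} voxels -> {len(opening(cell, r))} after opening")
```

Because the opened mask is always a subset of the original, such a filter only removes thin irregularities (as segmentation noise would add) without inventing new geometry.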

      Lastly, we have also added a new Supplementary Figure S9 to illustrate the reproducibility of simulation results obtained within a given mother cell. In this figure, we show the distribution plots for two independent sets of 1000 simulations each. The graphs show similar distributions, and in both sets the simulations that best matched the observed division plane are located at the bottom left of the distribution.

      Concerning the convergence of the 3D cell division computer model, we have added to Section 4.3 (Material and Methods: Computer modeling of cell divisions) a justification for the number of Monte Carlo cycles. We have added a new Supplementary Figure S7 illustrating the convergence of the algorithm over different independent runs. We also corrected a typo in the number of Monte Carlo cycles (the correct number is 500, not 5000 as initially written).

      Reviewer #2 (Public Review):

      This is an interesting manuscript aiming to identify minimal rules that account for cell divisions in early Arabidopsis embryos. This research has two main strengths. The authors consider cell division in 3 dimensions, whereas most other studies on the orientation of cell divisions are restricted to 2 dimensions. Based on their observations, the authors propose that the previously described probabilistic rule for cell division can be replaced by a deterministic rule, with sources of stochasticity coming from irregularities/imperfections in cell geometry. The manuscript is overall well-written. I nevertheless have a few concerns.

      1) What is the effect of embryo fixation on cell geometry? Could the irregularities be an artefact due to fixation? How robust are the conclusions to numerical perturbations of the position of cell surfaces?

      We used the fixation and staining protocol developed by one of us (JCP) (Truernit et al 2008). Yoshida et al. Developmental Cell (2014) used this same protocol, which they validated by comparison with live imaging data. The fixation and subsequent treatment could have an impact on cell geometry. For this reason, from among a thousand embryo acquisitions, we selected only the embryos that were undamaged or only minimally damaged by this treatment. The robustness of our results and conclusions to variability and alterations of cell geometries was also questioned by Reviewer #1. Please see our in-depth answer to this point above.

      2) Section 2.7 on attractor patterns is essentially descriptive and the conclusions seem to be based on qualitative observation of a few cases. Can the authors support them with quantitative measures? Or with simulations?

      We have added quantitative data to this section where they were missing. In the apical outer domain, we had 135 observations at G6, which had been reached from G4 through one or the other of the two main pathways shown in Figure 9A. These two pathways accounted for 40% and 42% of observations, respectively.

      The other attractors shown in Figure 9 are rare cases (Fig. 9B: 1 of 173 cases; Fig. 9C: fewer than 9 of 309 cases). The case shown in Figure 9B was previously documented in Scheres et al 1995 (cited in the manuscript).

      The lower frequency in the basal domain of alternative sequences leading to the same attractor pattern is consistent with the lower variability in this domain. However, the conclusion is the same as in the apical domain, where the distribution between alternative sequences is more balanced: different sequences of division over several generations can lead to similar cell patterns.

    1. Author Response

      Reviewer #2 (Public Review):

      Fibular hemimelia (FH) is a rare genetic disorder with unknown mechanisms. In this study, the authors generated Axin1 conditional knockout (cKO) mice by deleting the Axin1 gene specifically in Prx-1-expressing mesenchymal cells and demonstrated that Axin1 cKO mice developed an FH phenotype of varying severity. The FH phenotype in Axin1 cKO mice can be rescued by either β-catenin or BMP inhibition if the inhibition is applied to the pregnant mother from E9.5 to E12.5. For the mechanistic study, the authors showed elevated expression of BMP signaling molecules in limb tissue of Axin1 cKO mice and that Axin1 regulated the degradation of pSmad5 in mesenchymal cells.

      The study has many strengths. 1) The study was performed with high rigor. Various Cre lines were used to conditionally knock out Axin1 in different cell types, formally demonstrating that expression of Axin1 in Prx-1-expressing mesenchymal cells, but not in Sox9-, Col2-, or Osx-expressing cells, is required for normal fibular development. 2) Axin1 cKO mice were treated with β-catenin and BMP inhibitors at different time points to demonstrate that the inhibitors should be given during early embryonic development, a very important point when considering the translational potential of the study. 3) Detailed in vitro experiments were performed to investigate the molecular mechanisms of Axin1 action on Smad5 stability. 4) Both the β-catenin and BMP signaling pathways are important in many processes, including skeletal development; this study used Axin1 cKO mice to integrate these two pathways, which is an important and new contribution.

      Weaknesses of the study have been described below:

      1) The authors need to first report/describe findings/phenotypes in bones other than the fibula in Axin1 cKO mice (4-8 weeks old), and then focus on fibular development. From the X-ray data shown in Figure 1D-E, it appears that Axin1 cKO mice have high bone mass or osteopetrosis. Thus, histology of bones (femur, tibia, knee joint) other than the fibula should be provided.

      We have performed histology in femur, tibia and knee joint in Axin1 KO mice as the reviewer suggested.

      2) Fig. 2 describes Axin1/2 dKO mice. I suggest removing Figure 2 or moving it to the supplemental data. Including Axin1/2 dKO mice in the main text makes the story complicated and difficult to explain, because most of the figures in this manuscript concern Axin1, such as the rescue experiments and the molecular mechanistic study. Further, the variable severity of FH in Axin1 cKO mice is closer to human FH cases (which also vary in severity) than the Axin1/2 dKO mice, which show a complete loss of the fibula. The title also refers to Axin1. If Axin1/2 dKO mouse data are included in the main text, the authors need to provide a molecular explanation for why Axin1/2 dKO mice have more severe phenotypes.

      To make the entire story more straightforward, we have removed the Axin1/2 double KO data (Fig. 2) as the reviewer suggested.

      3) Please include a paragraph in the discussion regarding the limitations of the study. Is there any human report of FH patients with mutations in Axin1 or its related downstream signaling proteins, such as β-catenin and BMP? Can FH be detected before birth, and could the pregnant mother be treated? Do the authors plan to use unbiased approaches such as RNA-seq or proteomics to discover new genes/proteins that are regulated by Axin1 in mesenchymal cells?

      We have added a paragraph discussing the limitations of the study in the discussion section, as the reviewer suggested. In collaboration with Dr. Qinglin Kang, we collected 9 samples from patients with FH and identified a mutation in the β-catenin gene at a potential phosphorylation site, which may lead to upregulation of β-catenin protein levels. In the future, we will investigate whether this β-catenin mutation affects its function in mesenchymal cells. We are currently planning RNA-Seq and proteomics experiments to identify novel downstream target gene(s) of Axin1.

    1. Author Response

      Reviewer #1 (Public Review):

      The present study by Zander et al. aims at improving our understanding of CD4+ T cell heterogeneity in response to chronic viral infections. The authors utilize the murine LCMV c13 infection model and perform single cell RNA seq analysis on day 10 post infection to identify multiple, previously unappreciated, T cell subsets. The authors then go on and verify these analyses using multi-color flow cytometry before comparing the transcriptome of CD4 T cells from chronic infection to a previously generated data set of CD4 T cells obtained from acutely-resolved LCMV infection.

      The analyses are very well done and provide some interesting novel insights. In particular, the comparison of CD4 T cell subsets across acute and chronic infections is very exciting, as it provides a valuable platform for answering a long-standing question: do CD4 T cells in chronic infection undergo exhaustion similar to CD8 T cells? While this has long been proposed, this new dataset by Zander et al. can provide novel insights by comparing individual cell subsets across infections. The manuscript would, however, benefit from a more extensive analysis of, and focus on, this interesting point.

      We thank the reviewer for their time and careful assessment of our manuscript. We were happy to hear that the reviewer found our work interesting.

      On that note, the authors should take advantage of more accurate and current gene sets to compare the 'dysfunctional' state of CD4 T cells in chronic infection vs acute infection. Also, a different illustration of the module score analyses would be more intuitive.

      We have now included T cell “exhaustion” gene sets from recently published data (Zander et al. 2019 Immunity), and we have also displayed the relative expression of select signature genes from these gene sets in an updated supplemental figure 3.

      Also, in multiple sections of the manuscript, citations are missing and appear only as '(Ref)'.

      We apologize for this oversight and have corrected these citations.

      Nevertheless, this study does not require major revisions.

      Reviewer #2 (Public Review):

      In their study "Delineating the transcriptional landscape and clonal diversity of virus-specific CD4+ T cells during chronic viral infection" Zander and co-workers analyze the phenotypic and clonotypic distributions of T cells specific to an LCMV epitope following infection with a chronic LCMV strain in mice. The paper largely follows an earlier study from the same group (Khatun JEM 2021) that used a similar experimental strategy to analyze T cells responding to an LCMV strain establishing acute infection, and it adds a scTCRseq component to another earlier study of chronic LCMV (Zander Immunity 2022). The main contributions of the paper are to demonstrate that interesting differences exist between the gene expression profiles of T cells in chronic and acute LCMV infection, and to identify a new T cell subset (of unknown functional significance).

      While the paper is framed around differences between T cell responses to acute and chronic infections, all analyses are done on T cells at day 10 after primary infection. At such an early time point, even the acute LCMV strain is likely not completely cleared, or at the very least viral antigens are still presented. The relevance of the presented phenotypic differences to other settings with long-term chronic infection is thus questionable. Additionally, there are a number of methodological concerns regarding the robustness of the statistical and bioinformatic analyses that cast doubt on some of the conclusions. Most notably, the analysis of fate biases needs to be substantiated by tests against baseline expectations from random assortment to establish statistical significance.
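      One common way to test fate bias against a random-assortment baseline is a label-permutation test: shuffle subset labels across cells while keeping clone sizes fixed, and ask how often the shuffled data look as clonally concentrated as the observed data. The sketch below is illustrative only; the statistic (mean share of each clone's cells in its most common subset), the function names, and the input format are assumptions, not the method used by the authors or proposed by the reviewer.

```python
import random
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence


def dominant_fraction(clones: Sequence[Hashable], labels: Sequence[str]) -> float:
    """Mean share of each clone's cells that fall in that clone's most common subset.

    Only clones with >= 2 cells are scored, since singletons are trivially 'pure'.
    """
    groups: defaultdict = defaultdict(list)
    for clone, label in zip(clones, labels):
        groups[clone].append(label)
    shares = [Counter(ls).most_common(1)[0][1] / len(ls)
              for ls in groups.values() if len(ls) >= 2]
    if not shares:
        raise ValueError("need at least one clone with >= 2 cells")
    return sum(shares) / len(shares)


def fate_bias_test(clones, labels, n_perm=2000, seed=0):
    """Permutation test: shuffle subset labels across cells, keeping clone sizes fixed."""
    if len(clones) != len(labels):
        raise ValueError("clones and labels must have the same length")
    rng = random.Random(seed)
    observed = dominant_fraction(clones, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if dominant_fraction(clones, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value indicates that clones are more concentrated in single subsets than random assortment of the same cells would produce; a per-subset or per-clone version of the statistic would be needed to say which fates are biased.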

      We thank the reviewer for their careful review of our manuscript as well as their helpful comments.

      Regarding the day 10 time point after LCMV Armstrong infection, several groups have previously reported that LCMV viral load is undetectable by day 10 post-infection (see one published example below). We completely agree with the reviewer, however, that viral antigens are still likely to be presented at this time point, along with ongoing inflammation. We believe (as discussed further below) that this is actually a strength of the study, as it allows a fairer comparison of the transcriptional state of recently stimulated virus-specific CD4 T cells in different contexts (acute vs chronic LCMV infection). We chose day 10 post LCMV Cl13 and LCMV Armstrong infection as the time point for analysis because it is approximately the peak of the endogenous Gp66-77 CD4+ T cell response (see previously published data below), and because the distribution of Th1, Tfh, and T central memory precursor (Tcmp) or memory-like cells is more balanced at this time in these settings. This provides sufficient numbers of cells per cluster for an in-depth analysis and high-resolution comparison of these subsets between the two infections. Further, as some degree of TCR stimulation is still likely being experienced at this time point during LCMV Armstrong infection, we believe that this is a more useful comparison than one at a memory time point (when CD4 T cells are in a quiescent state), as it gives a better picture of the differentially expressed genes at the peak of the CD4 T cell response and also provides insight into how chronic viral infection perturbs the transcriptional program of CD4 T cells.

    1. Author Response

      Reviewer #1 (Public Review):

      Several questions have remained regarding the characteristics of these cells:

      1) Based on the transcriptome data in Figure 2, the authors inferred that thymic macrophages are "specialized in lysosome degradation of phagocytosed material and antigen presentation" yet did not show functional data to support these claims. Functional assays such as phagocytosis and antigen presentation are desirable, especially in comparison to other well characterized macrophage populations.

      We agree with the reviewer that additional functional characterization of thymic macrophages will strengthen the conclusions of our manuscript. We have performed antigen presentation and in vitro phagocytosis assays to functionally characterize the thymic macrophages. Indeed, thymic macrophages appear to be quite good antigen-presenting cells: not as good as thymic DCs, but much better than peritoneal macrophages. This is documented in Fig. 3A and B. They were also good phagocytes both in vitro and in vivo, as demonstrated in Fig. 3C-G. Surprisingly, peritoneal macrophages were better in the in vitro phagocytosis assay. We attribute this result to the poor survival of thymic macrophages during sorting and in vitro culture.

      2) Do transcriptomes of CX3CR1+ thymic macrophages in old mice significantly differ from those of young mice?

      This is a very interesting question that we plan to explore in the future, but we feel it is beyond the scope of the current manuscript.

      3) It would be helpful to better graphically show the compositions (both cell number and cell ratio) of thymic macrophage subsets (TIM4+, CX3CR1+, and others) in mice at different ages (1 week, 6 weeks, and 4 months old). It is not straightforward to deduce all the information based on the current data presentation.

      We thank the reviewer for the suggestion! Plotting the cell numbers did reveal a peak in young age followed by a significant decline in the number of Tim4+ cells, and a trend toward accumulation of Tim4- cells with age. Unfortunately, older mice show great variability in thymus size, which prevented the Tim4- result from being statistically significant. We have added these data to Fig. 8F.

      4) The description of the gating strategy of thymic macrophages for Figure 1 is quite verbose. Adding a step-wise gating strategy of thymic macrophages as a figure panel would be helpful for readers to follow the experimental details.

      We thank the reviewer for the suggestion. The description of the gating strategy has been condensed into 2 panels that capture its essence (Fig. 1B).

      Reviewer #2 (Public Review):

      This work provides by far the most thorough characterization of thymic macrophages. The authors used bulk RNA-seq, single-cell seq and fate mapping animal models to demonstrate the phenotype, origin and diversity of thymic macrophages. Overall the manuscript is well written and the conclusions of the paper are mostly well supported by data.

      Some aspects of data acquisition and data analysis need to be clarified.

      1) The authors should state what "row min" and "row max" in Figure 2b,d refer to. Are these expression values on a log scale? In Figure 2d, the authors compared their own RNA-seq data with ImmGen RNA-seq data; what kind of normalization did the authors apply?

      We apologize for not making this clear. The values in Fig. 2b and d (current Fig. 2A and C) are expression values on a log scale. We have included this information in the figure.

      Our data are part of the ImmGen dataset. We sorted the cells and sent them to the US for RNA sequencing, which is why we referred to them as “our” data. However, to avoid confusion, we changed the wording to clearly reflect that the data are from ImmGen.

      2) The authors used immunofluorescence to identify the localization of the two populations of macrophages, using MerTK staining to mark all macrophages. However, MerTK expression may not be restricted to immune cells. The authors are encouraged to confirm that MerTK labels only macrophages in the thymus by co-staining with F4/80 or CD45. Tim4 can also be used in immunofluorescence.

      We agree that staining with additional macrophage markers will strengthen our conclusions about ThyMacs localization. We have performed staining with CD64 together with MerTK or Tim4. CD64 and MerTK almost completely overlapped and so did CD64 and Tim4 in the cortex. We could not stain MerTK and Tim4 together because the antibodies are raised in the same species (rat). Additional evidence for the specificity of these markers for thymic macrophages comes from Fig. 3E and F showing the high degree of co-localization of apoptotic cells (TUNEL+) with MerTK or Tim4. Finally, Fig. 4 figure supplement 1 also clearly shows the distribution of TIM4 and CD64 in the whole thymus.

      3) The data on Cx3cr1+ cell accumulation with age in the thymus are very interesting and, as the authors have discussed, might indicate their contribution to thymus involution. However, the authors only showed changes in percentage. As total macrophage numbers decrease with age, it is not clear whether these cells actually "accumulate" with age. Absolute numbers would help assess whether the increased percentage of Cx3cr1+ cells reflects an actual "influx" or a decrease in the self-maintaining Tim4+ macrophage subset.

      The reviewer is raising a very important point. As the changes in the proportions of Tim4+ and Tim4- thymic macrophages with age occur against the background of thymic involution, it is difficult to judge whether Tim4+ cells self-maintain and whether Tim4- cells accumulate. Plotting the cell numbers revealed a peak in young age followed by a significant decline in the number of Tim4+ cells, and a trend toward accumulation of Tim4- cells with age. Unfortunately, older mice show great variability in thymus size, which prevented the Tim4- result from being statistically significant. We have added these data to Fig. 8F.

      Reviewer #3 (Public Review):

      This study by Zhou et al. focuses on thymic macrophages and shows that two populations can be distinguished with different identities, localization and origin. Authors use several murine reporter and fate-mapping models, coupled with flow cytometry and transcriptomics approach to support their claims.

      Overall, the question tackled by this study is interesting: thymic macrophages have been somewhat neglected in the last decade, which has seen many studies similar to the one presented here in other organs. The stated aim of closing this gap is therefore relevant. However, the current version of the study suffers from several defects, of varying severity, which affect its clarity and persuasiveness.

      • Regarding the structure, the authors study the origin of the thymic population and provide data in Figs. 2, 3 & 4 assuming that thymic macrophages form a homogeneous population. But from Fig. 5 onward, they distinguish 2 populations and study them separately. The end of the paper thus renders the beginning obsolete, which calls for a revision of the whole structure.

      We agree with the reviewer that there is more than one way to tell this story, and we have agonized over the structure of the manuscript. However, we respectfully disagree that the beginning of the paper is made obsolete by the ending, for several reasons:

      1) The initial figures in our manuscript contain a very fundamental characterization of ThyMacs. Just as the discovery of heterogeneity in liver or lung macrophages (ref) did not render all prior research on these cells obsolete, the initial figures in our manuscript are an essential part of the story. Such data are available for all other studied tissue-resident macrophage populations, and removing them would be a disservice to the community.

      2) Another reviewer asked for a deeper characterization of ThyMacs based on the data in Fig. 2. Accommodating this request would be very difficult if we removed this part.

      Nevertheless, we agree that ThyMac heterogeneity is the central claim of the manuscript and should be introduced earlier. The original Figure 5 (current Fig. 4), which describes the heterogeneity, has now been moved before the original Figures 3 and 4 (current Figs. 5 and 6). Additional analyses distinguishing Tim4+ and Tim4- ThyMacs have been incorporated into current Figs. 5 and 6.

      • Figure 1 is not very clear. The backgating should be added to 1a. Alternatively, why not use the color map axis mode in FlowJo to show 3 parameters at a glance? The gating strategy should be displayed more clearly in the figure. In Fig. 1S3, there are clearly 2 populations in the CX3CR1-GFP mice. Why not start from this to introduce the two populations?

      We thank the reviewer for the suggestion. We have included a color map axis to show MerTK, CD64, and F4/80 in one plot. The depiction of the gating strategy has been reduced to 2 panels that capture its essence. We agree that there are several indications of heterogeneity among thymic macrophages, starting with Fig. 1E (the expression of Tim4) and Fig S4c (the expression of CX3CR1-GFP). We have added text at the beginning of the paragraph describing current Fig. 4 to point out these facts.

      • Figure 2 could also be revised. First, panel 2a is useless and should be removed; a PCA of all the macrophages would be more useful here. Also, the color code used for the genes is confusing. Why are the genes upregulated in ThyMacs red in 2b but only half of them red in 2d? This information can be found in the legend, but it should be clearer graphically.

      We have revised Fig. 2 according to the reviewer’s suggestions. The PCA is consistent with the hierarchical clustering and shows that splenic and liver macrophages are most closely related to ThyMacs. We agree that the presence of red in both heatmaps is confusing and we have changed the color code: color was removed from current Fig. 2A but retained in Fig. 2C.

      • For Figure 3, what is the time point of panel 3b? Here, the authors should show microglia and ThyMacs at both time points and draw conclusions from the comparison: if ThyMacs are as stable as microglia, there is no replacement; if not, there is replacement. For panel 3f, n = 3 is too low to be convincing, especially given the standard deviation here. And displaying the dot plot with 11% donor-derived blood monocytes while the median is around 20% is not fair; the authors should present the most representative plot. For panel 3h, there is more GFP (in terms of MFI) in TECs and ThyMacs than in total cells. How is this possible? Shouldn't TECs and ThyMacs be included in the total cells? Or is the gating not clear enough?

      We thank the reviewer for pointing out our omissions. Fig. 3b (current Fig. 5B) is from E19.5 and we have added this information to the figure. We also agree that in Fig. 3f (current Fig. 5F) the sample number is too small and the variation too large to draw solid conclusions. We have therefore repeated the partial chimera experiment, irradiating as much of the mouse as possible without affecting the thymus, and have substituted the data in Fig. 3e and 3f with the new data. For Fig. 3h, we apologize for not labeling the data clearly. The panels labeled “single, live cells” should be labeled “thymocytes”, as they were obtained without the enzymatic digestion that is essential for recovering both TECs and ThyMacs. However, we found an important caveat in the thymus transplant experiment: some thymic macrophages were GFP positive not because they express GFP but because they had engulfed GFP+ cells. As a result, our experiments with embryonic GFP+ thymus transplants overestimated the percentage of donor-derived ThyMacs (all of them were GFP+). We have repeated the thymus transplantation experiments with congenically marked thymuses (CD45.2 donor and CD45.1 host). While this setup did not allow us to use thymic epithelial cells as a positive control, because they are CD45-, we did identify host-derived ThyMacs, consistent with Tim4- cells originating from adult HSCs. Thus, we have replaced the previous data in Fig. 3H and 3I with current Figs. 5H and 5I.

      • For Figure 4, the EdU staining (4e) is not convincing at all. The signal is very low (compared with 4c, for example).

      We agree that the signal after a 21d chase is much weaker than after 2 h (Fig. 4c) or 21d (Fig. 4e) of EdU pulse. We decided to keep these data because: 1) thymocytes also show much lower EdU staining after the 21d chase than after the 2 h and 21d EdU pulses; 2) the EdU results are very consistent with the Ki67 staining, cell cycle analysis, and scRNA-Seq data, which reveal a small population (~5%) of cycling ThyMacs.

      • For Figure 7, the interpretation of the data and the way they are presented are not clear. The authors use an inducible fate-mapping model. The fact that Tim4- cells lose their signal over time argues for replacement by non-labelled cells (blood monocytes), whereas Tim4+ cells are stable, meaning they self-maintain. This is what the authors claim. But how does this fit with the previous data showing that Tim4+ cells derive from CX3CR1+ cells? The explanation that is implied here, but not shown clearly enough, is that CX3CR1+ cells give rise to Tim4+ cells during embryonic development but this stops afterwards; Tim4+ cells then self-renew independently, and CX3CR1+ cells are slowly replaced by monocytes. As this is the central claim of the paper, it should be reported much more clearly, and for this a substantial change of the whole structure is required.

      We thank the reviewer for pointing out the need for better explanation. The maintenance of the different populations of ThyMacs is indeed complex and proceeds in different ways in the different periods of life. We have added some extra data to Fig. 7 (current Fig. 8) that we hope will add some clarity to the maintenance of thymic macrophages with age. The new Fig. 8F shows the dynamics of the cell numbers of Tim4+ and Tim4- macrophages with age. Tim4+ cells reach a peak in young mice and decline significantly as mice age. So, we do not think that they are self-maintaining but instead, undergo slow attrition with very limited replacement. These results are consistent with Fig. 6I showing low levels of Mki67 in Tim4+ cells. Tim4- are a different story: they progressively accumulate with age. Although the variability in thymus size and Tim4- macrophages in very old mice is too great for the data to reach significance, the trend is clear.

      As for the dynamics of the populations in the embryonic period, we added data formally demonstrating by fate mapping that TIM4+CX3CR1- cells are derived from CX3CR1+ cells (Fig. 7E-G). We induced recombination in ROSA26LSL-GFP females pregnant from Cx3cr1CreER males at E15.5, when almost all ThyMacs are Cx3cr1+ (Fig. 7A). Just before birth, at E19.5, we found a substantial proportion of TIM4+CX3CR1- cells among the fate-mapped GFP+ macrophages, indicating that Cx3cr1+ cells indeed give rise to TIM4+CX3CR1- cells. As pointed out before, this pathway is exhausted by the first week after birth: at d7 all ThyMacs are TIM4+.

    1. Author Response

      Reviewer #1 (Public Review):

      High resolution mechanistic studies would be instrumental in driving the development of Cas7-11 based biotechnology applications. This work is unfortunately overshadowed by a recent Cell publication (PMID: 35643083) describing the same Cas7-11 RNA-protein complex. However, given the tremendous interest in these systems, it is my opinion that this independent study will still be well cited, if presented well. The authors obviously have been trying to establish a unique angle for their story, by probing deeper into the mechanism of crRNA processing and target RNA cleavage. The study is carried out rigorously. The current version of the manuscript appears to have been rushed out. It would benefit from clarification and text polishing.

      We thank the reviewer for the positive and helpful comments that have made the manuscript more impactful.

      To summarize the revisions, we have resolved the metal-dependence issue, updated the maps in both main and supplementary figures that support the model, re-organized the labels for clarity, and added a comparison between our structure and that of Kato et al.

      In addition, we describe a new result with an isolated C7L.1 fragment that retains the processing and crRNA binding activities.

      Reviewer #2 (Public Review):

      In this manuscript, Gowswami et al. solved a cryo-EM structure of Desulfonema ishimotonii Cas7-11 (DiCas7-11) bound to a guiding CRISPR RNA (crRNA) and target RNA. Cas7-11 is of interest due to its unusual architecture as a single polypeptide, in contrast to other type III CRISPR-Cas effectors that are composed of several different protein subunits. The authors have obtained a high-quality cryo-EM map at 2.82 angstrom resolution, allowing them to build a structural model for the protein, crRNA and target RNA. The authors used the structure to clearly identify a catalytic histidine residue in the Cas7-11 Cas7.1 domain that is important for crRNA processing activity. The authors also investigated the effects of metal ions and crRNA-target base pairing on target RNA cleavage. Finally, the authors used their structure to guide engineering of a compact version of Cas7-11 in which an insertion domain that is disordered in the cryo-EM map was removed. This compact Cas7-11 appears to have comparable cleavage activity to the full-length protein.

      The cryo-EM map presented in this manuscript is generally of high quality and the manuscript is very well illustrated. However, some of the map interpretation requires clarification (outlined below). This structure will be valuable as there is significant interest in DiCas7-11 for biotechnology. Indeed, the authors have begun to engineer the protein based on observations from the structure. Although characterization of this engineered Cas7-11 is limited in this study and similar engineering was also performed in a recently published paper (PMID 35643083), this proof-of-principle experiment demonstrates the importance of having such structural information.

      The biochemistry experiments presented in the study identify an important residue for crRNA processing, and suggest that target RNA cleavage is not fully metal-ion dependent. Most of these conclusions are based on straightforward structure-function experiments. However, some results related to target RNA cleavage are difficult to interpret as presented. Overall, while the cryo-EM data presented in this work is of high quality, both the structural model and the biochemical results require further clarification as outlined below.

      We thank the reviewer for the positive and helpful comments that have made the manuscript more impactful.

      To summarize the revisions, we have resolved the metal-dependence issue, updated the maps in both main and supplementary figures that support the model, re-organized the labels for clarity, and added a comparison between our structure and that of Kato et al.

      In addition, we describe a new result with an isolated C7L.1 fragment that retains the processing and crRNA binding activities.

      1. The DiCas7-11 structure bound to target RNA was also recently reported by Kato et al. (PMID 35643083). The authors have not cited this work or compared the two structures. While the structures are likely quite similar, it is notable that the structure reported in the current paper is for the wild-type protein and the sample was prepared under reactive conditions, resulting in a partially cleaved target. Kato et al. used a catalytically dead version of Cas7-11 in which the target RNA should remain fully intact. Are there differences in the Cas7-11 structure observed in the presence of a partially cleaved target RNA in comparison to the Kato et al. structure? Such a comparison is appropriate given the similarities between the two reports. A figure comparing the two structures could be included in the manuscript.

      We have added a paragraph on page 12 that describes the differences in the preparation of the two complexes and in their structures. We observed minor differences in the overall protein structure (r.m.s.d. 0.918 Å for 8114 atoms) but did observe quite different interactions between the protein and the first 5’-tag nucleotide (U(-15) vs. G(-15)) due to the different pre-crRNA constructs, which suggests that U(-15) is important in forming the processing-competent active site. We added Figure 2-figure supplement 3 to illustrate the similarities and the differences.
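      For reference, an r.m.s.d. such as the one quoted above is conventionally computed over the N matched atom pairs after optimal superposition of the two models (here N = 8114). The standard definition is given below; x_i and y_i are notation introduced here for illustration, denoting the coordinates of matched atom i in the two superposed models:

      $$ \mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{x}_i - \mathbf{y}_i \right\rVert^{2}} $$

      A value below 1 Å over several thousand atoms indicates that the two protein models are nearly superimposable overall, so the meaningful differences lie in local interactions such as those at the 5’-tag.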

      2. The cryo-EM density map is of high quality, but some of the structural model is not fully supported by the experimental data (e.g. protein loops from the AlphaFold model were not removed despite lack of cryo-EM density). Most importantly, there is little density for the target RNA beyond the site 1 cleavage site, suggesting that the RNA was cleaved and the product was released. However, this region of the RNA was included in the structural model. It is unclear what density this region of the target RNA model was based on. Further discussion of the interpretation of the partially cleaved target RNA is necessary. Were 3D classes observed in various states of RNA cleavage and with varied density for the product RNAs?

      We should have made it clear in the Methods that multiple maps were used in building the structure, but we only submitted the post-processed map to the reviewers. When using the map generated by Relion 4.0’s local resolution estimation, we observed sufficient density for some of the regions the reviewer is referring to. For instance, the density at the site 1 cleavage site does support the model for the two nucleotides beyond it (see the revised Figure 1 & Figure 1- figure supplement 3).

      However, some protein loops still lack convincing density. These include residues 134-141 and 1316-1329, which have now been removed from the final coordinates.

      The phrase “partially cleaved target RNA” reflects weak density for nucleotides downstream of site 1 (+2 and +3) but clear density flanking site 2. This feature indicates that cleavage had likely taken place at site 1 but not at site 2 in most of the particles that went into the reconstruction. To further clarify this phrase, we added “The PFS region plus the first base paired nucleotide (+1*) are not observed.” on page 4, and we now indicate more clearly in Figure 1 which nucleotides are or are not built in our model.

      3. The authors argue that site 1 cleavage of target RNA is independent of metal ions. This is a potentially interesting result, but it is difficult to determine whether it is supported by the evidence provided in the manuscript. The Methods section only describes a buffer containing 10 mM MgCl2, but does not describe conditions containing EDTA. How much EDTA was added, and was MgCl2 omitted from these samples? In addition, it is unclear whether the site 1 product is visible in Figures 2d and 3d. To my eye, the products that are present in the EDTA conditions on these gels migrated slightly slower than the typical site 1 product. This may suggest an alternate cleavage site or chemistry (e.g. cyclic phosphate is maintained following cleavage). Further experimental details and potentially additional experiments are required to fully support the conclusion that site 1 cleavage may be metal independent.

      As we pointed out in response to Reviewer 1’s #8 comment, this conclusion may have been a result of using an older batch of DiCas7-11 that contains degraded fragments.

      As shown in the attached figure below, “batch Y” was an older prep from our in-house clone and “batch X” is a newer prep from the Addgene purchased clone (gel on right), and they consistently produce metal-independent (batch Y) or metal-dependent (batch X) cleavage (gel on left). It is possible that the degraded fragments in batch Y carry a metal-independent cleavage activity that is absent in the more pure batch X.

      We further performed mass spectrometry analysis of two of the degraded fragments from batch Y (indicated by arrows below) and found that these are indeed parts of DiCas7-11. However, without more experimental evidence, we cannot explain why these fragments might have generated metal-independent cleavage at site 1. Therefore, we simply updated all our cleavage results using the new and cleaner prep (batch X) (for instance, Figure 3c). As a result, all references to “metal-independence” were removed.

      With regard to the nature of cleaved products, we found both sites could be inhibited by specific 2’-deoxy modifications, consistent with the previous observation that Type III systems generate a 2’, 3’-cyclic product in spite of the metal dependence (for instance, see Hale, C. R., Zhao, P., Olson, S., Duff, M. O., Graveley, B. R., Wells, L., ... & Terns, M. P. (2009). RNA-guided RNA cleavage by a CRISPR RNA-Cas protein complex. Cell, 139(5), 945-956.)

      We added this rationale based on the new results and believe that these characterizations are now thorough and conclusive.

      4. The authors performed an experiment investigating the importance of crRNA-target base pairing on cleavage activity (Figure 3e). However, negative controls for the RNA targets in the absence of crRNA and Cas7-11 were not included in this experiment, making it impossible to determine which bands on the gel correspond to substrates and which correspond to products. This result is therefore not interpretable by the reader and does not support the conclusions drawn by the authors.

      Our original gel image (below) does contain these controls, but we did not include them in the figure due to space constraints (we should have included them as a supplementary figure). We have now completely updated Figure 3e with much better quality and controls. Both the original and the updated experiments show the same results.

      Original gel for Figure 3e containing controls.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very interesting paper showing that postsynaptic bursts in the presence of dopamine produce input-specific LTP in hippocampal synapses 10 minutes after they were primed with negatively coincident pre- and postsynaptic activity. LTP requires NMDAR activation during priming and involves a cAMP-PKA cascade and protein synthesis. When this synaptic rule is incorporated into a computational model, reinforced learning is possible through selective reactivation of neurons. Experiments in behaving mice confirmed that neurons reactivated after an exploratory period display more activity than non-reactivated neurons.

      We thank the Reviewer for their positive comments on our manuscript. We have incorporated the Reviewer’s suggestions.

      Reviewer #2 (Public Review):

      Building on their previous 2015 study with Brzosko, Fuchsberger et al. propose a potential solution for how the brain associates memory events that are separated in time. The authors find that in the presence of dopamine, postsynaptic bursts produce input-specific LTP at hippocampal CA3-CA1 synapses ten minutes after priming with a post-before-pre spike-pairing protocol. They explore the signalling somewhat, for example showing a need for postsynaptic NMDARs as well as for protein synthesis. Using a computer model, they find that this form of plasticity enables reinforcement learning. A few key predictions were verified using an in-vivo spatial learning model.

      This is a strong study that addresses a long-standing fundamental problem in modern neuroscience research, namely the temporal credit assignment problem of how temporally well-separated signals can be meaningfully associated and learned in the brain. The experiments are carefully executed, the rationale is clearly explained, and - excepting Fig 6-8 - the figures are for the most part easy to understand. The study ranges from in-vitro electrophysiology across computer modelling to awake-behaving in-vivo experiments to persuasively argue that their novel findings may provide a candidate solution to the temporal credit assignment problem. Taken at face value, this work is likely to be highly impactful, however, some control experiments were missing or are perhaps just not shown (e.g., stability, stability in the presence of anisomycin, the effect of anisomycin on firing, and similar), which makes the validity of the findings a bit hard to evaluate at times.

      We thank the Reviewer for their positive evaluation of our study and address all the points raised below.

      Reviewer #3 (Public Review):

      Fuchsberger et al. demonstrate that an otherwise LTD-inducing STDP protocol can produce LTP if followed by burst reactivation of post-synaptic neurons in the presence of dopamine. Using computational modeling and single-photon imaging in the CA1 in mice, they propose these findings are relevant to spatial over-representation at a reward location.

      This is a follow-up of the two previous studies from the same group (Brzosko et al., 2015 and Andrade-Talavera et al., 2016) where they showed a post-before-pre STDP protocol, which by default induces a (pre-synaptic) LTD, will induce synaptic potentiation in the presence of dopamine and continuous synaptic activity. The main conceptual difference between this manuscript and these previous studies is that continuous synaptic activity can be replaced by post-synaptic burst. This means that reactivation of post-synaptic neurons without any further pre-synaptic instruction is sufficient for successful LTP induction.

      Mechanistically, the two protocols (continuous vs burst activation) appear to be similar (but not identical). For example, both require the activation of post-synaptic NMDAr during STDP pairing, and both depend on the AC/PKA pathways. Additionally, there are two new observations here: The activity of voltage-gated calcium channels during bursting is required for potentiation; the burst-induced potentiation also requires protein synthesis.

      The evidence provided at this stage is strong.

      Major point:

      It is not clear to me how the STDP studied here relates to the next part of the study, the reward-based navigation task. My interpretation is that the authors consider the activity before reaching the reward location (approaching time) as resembling the STDP priming protocol, the activity at the reward location as equivalent to the bursting protocol, and consumption of the reward as similar to dopamine application. If so, what is the circumstantial evidence that the activity during the approach induces any form of plasticity?

      The link between the two is not obvious and I see the manuscript as two interesting but not naturally linked stories.

      The Reviewer’s interpretation is correct. We considered the activity during navigation on the maze, as the animal approaches the reward, as resembling the STDP priming protocol. Substantial evidence supports a role of NMDAR-dependent STDP in the formation of place fields during navigation (Mehta, Hippocampus 2015; Moore et al., 2021). It has been postulated that both LTP and LTD are involved in place field formation. This was based on the observation that place fields shift backwards with experience (Mehta & McNaughton PNAS 1997), and a computational model predicted that without LTD, place field broadening would occur (Mehta et al. Neuron 2000). Thus, LTP is required when the animal enters the place field, and LTD when it exits the place field (Mehta et al. Neuron 2000). This is specific to navigation, as opposed to simply walking on a linear track without a task, and place field plasticity is predictive of navigational performance (Moore et al. Nature 2021).

      We have added this to the Discussion section (page 13, line 344).

      Mehta MR. 2015. From synaptic plasticity to spatial maps and sequence learning. Hippocampus 25:756-62.
      Mehta MR, Quirk MC, Wilson MA. 2000. Experience-dependent asymmetric shape of hippocampal receptive fields. Neuron 25:707-15.
      Moore JJ, Cushman JD, Acharya L, Popeney B, Mehta MR. 2021. Linking hippocampal multiplexed tuning, Hebbian plasticity and navigation. Nature 599:442-448.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors ask a key question in the field of adult plasticity, and in particular, amblyopia treatment: whether transient dark exposure followed by light re-introduction disrupts neural representation for basic stimulus attributes in a manner that could negatively impact vision. Prior work by Rose and colleagues using calcium imaging showed that closing one eye in adult mice leaves the responsiveness of V1 neurons unchanged but alters their orientation preference and pairwise correlations; such representational drift may require downstream areas to adjust how they readout V1 signals. The question posed here is whether binocular visual deprivation in adult mice does the same. The authors use 2-photon calcium imaging in 6 awake, head-fixed [transgenic - GCaMP6f driven by the EMX1 promoter] mice before and after transient dark exposure to record ensemble responses of layer 2/3 excitatory V1 neurons to oriented gratings of varying spatial frequencies. Data were acquired twice at baseline (allowing for an assessment of representational drift during exposure to the natural [cage] environment), once immediately after 8 days of dark exposure and once about 8 days after animals were once again exposed to their natural [cage] environment.

      The study appears to be generally well designed with multiple analytical approaches trained on the same questions. Major strengths include the ability to analyze a large number of neuronal responses simultaneously in the awake-behaving state using calcium imaging in transgenic mice, and the ability to record activity in the same neurons across several weeks and following different behavioral manipulations. A relative weakness was the implication of only being able to elicit relevant visual responses from a small fraction of V1 neurons for comparison purposes. This raises the question of what may have happened to the neurons that were not tracked, and whether this in fact may have been significant.

      A consistent finding across laboratories is that 30-50% of the neural population in rodent V1 is visually responsive to grating stimuli, and that drifting gratings recruit neurons to a greater extent than static gratings [1–5]. This is unrelated to tracking, as it is also the case for single-session analysis. The reviewer brings up an interesting question: because we track neurons across sessions, we are in a unique position to gain insight into properties that might correlate with responsiveness. To that end, we performed additional analysis to determine whether low trial reliability predicts whether a specific neuron will ‘drop out’ from being visually responsive in a subsequent session. The new analysis shows that under control conditions, trial reliability is correlated with reliability in the subsequent session. Consistent with our observation that reliability across the population decreases following dark exposure and then improves during light reintroduction, the new analysis also shows that the change in reliability for individual neurons is significantly skewed to lower values in the DE condition (single-sample KS test), while in the light reintroduction condition values are significantly skewed in the positive direction (Figure 3 – figure supplement 3A).

      1. Ohki, K., Chung, S., Ch’ng, Y. H., Kara, P. & Reid, R. C. Functional imaging with cellular resolution reveals precise micro-architecture in visual cortex. Nature 433, 597–603 (2005).
      2. Montijn, J. S., Meijer, G. T., Lansink, C. S. & Pennartz, C. M. A. Population-Level Neural Codes Are Robust to Single-Neuron Variability from a Multidimensional Coding Perspective. Cell Rep. 16, 2486–2498 (2016).
      3. Ko, H., Mrsic-Flogel, T. D. & Hofer, S. B. Emergence of feature-specific connectivity in cortical microcircuits in the absence of visual experience. J. Neurosci. 34, 9812–9816 (2014).
      4. Jeon, B. B., Swain, A. D., Good, J. T., Chase, S. M. & Kuhlman, S. J. Feature selectivity is stable in primary visual cortex across a range of spatial frequencies. Sci. Rep. 8, 15288 (2018).
      5. de Vries, S. E. J. et al. A large-scale standardized physiological survey reveals functional organization of the mouse visual cortex. Nat. Neurosci. 23, 138–151 (2020).

      Reviewer #3 (Public Review):

      This paper uses transient dark exposure to induce plasticity in the adult visual cortex. It shows that transient dark exposure in the adult mice has opposing effects at the single neuronal level versus the population level. At the population level, the stimulus representation is degraded following dark exposure but rebounds back to normal within 8 days of light re-introduction. Thus, dark exposure does not have a lasting negative impact on the visual cortex. Unexpectedly, at the single neuronal level, following dark exposure a fraction of neurons show more stable responses and higher correlations among pairs of neurons. It is inspiring to hypothesize that this fraction of neurons may form a plastic substrate for representation of complex natural scenes.

      Strengths:

      The paper uses a combination of single neuron and population analyses to identify the effects of transient dark exposure on visual responses in the adult mouse visual cortex. It succeeds in identifying degradation of stimulus representation at the population level following dark exposure, and stabilization of visual stimulus preference at the single neuron level as well as stabilization of stimulus correlations among pairs of neurons. This success is in part due to an impressively large set of simple visual stimuli used (180 different stimuli). This large set allows the authors to identify even small changes in stimulus preferences at the single neuronal level. This paper uses transient dark exposure to induce plasticity. An alternative and commonly used method to induce plasticity is monocular deprivation. This paper shows that at the single neuron level, the effects of transient dark exposure are different from the previously reported effects of monocular deprivation. This is an important finding for the field.

      Weaknesses:

      The analysis methods used are thoughtful and complementary. The statistical tests are mostly performed on visual responses pooled across 6 mice. These statistical tests support the claims of the paper. However, we are left wondering whether the effects identified would also be significant for visual responses of each individual mouse.

      Further analysis of individual mice is now included. From this analysis we can verify that the effects observed are not driven by one or two animals, but rather are representative of the majority of the animals included in the study.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors' results revolutionize our understanding of the mechanism of arrestin-mediated GPCR internalization. They identified previously unknown elements on the non-receptor-binding side of arrestins participating in the process. The findings are ground-breaking and very important to the large field of GPCR signaling.

      We are pleased that the reviewer appreciates the significance of our findings. We appreciate the important critiques and corrections, and have done our best to address them.

      Reviewer #2 (Public Review):

      This manuscript from the Von Zastrow laboratory proposes that an additional site on Beta-arrestin2 (arrestin 3), beyond the well-characterised C-tail (AP-2 and clathrin binding), is responsible in significant part for the downregulation, and likely the onward signalling from endosomes, of a range of GPCRs. The cell biology appears to me to be thoroughly carried out and the data presented in a statistically appropriate manner.

      The conclusions made seem appropriate and justified, although I think considerably more information could be extracted with little extra effort, including formally proving that internalisation is by CME by using CME-specific inhibitors or inhibitory constructs.

      The major weakness is the lack of mechanistic information, most specifically: what does the C-lobe bind to in order to allow Beta-arrestin2 incorporation into CCVs?

      The referencing of the relevant literature is sometimes careless or inappropriate, especially with respect to CME.

      We are pleased that the reviewer found our conclusions generally appropriate and well-justified. The reviewer is correct that we presently do not know the interaction(s) responsible for CLB activity. We have addressed the reviewer’s critiques with new data and / or changes to the text as follows.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors ask an interesting question: whether working memory contains more than one conjunctive representation of the multiple task features required for a future response, with one of these representations being more likely to become relevant at the time of the response. With RSA, the authors use a multivariate approach that seems to be becoming the standard in modern EEG research.

      We appreciate the reviewer’s helpful comments on the manuscript and their encouraging comments regarding its potential impact.

      I have three major concerns that are currently limiting the meaningfulness of the manuscript: For one, the paradigm uses stimuli with properties that could potentially influence involuntary attention and interfere in a Stroop-like manner with the required responses (i.e., 2 out of 3 cues involve the terms "horizontal" or "vertical" while the stimuli contain horizontal and vertical bars). It is not clear to me whether these potential interactions might bring about what is identified as conjunctive representations or whether they cause these representations to be quite weak.

      We agree it is important to rule out any effects of involuntary attention that might have been elicited by our stimulus choices. To address the Reviewer’s concern, we conducted control analyses to test if there was any influence of Stroop-like interference on our measures of behavior or the conjunctive representation. To summarize these analyses (detailed in our responses below and in the supplemental materials), we found no evidence of the effect of compatibility on behavior or on the decoding of conjunctions during either the maintenance or test periods. Furthermore, we found that the decoding of the bar orientation was at chance level during the interval when we observe evidence of the conjunctive representations. Thus, we conclude that the compatibility of the stimuli and the rule did not contribute to the decoding of conjunctive representations or to behavior.

      Second, the relatively weak conjunctive representations are making it difficult to interpret null effects such as the absence of certain correlations.

      The reviewer is correct that we cannot draw strong conclusions from null findings. We have revised the main text accordingly. In certain cases, we have also included additional analyses. These revisions are described in detail in response to the reviewer’s comments below.

      Third, if the conjunctive representations truly are reflections of working memory activity, then it would help to include a control condition where memory load is reduced so as to demonstrate that representational strength varies as a function of load. Depending on whether these concerns or some of them can be addressed or ruled out this manuscript has the potential of becoming influential in the field.

      This is a clever suggestion for further experimentation. We agree that observing an adverse effect of memory load is one robust way to assess the contribution of the working memory system in future studies. However, given that decoding is noisy during the maintenance period (particularly for the low-priority conjunctive representation) even with a relatively low set size, we expect that further manipulating load would require substantially altering the research design. Thus, as the main goal of the current study is to examine prioritization and post-encoding selection of action-related information, we focused on the minimum set size required for this question (i.e., load 2). However, we now note this load manipulation as a direction for future research in the Discussion (pg. 18).

      Reviewer #2 (Public Review):

      Kikumoto and colleagues investigate the way visual-motor representations are stored in working memory and selected for action based on a retro-cue. They use a combination of decoding and RSA to assess at which stages of processing sensory, motor, and conjunctive information (consisting of sensory and motor representations linked via an S-R mapping) is represented in working memory, and how these mental representations are related to behavioral performance.

      Strengths

      This is an elaborate and carefully designed experiment. The authors are able to shed further light on the type of mental representations in working memory that serve as the basis for the selection of relevant information in support of goal-directed actions. This is highly relevant for a better understanding of the role of selective attention and prospective motor representations in working memory. The methods used could provide a good basis for further research in this regard.

      We appreciate these helpful comments and the Reviewer’s positive comments on the impact of the work.

      Weaknesses

      There are important points requiring further clarification, especially regarding the statistical approach and interpretation of results.

      • Why is there a conjunction RSA model vector (b4) required, when all information for a response can be achieved by combining the individual stimulus, response, and rule vectors? In Figure 3 it becomes obvious that the conjunction RSA scores do not simply reflect the overlap of the other three vectors. I think it would help the interpretation of results to clearly state why this is not the case.

      Thank you for the suggestion. We have now added the theoretical background that motivated us to include the RSA model of the conjunctive representation (pg. 4 and 5). In particular, several theories of cognitive control have proposed that, over the course of action planning, the system assembles an event (task) file that binds all task features at all levels, including the rule (i.e., context), stimulus, and response, into an integrated, conjunctive representation that is essential for an action to be executed (Hommel 2019; Frings et al. 2020). Similarly, neural evidence from non-human primates suggests that cognitive tasks requiring context dependency (e.g., flexible remapping of inputs to different outputs based on the context) recruit nonlinear conjunctive representations (Rigotti et al. 2013; Parthasarathy et al. 2019; Bernardi et al. 2020; Panichello and Buschman, 2021). Supporting these views, we previously observed that conjunctive representations emerge in the human brain during action selection and uniquely explain behavior, such as the costs of action transitions (Kikumoto & Mayr, 2020; see also Rangel, Hazeltine, & Wessel, 2022) or the successful cancellation of actions (Kikumoto & Mayr, 2022). In the current study, using the same set of RSA models, we attempted to extend the role of conjunctive representations to the planning and prioritization of future actions. As in the previous studies (and as noted by the reviewer), the conjunction model makes a unique prediction about the similarity (or dissimilarity) pattern of the decoder outputs: each specific instance of an action is distinct from all other actions. This contrasts with the RSA models of low-level features, which predict similar patterns of activity for instances that share the same feature (e.g., S-R mappings 1 to 4 share the diagonal rule context). Here, we generally replicate the previous studies, showing the unique trajectories of conjunctive representations (Figure 3) and their unique contribution to behavior (Figure 5).

      • One of the key findings of this study is the reliable representation of the conjunction information during the preparation phase while there is no comparable effect evident for response representations. This might suggest that two potentially independent conjunctive representations can be activated in working memory and thereby function as the basis for later response selection during the test phase. However, the assumption of the independence of the high and low priority conjunction representations relies only on the observation that there was no statistically reliable correlation between the high and low priority conjunctions in the preparation and test phases. This assumption is not valid because non-significant correlations do not allow any conclusion about the independence of the two processes. A comparable problem appeared regarding the non-significant difference between high and low-priority representations. These results show that it was not possible to prove a difference between these representations prior to the test phase based on the current approach, but they do not unequivocally "suggest that neither action plan was selectively prioritized".

      We appreciate this important point. We have taken care in the revision to state that we find evidence of an interference effect for the high-priority action and do not find evidence for such an effect from the low-priority action; we do not intend to conclude that no such effect could exist. Further, although it is not our intention to draw a strong conclusion from the null effect (i.e., no correlation), we performed an exploratory analysis testing the correlation in trials where we observed strong evidence of both conjunctions. Specifically, we split trials in half (median split) within each time point and individual subject, and performed the multilevel model analysis using only trials where both the high- and low-priority conjunctions were above their medians. In this way, trials were selected independently of the effect being tested. The figure below shows the coefficient associated with the low-priority conjunction predicting the high-priority conjunction (uncorrected). Even when we focus on trials where both conjunctions are detected (i.e., a high signal-to-noise ratio), we observed no trade-off. Again, we cannot draw strong conclusions from the null result of this exploratory analysis. Yet, it allows us to rule out some causes of the absent correlation between the high- and low-priority conjunctions, such as a poor signal-to-noise ratio for the low-priority conjunction. We have further clarified this point in the Results (pg. 14).

      Fig. 1. Trial-to-trial variability between high- and low-priority conjunctions, using above-median trials. The coefficients of the multilevel regression model predicting trial-to-trial variability in the high-priority conjunction from the low-priority conjunction.
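      A minimal sketch of the median-split trial selection described above, assuming each trial is a record carrying a subject ID, a time point, and one RSA score per conjunction (the field names `subject`, `time`, `high`, and `low` are hypothetical). It illustrates only the selection step; the multilevel regression fitted to the selected trials is not shown.

```python
from collections import defaultdict
from statistics import median


def select_above_median(trials):
    """Keep trials where BOTH conjunction scores exceed their medians.

    Medians are computed separately within each (subject, time point) cell,
    so the split is within-subject and within-time. Field names are
    hypothetical placeholders for illustration.
    """
    cells = defaultdict(list)
    for trial in trials:
        cells[(trial["subject"], trial["time"])].append(trial)

    selected = []
    for cell in cells.values():
        med_high = median(t["high"] for t in cell)  # high-priority conjunction
        med_low = median(t["low"] for t in cell)    # low-priority conjunction
        selected.extend(
            t for t in cell if t["high"] > med_high and t["low"] > med_low
        )
    return selected


# Toy data for one subject at one time point (illustrative values only).
trials = [
    {"subject": 1, "time": 0, "high": 0.9, "low": 0.8},
    {"subject": 1, "time": 0, "high": 0.1, "low": 0.7},
    {"subject": 1, "time": 0, "high": 0.6, "low": 0.1},
    {"subject": 1, "time": 0, "high": 0.2, "low": 0.2},
]
kept = select_above_median(trials)
```

      Because the split is defined on each conjunction's own distribution, the selection does not depend on how the two scores covary, which is why it can be used to probe their correlation.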

      • The experimental design used does not allow for a clear statement about whether pure motor representations in working memory only emerge with the definition of the response to be executed (test phase). It is not evident from Figure 3 that the increase in the RSA scores strictly follows the onset of the Go stimulus. It is also conceivable that the emergence of a pure motor representation requires a longer processing time. This could only be investigated through temporally varying preparation phases.

      We agree with the reviewer. Although we detected no evidence of response representations for either the high- or the low-priority action plan during the preparation phase, t(1,23) = -.514, beta = .002, 95% CI [-.010 .006] for high priority; t(1,23) = -1.57, beta = -.008, 95% CI [-.017 .002] for low priority, this may be limited by the relatively short duration of the delay period (750 ms) in this study. However, in our previous studies using a similar paradigm without a delay period (Kikumoto & Mayr, 2020; Kikumoto & Mayr, 2022), response representations were detected less than 300 ms after the response was specified, which corresponds to the onset of the delay period in this study. Further, participants in the current study were encouraged to prepare responses as early as possible, using adaptive response deadlines and performance-based incentives. Thus, we know of no reason why responses would take longer to prepare in the present study, but we agree that we cannot rule this out. We have added the caveat noted above, as well as this additional context, to the Discussion (pg. 16-17).

      • Inconsistency of statistical approaches: In the methods section, the authors state that they used a cluster-forming threshold and a cluster-significance threshold of p < 0.05. In the results section (Figure 4) a cluster p-value of 0.01 is introduced. Although this concerns different analyses, varying threshold values appear as if they were chosen in favor of significant results. The authors should either proceed consistently here or give very good reasons for varying thresholds.

      We thank the reviewer for noting this oversight. All significant clusters reported with a cluster P-value were identified using a cluster-forming threshold of p < .05. We have corrected the description accordingly.

      • Interpretation of results: The significant time window for the high vs. low priority by test-type interaction appeared quite late for the conjunction representation. First, it does not seem reasonable that such an effect appears in a time window overlapping with the motor responses. But more importantly, why should it appear after the respective interaction for the response representation? When keeping in mind that these results are based on a combination of time-frequency analysis, decoding, and RSA (quite many processing steps), I find it hard to really see a consistent pattern in these results that allows for a conclusion about how higher-level conjunctive and motor representations are selected in working memory.

      Thank you for raising this important point. First, we fixed the reported methodological inconsistencies (such as the cluster P-value and cluster-forming threshold). Further, we fully agree that the difference in the time course of the response and conjunctive representations in the low-priority, tested condition is unexpected and would complicate the view that the conjunctive representation contributes to efficient response selection. However, additional analyses indicate that this apparent pattern in the stimulus-locked results is misleading and that there is a more parsimonious explanation. First, we wish to caution that the data are relatively noisy and are likely influenced by different frequency bands for different features. Thus, fine-grained temporal differences should be interpreted with caution in the absence of positive statistical evidence of an interaction over time. Indeed, although Figure 4 in the original submission shows a quantitative difference in the timing of the interaction effect (priority by test type) between the conjunctive and response representations, the direct test of this four-way interaction [priority x test type x representation type (conjunction vs. response) x time interval (1500 ms to 1850 ms vs. 1850 ms to 2100 ms)] is not significant, t(1,23) = 1.65, beta = .058, 95% CI [-.012 .015]. The same analysis using response-aligned data is also not significant, t(1,23) = -1.24, beta = -.046, 95% CI [-.128 .028]. These observations did not depend on the choice of time interval, as other time intervals were also not significant. Therefore, we do not have strong evidence of a true timing difference between these conditions, and we believe the apparent difference is likely driven by noise.

      Further, we believe the apparent late emergence of a difference between the two conjunctions when the low-priority action is tested is more likely due to a slow decline in the strength of the untested high-priority conjunction than to a late emergence of the low-priority conjunction. This pattern is clearer when the traces are aligned to the response. The low-priority conjunction emerges early, is sustained when it is the tested action, and declines when it is untested (-226 ms to 86 ms relative to response onset, cluster-forming threshold p < .05). These changes eventually result in a significant difference in strength between the tested and untested low-priority conjunctions just prior to the response (Figure 4 - figure supplement 1, right column of the middle row, black bars at the top of the panel). Importantly, the high-priority conjunction also remains active in its untested condition and declines later than the untested low-priority conjunction does. Indeed, the untested high-priority conjunction does not decline significantly relative to trials in which it is tested until after the response is emitted (Figure 4 - figure supplement 1, right column of the middle row, red bars at the top of the panel). This produces a late-emerging interaction between priority and test type, but the interaction is not due to a late-emerging low-priority conjunctive representation.

      In summary, we do not have statistical evidence of a time-by-effect interaction that would allow us to draw strong inferences about timing. Nonetheless, even the patterns we do observe are inconsistent with a late-emerging low-priority conjunctive representation; if anything, they support a late decline in the untested high-priority conjunctive representation. The finding that the high-priority conjunction is sustained until late, even when it is untested, is also notable in light of our observation that the strength of the high-priority conjunctive representation interferes with behavior when the low-priority item is tested, but not vice versa. We now address this point about timing directly in the Results (pg. 15-16) and the Discussion (pg. 21), and we include the response-locked results in the main text alongside the stimulus-locked results, including the exploratory analyses reported here.

      Reviewer #3 (Public Review):

      This study aims to address the important question of whether working memory can hold multiple conjunctive task representations. The authors combined a retro-cue working memory paradigm with their previous task design that cleverly constructed multiple conjunctive tasks with the same set of stimuli, rules, and responses. They used advanced EEG analytical skills to provide the temporal dynamics of concurrent working memory representation of multiple task representations and task features (e.g., stimulus and responses) and how their representation strength changes as a function of priority and task relevance. The results generally support the authors' conclusion that multiple task representations can be simultaneously manipulated in working memory.

      We appreciate these helpful comments, and were pleased that the reviewer shares our view that these results may be broadly impactful.

    1. Author Response

      Reviewer #2 (Public Review):

      Zuber, et al. report structural and thermodynamic properties of 6 domains from the NusG superfamily of transcription factors, conserved in all kingdoms of life. This superfamily is characterized by an N-terminal NGN domain that binds RNA polymerase, affecting its activity. NGN domains are covalently linked to C-terminal domains (CTDs) that typically assume a single completely beta-sheet (KOW) fold. Recent work has shown, however, that one such domain, from E. coli RfaH, can switch from a completely alpha-helical fold into the all-beta-sheet KOW fold. Here, the authors identify a second fold-switching member of the NusG superfamily and investigate the physical basis of the dramatic switching transition by comparing thermodynamic and structural properties of fold-switching and single-folding CTDs.

      Strengths:

      To my knowledge, this is the first in-depth thermodynamic analysis of fold switching in the NusG protein family. One striking result is the stability difference between E. coli NusG (single-folding) and E. coli RfaH (fold-switching). It can be difficult to compare stabilities across organisms since their environments differ. For example, a fold-switching domain from a thermophile and single-folding domain from a mesophile might have similar stabilities. Clearer stability differences can be seen by comparing variants from the same species, which the authors show.

      The NMR experiments showing minor species in both fold-switching CTDs and one single-folding CTD suggest that the unfolded state plays an important role in fold switching. The 13C-alpha CEST experiments showing that the minor species E. coli RfaH CTD has helical character hints at a mechanism for how the RfaH CTD is poised to assume two different folds.

      Weaknesses:

      The thermodynamic and structural properties of one single-fold domain (hSpt5-KOW) do not differ appreciably from those of a fold-switching domain, suggesting an incomplete mechanistic explanation of fold switching. Specifically, both the thermostabilities and the folding free energies of hSpt5-KOW (single-folding) and VcRfaH-KOW (fold-switching) were comparable. Furthermore, their 15N shift differences from CEST experiments (Figure 5 supplement 1B&C) appear similar. Thus, it is possible that the minor species of hSpt5-KOW has helical character like Ec- and VcRfaH. Furthermore, the secondary structure predictions showing that hSpt5-KOW has largely beta-sheet propensities are suspect, because the secondary structure predictions of MtNusG-KOW (single-folding) are inaccurate: they show helical propensities comparable to Ec- and VcRfaH (fold-switching, Figure 5 supplement 3). These propensities are not experimentally supported for MtNusG-KOW, indicating that predicted secondary structures are not always reliable.

      It is not clear why the authors state that the minor species of EcRfaH-KOW is in exchange between helical and completely unfolded conformations. The chemical shift differences in Figure 6A appear comparable, indicating one population.

      We agree that the chemical shift differences are similar (for both 15N and 13C). However, the increased 15N R2 values of the minor species indicate further exchange processes. Together with the NMR-based chemical denaturation experiments, our interpretation of this finding is that the minor species is an ensemble of largely unfolded species, some of which are completely unfolded and some of which exhibit helical elements in regions 1 and 2. A detailed explanation is given in our reply to “Essential revisions #3”, and the manuscript has been modified to make clear which conclusions are experimentally proven and where we are hypothesizing.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a well-done analysis using the very robust Swedish national population registry.

      The study strengths include large size, prolonged follow-up, and use of two comparison populations.

      Thank you for the encouraging comments on our study.

      The main limitations which need to be addressed by the authors are accounting for reverse causality, namely if a psychiatric illness (PI) developed before or about the same time as the CVD. The much steeper risk relationships early after a CVD event are so suggestive. Some further analyses to tease out those with clearly NO PI before CVD would be in order.

      Thank you for the comment. Previous studies have consistently reported an association between psychiatric disorders and CVD [1,2]; thus, we agree that reverse causality may, in principle, explain some of the observed rise in incident psychiatric disorders after incident CVD, particularly during the immediate period. Yet, it is reasonable to assume that a diagnosis of a life-threatening disease, such as CVD, is in many cases a traumatic experience resulting in an immediate rise in the risk of psychiatric disorders. Others have reported such associations, e.g., after natural disasters, and we have observed a similar pattern in our previous work, e.g., after cancer diagnosis [3]. However, we agree that reverse causality cannot be excluded and may partly contribute to the highly increased risk of psychiatric disorder immediately after CVD diagnosis. Indeed, some of these patients may have been treated for their psychiatric disorders in primary care before the incident CVD. As the Patient Register only captures inpatient and outpatient hospital care, we have conducted an additional analysis that also excludes individuals with previous prescriptions of psychotropic drugs (ATC codes: N05, N06) before their incident CVD, thereby also capturing patients with prevalent mental health problems treated in primary care. The results show similar point estimates (Supplementary Appendix Table S5), thus not supporting the notion that reverse causality is a major concern. Furthermore, the association is noted up to 28 years after CVD diagnosis, which is unlikely to be due to reverse causality.

      We have now added our motivation for this additional analysis to the Methods (Page 9), as follows: “Because the Swedish Patient Register includes only information related to specialist care, we might have misclassified patients with a history of milder psychiatric disorders diagnosed before index date attended only in primary care. To account for the reverse causality of having undetected psychiatric disorders or symptoms before the incident CVD, we performed a sensitivity analysis additionally excluding study participants with prescribed use of psychotropic drugs before the index date (ascertained from the Swedish Prescribed Drug Register including information on all prescribed medication use in Sweden since July 2005), and followed the remaining participants from 2006 to 2016.”
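      The exclusion step of this sensitivity analysis can be sketched as a prefix match on ATC codes dated before each person's index date. This is a minimal illustration only, assuming a cohort keyed by a hypothetical person ID with an index date, and prescription records as (ID, ATC code, dispensing date) tuples; it is not the registry linkage actually used, and it does not model the restriction of follow-up to 2006-2016. The ATC codes in the toy data are illustrative.

```python
from datetime import date

# ATC prefixes named in the response: N05 and N06 (psychotropic drugs).
PSYCHOTROPIC_ATC_PREFIXES = ("N05", "N06")


def exclude_prior_users(cohort, prescriptions, prefixes=PSYCHOTROPIC_ATC_PREFIXES):
    """Drop individuals with a matching prescription dated before their index date.

    cohort: dict mapping a (hypothetical) person ID -> index date
    prescriptions: iterable of (person ID, ATC code, dispensing date) tuples
    """
    excluded = {
        pid
        for pid, atc, dispensed in prescriptions
        # str.startswith accepts a tuple, so any listed prefix matches.
        if pid in cohort and atc.startswith(prefixes) and dispensed < cohort[pid]
    }
    return {pid: idx for pid, idx in cohort.items() if pid not in excluded}


# Toy example: "a" used a psychotropic drug before the index date and is
# excluded; "b" used one only after the index date; "c" used a
# non-psychotropic drug.
cohort = {"a": date(2010, 5, 1), "b": date(2012, 1, 1), "c": date(2014, 3, 1)}
prescriptions = [
    ("a", "N06AB06", date(2009, 1, 1)),
    ("b", "N06AB06", date(2013, 1, 1)),
    ("c", "C07AB02", date(2010, 1, 1)),
]
kept = exclude_prior_users(cohort, prescriptions)
```

      Only prescriptions dated strictly before the index date trigger exclusion, so later psychotropic use (which may itself reflect an incident psychiatric disorder) does not remove anyone from the cohort.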

      Second, for the robust matched cohort design, the authors age and sex matched each patient with 10 individuals from the general population and then also stratified their model by the matching variables. Could adjusting for matched factors in such cohort studies re-introduce bias into these estimates?

      Thank you for the comment. Adjusting for matching factors should provide estimates with the same validity as using a stratified model. In our study, we matched individuals diagnosed with a CVD with their unaffected full siblings as well as with 10 randomly selected, unexposed individuals without such a diagnosis, on age and sex. As controlling for matching variables is recommended when there are additional confounders [1,2], we used a stratified Cox model, commonly applied in family-based studies [3,4].

      References:

      1. Sjölander A, Greenland S. Ignoring the matching variables in cohort studies - when is it valid and why? Stat Med. 2013 Nov 30;32(27):4696-708.
      2. Mansournia MA, Hernán MA, Greenland S. Matched designs and causal diagrams. Int J Epidemiol. 2013 Jun;42(3):860-9.
      3. D'Onofrio BM, Lahey BB, Turkheimer E, Lichtenstein P. Critical need for family-based, quasi-experimental designs in integrating genetic and social science research. Am J Public Health. 2013 Oct;103 Suppl 1(Suppl 1):S46-55.
      4. Song H, Fang F, Arnberg FK, Mataix-Cols D, de la Cruz LF, Almqvist C, et al. Stress related disorders and risk of cardiovascular disease: population based, sibling controlled cohort study. BMJ. 2019;365.

      Third, the range of PIs associated with CVD is a lot broader than would be expected (e.g., eating disorders!).

      Thank you for the comment. We agree with the reviewer that the strong association between CVD and incident eating disorders is somewhat surprising, although links between cardiovascular risk factors (e.g., obesity) and binge eating have indeed been reported [1,2]. We have now performed the analysis of the association between first-onset CVD and subsequent incident eating disorder, additionally excluding individuals with a history of psychotropic medication use. We found that the association became even stronger after this exclusion (Supplementary Table S5). It is possible that individuals suffering their first CVD drastically alter their lifestyle, in some cases resulting in dysfunctional eating, and may therefore be vulnerable to eating disorders. Given that the evidence on the risk of eating disorders among CVD patients is still limited, our study adds a valuable piece of knowledge in this regard and calls for further investigation to better understand this association.

      References:

      1. Mitchell JE. Medical comorbidity and medical complications associated with binge-eating disorder. Int J Eat Disord. 2016 Mar;49(3):319-23.
      2. Bulik CM, Sullivan PF, Kendler KS. Medical and psychiatric morbidity in obese women with and without binge eating. Int J Eat Disord. 2002;32:72-78.

      Lastly, the authors should try to account for secular changes in smoking and alcohol consumption or BMI over the study period. In particular, while Sweden never had very high smoking rates (due to Snus) alcohol use within specific cohorts might have both affected CVD risk (particularly stroke) and PI risk. Examining trends in for example liver cirrhosis over the study time period might help or use sales/consumption data. The authors do recognize a limitation in being unable to adjust for smoking, alcohol, and adiposity.

      Some additional analyses to address these points and some more caution in the discussion are required.

      Thank you for the comment. As the reviewer points out, we do recognize the potential unmeasured influence of lifestyle factors (e.g. smoking and alcohol consumption) on the studied associations as these data are not collected in the Swedish registries. However, the associations between CVD and psychiatric disorders were quite stable across calendar time, although somewhat stronger by the end of the observation period. The evidence does not suggest a drastic change in lifestyle factors in Sweden during the latter part of the observation period except for a slight increase in alcohol consumption [1,2] and liver cirrhosis [3]. Although we find it implausible that such underlying secular trends in lifestyle are a major contributor in the reported associations, we have now conducted additional analyses, excluding individuals with alcoholic cirrhosis of liver (ICD-10 code: K70.3) or COPD (chronic obstructive pulmonary disease, ICD-10 code: J44) as a proxy for heavy drinking or smoking. The results remained virtually unchanged.

      We have now added the rationale for the analysis stratified by calendar year to the Methods (Pages 8-9), as follows:

      “We performed subgroup analyses by sex, age at index date (<50, 50-60, or >60 years), age at follow-up (<60 or ≥60 years), history of somatic diseases (no or yes), and family history of psychiatric disorder (no or yes). We also performed subgroup analysis by calendar year at index date (1987-1996, 1997-2006, or 2007-2016) to check for potentially different associations over time (i.e., due to lifestyle factors that changed over time, including smoking and alcohol use).”

      We observed a somewhat higher risk of psychiatric disorder in recent calendar years than in earlier years (as shown in Supplementary Table S3).

      We found similar associations between first-onset CVD and incident psychiatric disorder with and without excluding individuals with a history of alcoholic cirrhosis of liver or COPD, used as a proxy for heavy drinking or smoking. These results have now been added as Supplementary Table S8.

      We have now added justifications to the Methods (Page 10) and the Discussion (Page 21), as follows. In the Methods (Page 10):

      “To account for potential impact of unmeasured confounding due to lifestyle factors, we performed a sensitivity analysis excluding individuals with a history of alcoholic cirrhosis of liver (ICD-10 code K70.3) or chronic obstructive pulmonary disease (COPD, ICD-10 code J44), as proxies for heavy drinking or smoking.”

      In the Discussion (Page 21):
      “although we found similar results with and without excluding individuals with a history of liver cirrhosis or COPD, as proxies for heavy drinking or smoking (Supplementary Table S8). We did not have direct access to hazardous behaviors that could potentially modify this association, and therefore cannot exclude the possibility of residual confounding not fully controlled for in the sibling comparison.”

      References:

      1. Statista. https://www.statista.com/statistics/693505/per-capita-consumption-of-alcohol-in-thenordic-countries/. Retrieved on 19 Aug.
      2. Alcohol and Drug Report. Nordic Baltic Region. https://www.nordicalcohol.org/swedenconsumption-trends. Retrieved on 19 Aug.
      3. Gunnarsdottir SA, Olsson R, Olafsson S, Cariglia N, Westin J, Thjódleifsson B, Björnsson E. Liver cirrhosis in Iceland and Sweden: incidence, aetiology and outcomes. Scandinavian Journal of Gastroenterology. 2009 Jan 1;44(8):984-93.

      Reviewer #2 (Public Review):

      Shen et al. investigated the associations between CVD and subsequent risk of psychiatric disorders using a prospective study design. The authors also performed subgroup analyses by sex, age at cohort entry and at follow-up, calendar year, history of somatic diseases, and family history of psychiatric disease, and finally assessed the potential role of psychiatric comorbidity in cardiovascular mortality in CVD patients. The main takeaway of the analyses is the increased risk of psychiatric disorders in CVD patients compared to the different comparison groups.

      Though the study uses nationwide registers in a prospective study design setting, there are some methodological flaws with respect to study design.

      For assessing the primary aim, the authors chose a rather unusual starting point by preselecting the exposure (CVD) group, rather than depicting the nationwide cohort of the general population followed up for a disease outcome with each category having exposed and unexposed individuals. Assuming that the population comparison group comes from the same study population as CVD patients, it is not clear why a study design similar to those cited in the manuscript (Zhang et al. 2015, Kivimäki et al. 2012, Godin et al. 2012) was not followed. Similarly, one would expect a sibling comparison group with respect to the outcome (psychiatric disorders) and not the exposure (CVD).

      Thank you for the comment. As correctly pointed out by the reviewer, we used a matched cohort design, both in the population- and sibling comparison. We firstly identified a nationwide cohort of general population who were born after 1932 and were residing in Sweden 1987-2016. We then identified all exposed individuals with first-ever diagnosis of CVD and matched population controls from this same nationwide population.

      A matched cohort design is applied here because of the strong confounding effects of some variables, e.g., age and sex, on the studied association between CVD and risk of psychiatric disorder. Exact matching on age and sex in our study makes the exposed and unexposed groups comparable and removes the confounding effects of the matching factors at the design phase. Another practical reason for using a matched cohort is that the exposed and unexposed groups are always compared over the same time period, which yields easily interpreted measures (such as risks and rates) during follow-up. Further, we have used this matched cohort design in many of our previous works [1,2] to maintain an identical design in both the sibling and population comparisons, so that the point estimates can be directly compared. The matched cohort design generates results of equal validity to the more conventional cohort design suggested by the reviewer [3], but has the additional advantage of making the results from the various cohorts (here: population and sibling comparisons) more comparable. Our study therefore takes advantage of a sibling-controlled matched cohort, which is indeed a cohort design recommended for family-based studies [4] and provides results with similar validity to a full cohort.

      We have now added a sentence and a reference to the Methods to motivate the use of the matched cohort design (Page 7).

      “We constructed a sibling-controlled matched cohort to control for the familial confounding according to guidelines for designing family-based studies.24”

      We have now updated the flowchart, adding a box at the top showing the source population from which both groups were identified (Supplementary Figure S1).

      References:

      1. Song H, Fang F, Arnberg FK, Mataix-Cols D, Fernández de la Cruz L, Almqvist C, Fall K, Lichtenstein P, Thorgeirsson G, Valdimarsdóttir UA. Stress related disorders and risk of cardiovascular disease: population based, sibling controlled cohort study. BMJ. 2019 Apr 10;365:l1255.
      2. Song H, Fang F, Tomasson G, Arnberg FK, Mataix-Cols D, Fernández de la Cruz L, Almqvist C, Fall K, Valdimarsdóttir UA. Association of Stress-Related Disorders With Subsequent Autoimmune Disease. JAMA. 2018 Jun 19;319(23):2388-2400.
      3. Sjölander A, Greenland S. Ignoring the matching variables in cohort studies–when is it valid and why?. Statistics in Medicine. 2013 Nov 30;32(27):4696-708.
      4. D'Onofrio BM, Lahey BB, Turkheimer E, Lichtenstein P. Critical need for family-based, quasi-experimental designs in integrating genetic and social science research. Am J Public Health. 2013 Oct;103 Suppl 1(Suppl 1):S46-55.

      Reviewer #3 (Public Review):

      Shen et al. investigated the relationship between the diagnosis of cardiovascular disease (CVD) and subsequent diagnosis of psychiatric disorders using national databases and health records over a 30-year period in Sweden. They also investigated the association between the diagnosis of psychiatric disorder and subsequent CVD-related mortality. Comparisons were made between participants diagnosed with CVD and siblings without CVD, and between the CVD participants and random age- and sex-matched controls from the general population.

      They show that diagnosis of all types of CVD investigated was associated with increased risk of all types of psychiatric disorders considered, both in comparison to non-CVD siblings and general population controls. They also showed that a psychiatric diagnosis subsequent to CVD diagnosis was associated with greater CVD-related mortality.

      A key strength of this study is the use of national databases and populations, as it has allowed for sufficiently large numbers for important subgroup analyses investigating specific types of CVD and psychiatric disorders. In addition to disease and disorder subtypes, the authors have investigated many other factors that may be important for understanding these relationships, including time of diagnosis during follow-up, year of diagnosis, age of participant, and various comorbidities. The duration of follow-up is another important strength of this study, as is the use of sibling controls to mitigate the potential confounding effect of genetic and early-life environment.

      However, while it is acknowledged as a limitation by authors, the lack of lifestyle data is a notable weakness of the study. The authors allude to causal inference in the abstract and discuss controlling for important confounding factors, but this is somewhat undermined by not being able to account for lifestyle factors, particularly since there are shared biological pathways such as inflammation linked to both CVD and many psychiatric disorders. As such, the associations reported in this study are potentially influenced substantially by unmeasured confounding related to lifestyle factors.

      Overall, these are important data, and the conclusions that these findings support surveillance of psychiatric disorders in individuals diagnosed with CVD, given the association of such disorders with increased risk of mortality, may be of interest to clinical settings.

      Thank you for the very positive comments.

    1. Author Response

      Reviewer #1 (Public Review):

      The paper of De Agro et al. proposes a new paradigm to measure wanting (binary choices) and liking (pheromone deposition) in ants in order to test bundling and segregation effects on reward processing.

      By using three different treatments (A) rewards (sugar drops) and costs (runway segments) are segregated; B) rewards are segregated and costs bundled; C) rewards and costs are bundled), the authors observed that the main predictor of pheromone release was the segregation of the runway segments rather than segregation of the reward. Furthermore, no effect of treatment was observed on preferences for the odor associated with the treatment.

      The authors interpret their finding as a clear demonstration of segregation effects on liking, but not wanting, which was present only for costs but not rewards.

      Strengths: I appreciated the creativity and effort in conducting complex experiments and measurements in insects. Overall, the paper is the first of its kind to propose a method to test reward processing in insects. The design is well thought out and the results are straightforward. The analyses seem to be appropriate.

      Weaknesses: My main concern relates to the interpretation of the pheromone release as an index of liking. I am not an expert in the field, but I would probably go for a more parsimonious explanation: the effect could be simply due to the quantity of liquid ingested (and therefore corresponding caloric intake). Did you check whether, in the conditions showing the biggest pheromone release, the ants consumed the biggest quantity?

      First, this could explain for example the puzzling difference observed in the 3 cohorts and the sequence effects.

      Second, a reduced overall caloric intake could also explain why segregated costs seem to drive the results. Digestive processes possibly kick in at different times in the segregated all condition compared to the other two, due to the more time-delayed ingestion of food (i.e. we tend to eat less if there is a longer time between meals).

      Finally, this account may also explain the reported difference between wanting and liking, as here the release of pheromone is simply the byproduct of how much sugar has been ingested (and possibly nothing to do with reward processes).

      Whether pheromones are released in proportion to sugar intake, and whether sugar intake differed between conditions, is an important point that should be clarified in the manuscript in order to guarantee the interpretability of the results.

      We understand the reviewer’s position, and agree with the need to reserve “high-level” processes as explanations for situations that have no alternative, more parsimonious ones. Indeed, we cannot be certain of what is happening in the ant’s mind, or whether its hedonic experience is indeed separate from its memory evaluation process. To this end, we propose a mechanism that can explain this difference in terms of the memories formed for food quality and path length.

      We have now reduced our claims on the Liking vs Wanting framework. Regarding the origin of pheromone deposition being linked to caloric consumption, we believe this is not the case.

      Reviewer #2 (Public Review):

      Only a few decades ago ants were considered little machines without learning capabilities or personality. Ever since then, we have been able to attribute more and more personality to them. In this study, De Agro et al. have been able to use psychological tricks to manipulate the decision-making process of an ant species. By bundling or segregating costs (distance) and gains (food) they were able to demonstrate that ants, just as humans, experience gains and costs (in most cases) on a logarithmic scale. Moreover, they suggest a quantitative way to disentangle "wanting" and "liking" in ants, allowing for further interesting scientific designs to test theories long applied by behavioral economists on humans.

      The strength of this study clearly lies in the simplicity of its design and its strong foundation on current theories and models. It is clearly written and easily followed even by a non-specialist reader.

      I also particularly liked the exhaustive discussion and the interdisciplinary links it proposes. Including (but not limited to) the potential ecological implications in plant-pollinator interactions, with flowering plants potentially abusing segregation of flower rewards to manipulate the pollinator.

      The weaknesses:

      The statistics seem to lack any control for random factors like individual ant or colony of origin. While the results are quite clear and will likely not change with these additions they could add a little bit more resolution in some cases or help explain certain trends better. Especially since apparently a result with a highly significant p-value of 0.0036 is considered a false positive due to a lack of rational explanations. Individual experience, age, or fitness/size of the colony of origin could all affect the decision-making processes in individuals and should be controlled for (and discussed).

      We thank the reviewer for the comment. All the models we used in the analysis did include a random factor, always specified as ants nested within colony of origin, as can be seen in the full R script available in ESM2. However, we failed to mention and discuss this in the paper, as the reviewer rightly points out. We have now added this specification in L261-262 and L271-274.

      Moreover, based also on the comment of another reviewer, we reconsidered our random-effects structure. See point 5 for the full explanation.

      In line with my previous comment, I would also have liked to see a bit more data on individual variation to better appreciate cross-condition comparisons. For instance, the fact that in Figure 4 ants that experienced the "segregated all" effect laid overall more pheromones than the ones that experience "bundled" first is barely acknowledged in the manuscript. These kinds of variations in pheromone deposition rate (not just relative, but in absolute numbers) need to be better discussed.

      Yes, there do indeed seem to be interesting patterns in Figure 4, and we spent many hours exploring the data in great depth, looking into visit-level pheromone depositions (see updated supplementary figure), to try to understand the patterns we see. We then discussed in detail how to present our findings. The main finding to explain is in condition 3: the overall pheromone deposited in the “good (in this case segregated all) encountered first” is lower, whereas it was higher for all the other conditions.

      We did develop an explanation for the pattern of findings (see below). However, we freely admit to being unsatisfied with it – the explanation is ad-hoc, and there are no strong biological or psychological reasons for it to be true, apart from fitting the data. Ultimately, we decided not to discuss these patterns very extensively, since we felt it added greatly to the length and complexity of the discussion, while not adding a lot of biological insight. Nonetheless, we crafted a manuscript-ready addition outlining our current best guess (and it is a guess) explaining the patterns in absolute pheromone deposition level. The text would be added directly to the end of the paragraph ending at line 428. If the editor and reviewer agree that this is a worthwhile addition, we would be happy to add it.

      Note that there are no significant differences between any of the groups or interactions in condition 1, although a purely visual inspection of the figure might suggest one.

      "In addition, the absolute amount of pheromone deposited (independently of the currently experienced option) varies depending on the first encounter treatment. This effect is puzzling, as it seems to cause a reversal in the absolute pheromone deposition in conditions 2 and 3, where the lower amount corresponds to the “bad” option being encountered first in one condition, but to the “good” option in the other. Indeed, we do not have a fully satisfying explanation for it. Our best guess is that it may be due to an inertia effect in pheromone deposition: when little or no pheromone is deposited for 2 or more visits in a row (the first visit being always low, and the second being of a non-preferred treatment), the ants never subsequently raise their pheromone deposition. This pattern is visible in Condition 3, as both options available are of low value, while in Condition 2, the Reward Segregated option is marginally better, and as such at least some pheromone is deposited even in visit 1. We provide a visit-by-visit analysis and graphs in supplement S2. Similar patterns have been reported for this species (Beckers et al., 1993)."

      I would also have liked to see a graph or results focusing on the pheromone deposition rate ONLY at the first experience trial, rather than always in combination with the subsequent trials.

      We would too! Again, we extensively discussed including one, as the first visit is very valuable – it is the only visit in which we can exclude contrast effects, making the results in principle much easier to interpret.

      However, the problem is that these ants deposit a lot less pheromone on their very first visit. This makes biological sense – they may be lost, and don’t know how reliable the food source is yet (see e.g. Czaczkes et al., 2013 figure 5A, where this pattern can be clearly seen). The same is true in the current dataset – which is why data from visit 1 is excluded from the figure (we repeat the analysis with and without visit 1, and find no differences in the results – see supplement).

      As a visual demonstration, we provide two (ugly, sorry) figures below: the pheromone deposition per treatment for only visit 1, and for visits 2-8. Note the massive zero inflation in visit 1.

      Pleasingly, the broad pattern (considering the mean) in just visit 1 follows our expectations. However, any reasonable statistical test on data from just the first visit would find no significant difference.

      In addition, even though the study focuses strongly on differences between "wanting" and "liking" it barely touches upon the data looking at "wanting". A graphical illustration of the Y-maze experiment and the binomial decision would have helped appreciate this result better (even if it is non-significant).

      We thank the reviewer for the comment. We generally try not to overburden our manuscripts with figures, as we aim to maintain the message of the paper focused on what we believe to be the most important finding. For this reason, we believe that a figure for the binomial response would be somewhat wasted, as all it would show are 3 points for each of the 3 conditions, all around 0.5 probability of choosing the predicted option. Below is an example of what such a figure would look like:

      On the other hand, we agree that a graphical illustration of the Y-maze may be of use. We now added Figure 2, showing both the Y maze and the pheromone deposition behavior, as the two main behaviours recorded.

      I also believe that the authors are overstating their claims of showing for the first time that ants prefer closer food sources. The cost of distance has already been demonstrated indirectly in Frank & Linsenmair 2017: "Individual versus collective decision making: optimal foraging in the group-hunting termite specialist Megaponera analis" for instance. While the current study does more directly imply the preference for closer food in a controlled experimental design I would argue that there is sufficient knowledge with indirect observations in natural settings, making the claim of showing it here for the first time unnecessarily hyperbolic.

      We agree with the reviewer. We have now added a reference to Frank and Linsenmair 2017 and weakened our claim. L534-535

      While the results of this study are novel and very interesting to a broad readership, I would suggest including in the discussion and introduction also a newer study on "food wanting is mediated by transient activation of dopaminergic signaling in the honey bee brain" by Huang et al. 2022 in Science and also recommend the accompanying perspective article by Garcia and Dyer on "Why do animals want what they like?".

      Thank you for the comment. We are aware of this new paper but we could not reference it in the earlier version of this manuscript as our submission to eLife happened on the 10th of April, prior to the publication of Huang et al. Considering the suggestions of the other reviewers, we have now reduced our claims about the liking vs wanting framework (see point 1). The reference has now been added in L518-532.

      Reviewer #3 (Public Review):

      This work aims at testing hypotheses derived from the field of behavioral economics (Kahneman's theories), related to subjective value perception in ants foraging for food. The work was conceived to test how ants react to a specific feature which is the segregation or the bundling of food resources. Behavioral economics posits that individuals value more segregated resources than the same amount of resources presented in a bundled way. At the same time, if accessing the segregated resources implies an increase in energetic costs to access them (i.e. longer displacements), then costs would be also perceived as higher in the segregated-resource case than in the bundled-resource case.

      Whether ants conform or not to this model is an interesting question, and irrespective of the results obtained, the experiments presented by the authors have been conceived to address this model as the experimental parameters varied refer to resource separation (drops of sucrose solution with different degrees of spacing between them) and to walking distances.

      Yet, the manuscript suffers from various serious deficits that preclude being enthusiastic with respect to its present form. Various problems are listed below, which reduce the quality of this work. Hopefully, the authors can amend some of these problems to reach a more consistent version.

      1. The inconsistent and unjustified "wrapping" with a "wanting vs liking" framework

      While it is unquestionable that the question raised by the authors revolves around behavioral-economic hypotheses on value perception and is fully addressed by the experiments performed, the "extra wrapping" of the "wanting/liking" framework, probably added to make the manuscript more attractive, is unjustified and excessively speculative. The use of a "wanting vs liking" interpretation framework is inappropriate, as the experiments were neither conceived to address this topic nor do the results allow any robust conclusion on this point. These concepts originate in neuroscience analyses of neural-circuit activation in the mammalian brain in situations that allow distinguishing several components related to reward: 1) the hedonic effect of pleasure itself (liking); 2) motivation to obtain the reward (wanting or incentive salience); and 3) reward-related learning (1-3). These components refer to different identified neural circuits and brain areas, as wanting for reward is generated by a large and distributed dopaminergic brain system including the frontal cortex, while liking is generated by a smaller set of hedonic hot spots within limbic circuitry that are not dopamine-dependent.

      Clearly, the use of the wanting vs liking terminology requires accuracy and appropriate studies to support it. This is not the case in the present manuscript which was not conceived to tackle this issue. Moreover, inconsistent testing procedures (see below point 3) undermine the use and interpretation of choice data as wanting. The authors have no proof of the involvement of wanting vs. liking systems in their design and even more, cannot disentangle between these components based on their behavioral data. Considering that pheromone deposits after food experience express "liking" can be questioned as it does not dissociate between individual liking and social information transfer (the liking and wanting systems are individually based systems). Moreover, the assignment of a choice in a binary-choice test to a wanting system is also questionable as the experiments cannot disentangle between the eventual individual wanting and the reward-related learning as animals are making choices based on odorant cues they have learned during their previous foraging bouts. In the absence of neurobiological data, the hypotheses of wanting vs. liking remain on a shaky, highly speculative ground.

      Thus, the whole "wanting vs liking interpretation" (which reaches alarmingly speculative levels in the Discussion section) should be omitted entirely from the manuscript if the authors want to provide a solid, convincing framework articulated exclusively around the bundling vs. the segregation effects, which is precisely what their experiments tested. The rest is speculation in the absence of analyses supporting the wanting vs liking dissociation. An example of the kind of analysis necessary to go in this direction is provided by a recent work in which a dopamine-based wanting system was shown in honey bees (4), a work that the authors did not consider. We are clearly far from this kind of analysis in the present manuscript. As the authors wrote, "the present study is the first to examine bundling vs. segregation in an animal (line 99)", yet not liking vs. wanting.

      The reviewer makes a very well-argued case for this study not being sufficient evidence for distinct “wanting” and “liking” systems in an insect – a point echoed by the other reviewers. Their comments were helpful and insightful, and we fully agree with them. We have thus omitted the concepts of “wanting” and “liking” from the title, introduction, methods, and results.

      However, we feel that, especially given the results of Huang et al. (which were not published when we submitted the manuscript), the idea that the mismatch between the choice and pheromone data is driven by them acting on two separate systems is reasonable: while it is not well supported, it is certainly consistent with the data. The discussion seems to us to be the appropriate place to speculate about the meaning of results, especially results we do not fully understand. We would thus like to maintain a short discussion of this hypothesis in the discussion. Perhaps other researchers will be inspired to collect the necessary data to test whether such segregation effects really do affect “liking” but not “wanting”, something which is beyond the capabilities of our strictly behavioral lab.

      2. Some experimental assumptions are not substantiated by data

      The experimental procedure relies on separating or aggregating reward (drops of sucrose solution) and determining the impact of this variation on pheromone deposition while returning to the nest and subsequent choice in a dual test situation in which two of the three treatments designed - distinguished by the odorant experienced en route to reward - were presented. While the "Segregated All Treatment" (Fig. 2A) managed to space the 0.2 µl reward drops by significant 25-cm segments, thus enhancing potentially both reward appreciation (segregated food drops) and cost appreciation (successive segments to be negotiated), the "Segregate Reward Treatment" (Fig. 2B) raises doubts about its validity.

      In this case, three drops were offered at the end of three consecutive 25-cm segments, with the assumption that drops spaced by 5 mm should be perceived as being segregated (two of 0.2 µl and 1 ad libitum). Yet, there is no proof - at least in the manuscript - that spacing two food drops by 0.5 mm induces a segregated perception in ants. The first experience with the first drop may induce both sensitization and a local search that may last until the very close next drop is detected so that for the ant, these drops would be perceived as belonging to the same resource rather than being perceived as segregated resources. The same applies to the vicinity between the 0.2 µl drop and the ad libitum drop.

      This raises the question of the real volume of the ad libitum drop, which is not mentioned (it is just described as being "large"; line 205). One could argue that if drops separated by 5 mm were bound together, the results would be similar to those of the "Bundled Treatment" (Fig. 2C). Strictly speaking, this is not necessarily true if the volume of the large drop was known. If this were the case, the Bundled Treatment offered a volume that was 0.4 µl smaller than the total food provided in the "Segregate Reward Treatment".

      Overall, further controls are needed to support the assumptions of the different treatments chosen.

      See detailed response to main concern 4 – “The segregation effect”. In brief: we agree that the current experiment cannot distinguish not sensing a difference between a big drop and three little drops from sensing a difference but not responding. However, the inclusion of the “segregated reward” treatment was only added to aid result interpretation in the event of reward segregation fully balancing out cost segregation. Since the response of ants to “all bundled” and “all segregated” treatments were different, the “segregated reward” treatment is in fact not needed to support our claim that segregation affects perceived value in these ants.

      3. Unclear design in the testing procedures

      The authors did not specify in the methods if a reward was provided in the tests in which a Y maze was presented to the ants having experienced a succession of short and long segments. This information was provided later, in the Results section (line 309) and, as expected, no reward was provided during the tests, thus raising the question of the necessity of the three consecutive tests, with no refreshment trials in between. This procedure is puzzling because it induces extinction of the odor-length association - as verified by the authors (see lines 306-309) - and makes the design questionable. Only the results of the very first test should be kept and analyzed in the manuscript.

      The same remark applies to the three tests performed after comparing the experimental treatments, which - one discovers only in the Results Section - were also performed in the absence of refreshment trials. In fact, the absence of coherence in the results of these tests (e.g. lines 328-332) could be precisely due to a change of strategy between the tests following the absence of reward in the first test. This underlines the necessity of focusing exclusively on the first test and dismissing the data of the 2nd and 3rd tests in which performance may have been affected by extinction and strategy change. This again shows why speaking about "wanting" in this inconsistent framework makes no sense at all.

      We thank the reviewer for the comment. Please see point 2, where we provide the full answer. We initially included the subsequent testing in our experimental design so as to gather as much information as possible. A change in preference linked to the absence of the reward is indeed expected. However, the rapidity and direction of change can give valuable information that would be lost if the data were not collected. We agree with the reviewer that in this specific experiment the data was not particularly useful, but we believe it would be wrong of us not to report it. As we wrote above, please note that in ESM2 we report the choice probability of the first choice only, showing exactly the same result as when all three choices are considered.

      Reviewer #4 (Public Review):

      The manuscript reports an experiment testing how the distribution of rewards and costs influences perceived reward value in ants. Using a bundling manipulation where rewards and costs were presented either in small separated amounts (segregated) or together in a larger amount (bundled), the results show that ants deposited a greater quantity of pheromones (which was used as an index of "liking") when rewards were segregated and costs bundled compared to when both rewards and costs were bundled (although that difference was statistically significant only in ants experiencing the segregated reward condition first during training) and when both rewards and costs were segregated. By contrast, no evidence was found for a bundling effect in terms of choice behaviour (which was used as an index of "wanting"). The authors suggest that these findings demonstrate a bundling effect and a dissociation between "wanting" and "liking" in ants.

      Overall, the experiment provides a worthy contribution to the study of the biases that affect the perceived value of rewards in a translational perspective from humans to invertebrate animals. The experimental manipulation is clever, and the results clearly indicate that manipulating bundling affected pheromone deposition in ants. However, the data reported do not appear to fully support the conclusions of an increased "liking" of the segregated rewards and bundled costs compared to bundled rewards and costs. In addition, more evidence (along with stronger justifications) would be needed to establish that choice behaviour and pheromone deposition are appropriate and sensitive measures of "wanting" and "liking", respectively. This aspect renders any claim of a dissociation between "wanting" and "liking" in ants somewhat premature and speculative at this stage. I describe these concerns in more detail below.

      1. The main hypothesis tested is that segregated rewards with bundled costs should be the most "liked" option relative to bundled rewards and costs and segregated rewards and costs. The results are interpreted as fully in line with this hypothesis. However, the data reported do not suggest this is the case: The difference between the 'segregated rewards' condition and the 'bundled' condition is not statistically significant when all ants are considered (that difference being statistically significant only for ants that first experienced the 'segregated rewards' condition during training). Although this point is briefly acknowledged in the discussion, more nuance and extra caution are needed in the overall interpretation of the findings, so that this statistically nonsignificant result does not appear as being treated as if it were statistically significant.

      We thank the reviewer for the comment. Indeed, our initial hypothesis was the one described here. However, the results of the segregated rewards vs bundled condition, being not significantly different, forced us to consider an alternative hypothesis. We believe that our current experiment managed only to bundle and segregate costs, not gains. Given this, we would expect segregated rewards vs bundled to perform at chance level, since in both the cost is equally bundled. As such, we are ultimately treating the result as non-significant. We are, however, clearly stating our initial hypothesis, and discussing how the data fits it, as we feel it would be dishonest of us to give the impression we had the second hypothesis from the start, or on the other hand to treat a p-value so near 0.05 as definitely random.

      1. An important requirement to adequately evaluate the findings from the choice behaviour test is to ensure that ants did learn the associations between the reward conditions and the runway scents. Ruling out potential learning confounds is in fact essential to interpret the results as reflecting the operation of motivational processes such as "wanting". Whereas the results from the pilot experiment suggest that ants learned the contingencies between the runway length and its associated scent, the pilot experiment and the main experiment differ in significant ways. Therefore, it is unclear whether the ants learned the contingencies in the main experiment, which could be advanced as an alternative explanation for the lack of preferences between the two scented arms of the Y-maze during the choice test. Another important aspect to consider is that the reward still has to be valued by the organism to appropriately assess "wanting" processes. Indeed, "wanting" is generally conceptualised as conjointly determined by the associative history between the cue or context (scent) and the reward (sucrose solution) on one hand, and the organism's homeostatic or physiological needs such as hunger on the other hand (e.g., Zhang et al., 2009. https://doi.org/10.1371/). In the main experiment, the question arises as to whether reward devaluation could have occurred, resulting in the reward having a diminished value as the ants were able to consume the sucrose solution to satiation multiple times across the experiment. For these reasons, it would be important to provide information showing that (a) the ants learned with which condition the scent was associated and (b) that the reward was still valued during the choice test. These points are key preconditions that need to be fulfilled for ruling out potential confounds that could explain the findings of the choice test as well as for suggesting a dissociation between "wanting" and "liking".

      We thank the reviewer for the comment. As stated in point 1, the “liking” vs “wanting” framework has been greatly reduced in the paper, only being raised as a possible explanation of the observed results, with the lack of learning being raised as a reasonable alternative explanation. We have reason to believe that the ants are actually learning the association presented, as we detail in point 2.1. Of course, we cannot be completely certain, as it is impossible to disentangle preference from learning in such experiments. As such, the possibility is mentioned in the manuscript.

      1. Relatedly, a strong justification needs to be formulated to substantiate that the choice test provides a reliable indicator of "wanting". This is critical to conclude that the results can be interpreted as reflecting a dissociation between "wanting" and "liking". In rodents and humans, "wanting" is typically measured as an increased effort mobilisation during the presentation of a cue associated with a reward (e.g., Pool et al., 2016. https://doi.org/10.1016/j.). It remains however unclear how choice can capture such effects. This questions the extent to which choice represents an adequate operationalisation and measure of "wanting" as described in the incentive salience hypothesis (Berridge & Robinson, 2016. https://doi.org/10.1037/). Moreover, it should be clearly explained and motivated whether, and if so how, choice purely measures "wanting" without being contaminated or influenced by liking-based processes, such as preferences or expected pleasantness for instance.

      We agree with the reviewer. Indeed, our linking of choice with “wanting” and pheromone with “liking” is highly speculative. In line with point 1, we have strongly reduced our claims and now propose the association only as one of several potential explanations.

      1. Little information is provided on how pheromone deposition was measured and on the specificities of this measure, such as its physiological bases, timing properties, and granularity. However, detailed information about this measure is of high relevance to be able to assess if pheromone deposition represents a sensitive measure of "liking". "Liking" is typically measured as hedonic reactions during reward consumption across the rodent and human literature (e.g., Pool et al., 2016. https://doi.org/10.1016/j.). Accordingly, a good index of "liking" should be specifically responsive to reward consumption. By extension, an increased pheromone deposition should be particularly evident after the ants consumed the sucrose drop. As it stands, it is unclear whether this is the case as the pilot experiment showed no statistically significant difference in pheromone deposition between the way towards the sucrose drop or back. If the measure of pheromone deposition allows for distinguishing between pheromones deposited on the way towards the drop and pheromones deposited on the way back in the main experiment, a further test that could be run would be to compare the pheromone deposition on the way towards the drop in the 'segregated all' condition versus the 'segregated rewards' and 'bundled' conditions. A higher pheromone deposition on the way towards the sucrose drop in the 'segregated all' condition could provide converging evidence that pheromone deposition is a sensitive indicator of "liking".

      Unfortunately, in our current setup it was impossible to collect pheromone deposition data on the way towards the drop. Pheromone deposition has to be collected by eye, and the experimenter needed to maintain attention on the delivery of the successive rewards. A camera was not an option either, as the distance and resolution needed to record the whole runway would be insufficient to notice deposition, which instead requires the experimenter to follow the ants and count the individual stereotyped behaviours. We do observe the effect of higher deposition near the drop relative to further down the runway on the way back, which seems to be congruent with the response to consumption. We are however aware this is not sufficient, as it may just be linked with the distance. Regardless, as per point 1, we are decreasing our claims for the liking vs wanting framework.

    1. Author Response

      Reviewer #1 (Public Review):

      I'm curious about whether the microscopy provided any information about when secretory vesicles leave the TGN. Do they leave throughout the lifetime of a TGN structure, or do they leave in a burst when a TGN structure disperses as marked by loss of Sec7? This information might take us a step closer to understanding how secretory vesicles are made.

      Given the limitations of our current imaging set-up with regard to high-speed 3D two-color microscopy, we were unable to capture a large number of these events and therefore cannot make concrete statements about this; however, the quantified events did not appear to be preceded or followed by additional events, suggesting some temporal separation.

      Reviewer #2 (Public Review):

      The authors are encouraged to integrate their data together better with published biochemistry and structural work into more complete mechanisms for vesicle trafficking, tethering and fusion. The manuscript would be improved by a clearer model(s) of how these factors come together to carry out exocytosis.

      This suggestion has been addressed by the addition of a new model figure (Figure 9).

      Moreover, many conclusions (especially as they appear in the Results and Figures) are written as if they are well supported by the data (or others' data), when they are often speculative, or reasonable alternative explanations exist. The authors should be clear about which conclusions are well supported, and which are hypotheses. (e.g. Fig 6I, which is a terrific figure, but some of the "conclusions/statements" are speculations).

      We have made textual changes to make clearer distinctions between conclusions that are supported by the data, and which are more speculative.

      The mechanistic and experimental definitions for the start/end of "tethering" and "fusion" are not clearly stated in the main text, which leads to confusion when examining the arrival of different factors (and seems to lead to circular arguments about what is defining what). Are these definitions well supported by the previously published and current data? E.g. is the disappearance of GFP-Sec4 really equal to the fusion event? Without data showing membrane-merger or content delivery, this needs to be described as an assumption that is being made.

      Early in the Results, we now define precisely what we interpret as the start of tethering and the time of fusion. Unfortunately, all attempts thus far at designing a cargo marker suitable for defining membrane fusion have not succeeded; however, we believe the observations in Figure 4 strongly support the assumption that loss of GFP-Sec4 signal coincides with fusion.

      The Sro7 results and conclusions are complicated, and not always carefully supported, for several reasons: there is a functionally redundant paralog Sro77, and data shows Sro7 can bind to Sec4, Sec9 and Exo84 in exocyst (Brennwald, Novick and Guo labs). The authors should be clearer, as they seem to pick and choose which interactions they think are relevant for different observations.

      We did not intend to “pick-and-choose” relevant interactions and now more clearly state what our Sro7 results mean.

      The assumption that yeast Sec1 behaves similarly to other Sec1/Munc18 proteins for "templating" SNARE complex assembly, e.g. Vps33 in Baker et al, is unlikely, given the binding studies from a number of labs (Carr, McNew, Jantti). Furthermore, the evidence for Sec1 interaction with exocyst suggests that they may work together (Novick, Munson labs). Previous data from the Guo lab (Yue et al 2017) and new BioRxiv data from the Munson/Yoon labs suggest that exocyst may play key roles in SNARE complex assembly and fusion.

      We did not mean to imply that the exocyst does not play a meaningful and critical role in SNARE complex assembly and fusion. This was an unintentional omission, which we have now addressed in the text. Our interpretation of the published meaning of SM-protein “templating” is that SM’s facilitate the alignment of the critical zero-layer ionic residues in the SNARE motifs, which may be possible regardless of affinity to single SNARE motifs. Indeed, for Sec1 specifically, it may be possible that this exact function is of lower importance relative to, perhaps, the stabilization and protection of trans-SNARE complexes prior to membrane fusion. Future studies may clarify this.

      There is concern about whether the number of molecules measured for each of the factors is accurate, and about how the authors really know that they are visualizing single vesicle events (especially with data showing that "hot-spots" may exist). For example, why is the number of exocyst molecules ~double or more that previously observed (Picco et al; Ahmed et al with mammalian exocyst)?

      Estimating the numbers of molecules is subject to some variation due to fluorescent tags used and to some extent where the protein is tagged. Since different tags were used in the earlier studies, being within a factor of two is not that surprising.

      For puncta of exocyst subunits in the mother or moving towards the plasma membrane, what is the evidence that they are actually on vesicles? The clearest argument seems to be the velocity at which they move, but this could be due to the direct interaction of exocyst with the myosin (which is a tighter interaction in vitro than exocyst-Sec4 binding), rather than being on vesicles. Furthermore, do all the exocyst complexes in the cell show this behavior, or could these be newly synthesized/assembled complexes?

      Transport of the exocyst by myosin alone without a vesicle seems very unlikely, as this myosin V needs to be activated by binding vesicle-associated Sec4 (Donovan et al., 2012, 2015). Moreover, transport of just two exocyst complexes by a myosin dimer would be very hard to detect. Nonetheless, we have added a supplementary figure (Figure 1 Supplement 5C) illustrating a clear example of exocyst complex colocalization with a secretory vesicle in the mother cell, which we hope will dispel doubts that the exocyst complex is indeed on secretory vesicles, albeit in small numbers, during this stage of transport.

      With regard to the exocyst octamer leaving at the time of "fusion," the authors should discuss Ahmed et al.'s finding of Sec3 leaving prematurely in mammalian cells, as well as data from the Toomre lab.

      We did reference this earlier work in mammalian cells and indicate that it differs from the situation in yeast. We cannot draw any further insight from these differences.

      Reviewer #3 (Public Review):

      In this context, it is notable that dual-channel imaging appears to be made by sequential, not simultaneous, acquisition, which deserves a currently missing comment. Moreover, given the weight that image acquisition plays in this project, it might be described and justified better.

      As noted above, we have expanded our description of the microscopy. We took two-color images sequentially as our microscope is not configured with a beam-splitter for simultaneous imaging.

      This referee could not fully understand the routine of image acquisition, specifically, the continuous movement of the stage in the Z-axis as images are streamed (to the RAM or to the disk? the latter takes time, line 177); does it mean that Z-stepping is solely governed by the exposure time? The CCD camera pays for its outstanding quantum efficiency with a large pixel size (16 µm). The optical path includes a 100x objective and a 2x magnification lens to compensate for the large camera pixel size, thereby achieving 0.085 µm/pixel, but these lenses 'waste' part of the fluorescent signal. One wonders whether a CMOS camera (6.5 µm pixel size) coupled with a 63x objective would be more appropriate. A brief discussion of this choice would be helpful for readers.

      We now discuss the microscopy in more detail and explain why we use an EMCCD rather than a CMOS camera.

      It is remarkable that Sec2 and Sec4 are recruited to membranes even before a vesicle is formed (Fig 6I). I find somewhat weak the evidence that RAB11s 'mark' the TGN, and disturbing the fact that RAB11 reaches the PM (does GFP tagging prevent GAP accession?). I should like to recommend strongly that the authors integrate into the introduction/discussion information on the late steps of exocytosis available for Aspergillus nidulans, another ascomycete that is particularly well suited for studying this process. Here RAB11 is not a late Golgi resident but is transiently (20 s) recruited to TGN cisternae in the late stages of their 120 s maturation cycle to drive the transition between Golgi and post-Golgi (Pantazopoulou MBoC, 2014). Recruitment of RAB11 to the TGN is preceded by the arrival of its TRAPPII GEF (Pinar, PNAS 2015; Pinar PLOS Gen 2019), a huge complex that is incorporated en bloc to the TGN (Pinar JoCS, 2020). Upon RAB11 acquisition RAB11 membranes engage molecular motors (Penalva, MBoC 2017) to undertake a several-micron journey that transports them to a vesicle supply center located underneath the apex (review, Pinar & Penalva, 2021). Here is where Sec4 is located, strongly indicating that there is a division of work between two Rabs each mediating one of the two stages between the TGN and the membrane (Pantazopoulou, 2014, MBoC).

      In the general comments above, we discuss the possible artifact of tagged Ypt31 on the PM. In the Discussion, we now compare our results in S. cerevisiae with the findings in A. nidulans.

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, the authors were trying to develop an approach for microindentation-based spatial mapping of the articular cartilage of the mouse femur. Because mouse cartilage in articulating joints is incredibly thin and challenging to indent repeatably and reliably, a need exists to increase the resolution of indentation spacing on very small surfaces, improve the sensitivity of indentation (e.g., surface detection), and reduce error and improve the accuracy of indentation measurements. Using a relatively new multi-axis material test stand with repositioning capabilities and multi-axis load cells, the authors developed a spatial indentation test protocol and used this array-based approach to measure cartilage thickness via needle probing. They then validated the thickness measurements generated by needle probing against high-resolution 3D x-ray imaging using contrast enhancement with phosphotungstic acid (PTA). The authors then compared cartilage thickness and indentation mechanical properties between wild type (C57BL6J) and Prg4 mutant mice.

      This work is rigorous and includes new techniques that are validated using orthogonal approaches. Some of the techniques used in this work, especially indentation-based mapping of cartilage stiffness in small mouse joints, have been challenging for the field to overcome. This is especially true with the exploding number of small animal studies investigating cartilage health in transgenic mouse strains and injury models. While innovative and important, there remain a few key experiments that would help with validation of the data acquired in these experiments.

      Specifically, a general rule of thumb for indentation testing is to test no more than 1/10th the thickness of the indented material. Because the cartilage thickness of the medial condyles (~0.04mm) was only ~2x that of the indentation depth used for automated indentation mapping (0.02mm), it is possible that this thin region of cartilage will lead to substrate effects from the subchondral bone on the indentation data. It is unclear if the indentation measurements are characterizing cartilage or substrate properties. This may not be a major issue for healthy, intact cartilage (including in the mutant strains) but will likely have a major impact on interpretation of results following cartilage degeneration and loss.

      It is unclear whether damage was caused by the 0.02 mm indentations, because the XRM scanning occurred after the needle probing tests. The "bands" observed in the 3D XRM imaging following both indentation and needle probing (Fig 2A2) suggest that the indentation probes and individual needle probings at each site are not perfectly overlapping. The surface congruency of the cartilage suggests valley formation at indentation sites.

      We thank the reviewer for the enthusiastic comments on our work and its importance to the field. We agree this is the known rule of thumb; however, by employing a microindentation rather than a nanoindentation approach (such as AFM) to also obtain spatial resolution, we are unable to probe the cartilage reliably within a 1 µm amplitude range. We also had to accommodate thickness variations across the cartilage surface (thinner and thicker regions) during indentation testing, which cannot be known before the needle probing thickness measurements. We completely agree that substrate effects can influence indentation data, as this is well described within the field. Therefore, to mitigate such effects, the instantaneous modulus was calculated at 20% strain for all positions and presented alongside thickness maps of the same surfaces to avoid misinterpretations regarding cartilage loss. The repeatability of indentation peak forces during test-retest suggests that the indentations did not damage the cartilage surfaces. This can be further corroborated by XRM imaging of femurs subjected to indentation testing only. Nevertheless, we would also like to clarify (and will do so in the Methods) that indentation and needle probing were not undertaken at exactly the same position, precisely because of this possibility. We apologize for any misunderstanding and would be happy to clarify this further in the Methods section.

    1. Author Response

      Reviewer #3 (Public Review):

      The study by Randzavola and colleagues provides a follow-up of their previous publication (Thomas DC et al, J Exp Med 2017) describing EROS (Essential for Reactive Oxygen Species or C17Orf62) as a novel chaperone essentially required to support the phagocyte Nox NADPH Oxidase respiratory burst and bacterial killing. Here, the authors extend the investigation of the mechanism underlying EROS effect and show its very early binding in the endoplasmic reticulum and interaction with immature partially glycosylated forms of gp91phox (the catalytic subunit of the Nox complex), allowing the incorporation of heme and subsequent binding of p22phox, which later follows the usual steps for complex maturation. A novel finding was the association of EROS with the OST component of the N-glycosylation machinery. An extended proteome analysis confirmed that EROS is quite specific for the gp91phox/p22phox complex and also for the purinergic P2X7 receptor, which also interacts with EROS (as also shown previously by the authors and further investigated by Ryoden et al. J Immunol 2020). The authors further validate EROS binding to P2X7 and provide evidence that EROS loss-of-function impairs P2X7-associated functions. Particularly, mice with genetically ablated EROS show improved survival to influenza infection.

      A major strength of this line of investigation is the clear functional importance of EROS in the regulation of the protein expression of the Nox complex components. Previous work has clearly shown that human EROS deficiency is associated with the severe immunodeficiency Chronic Granulomatous Disease, which is usually caused by genetic deficiency of the Nox complex components. Indeed, loss or gain of function of EROS is very clearly associated with major changes in the expression of those components, indicating the functional relevance of EROS. Moreover, the interplay between the P2X7 receptor and EROS is also relevant, given that this receptor mediates an important arm of innate immunity, namely nucleotide-driven inflammasome activation. Thus, the authors are likely dealing with undoubtedly important novel information which may be of broad impact for understanding several aspects of innate and even adaptive immunity.

      Enthusiasm for this article, however, is somewhat decreased by some aspects, as follows:

      1) While there is a substantial amount of new data, the corresponding progress in depth of mechanistic insight has not been commensurate, bearing in mind the authors' previous work. The novel findings are the clearer documentation of the EROS/gp91phox interaction and its time-course during nascent gp91phox protein processing in the ER, as well as their interplay with the OST complex. The extended list of proteins associating with EROS essentially confirms previous findings. Also, the work with P2X7 mostly confirms previous findings, while the novel and interesting experiment with EROS-silenced mice and viral infection needs further work, as commented below.

      We thank the reviewer for this comment and for seeking clarity on novelty. We have addressed this above and in the discussion section. We have not reported the EROS interactome by mass spectrometry in previous work.

      2) Some aspects of these results are less than clearcut. The association between gp91phox and EROS is generally convincing, but for many experiments the authors make wide use of transfections of tagged protein constructs. One can clearly understand that this is possibly the only feasible approach at this time, however these constructs carry the intrinsic problem of possible protein misfolding, which would make them a potentially artificial target of any endoplasmic reticulum chaperone-like protein such as EROS. This would impact exactly on the very mechanism the authors are proposing for EROS effects, i.e., early protein processing.

      We understand Reviewer 3’s concerns about using tagged constructs. However, all transfection experiments depicted in Figure 1 have been done with untagged constructs and in different cell types in both mouse and human systems. The whole approach is also validated by extensive previous work showing the ability of transfected p22phox to augment gp91phox expression (Yu et al., J Biol Chem 1997; PMID: 9341176). All our experiments showed the same result, namely the stabilisation of the 58kDa gp91phox precursor. We have now included data showing that we can immunoprecipitate endogenous gp91phox in PLB985 cells and detect endogenous EROS (Figure 3, figure supplement 1A), which confirms the specificity of the association between gp91phox and EROS. In the same sample, we can also detect endogenous p22phox (our positive control), which is well known to associate as a heterodimer with gp91phox. Furthermore, transfection of our constructs does not induce significant ER stress in HEK293 cells. Based on our own data and that of other investigators, we argue that this is a valid and useful approach to demonstrating the ability of EROS to increase gp91phox abundance. Moreover, this is just one of many orthogonal techniques used in the manuscript.

      3) The same consideration applies to the experiments in Figure 3 with the OST complex STT3A. The co-localizations shown by the authors are technically acceptable, but their meaning is unclear, given it is expected that the proteins EROS and OST occupy the same compartment, being ER-located proteins, especially if transfected as constructs (tagged or not).

      The experiment was done to assess the localisation of gp91phox relative to EROS and STT3A, which are known to occupy the ER compartment, as pointed out by the reviewer. Since HEK293 cells do not express gp91phox, this microscopy analysis allowed us to determine whether some population of gp91phox could be detected with EROS and STT3A at the ER, as opposed to its localisation as a mature protein at the plasma membrane and within granules in phagocytic cells.

      4) It would be important to assess whether cells receiving such constructs depict markers of endoplasmic reticulum stress and/or show impaired survival.

      This has been addressed in Reviewer 3’s recommendation for author point 2.

      5) The experiments with co-transfection in HEK293 cells of EROS, Nox1 and Nox4 provide results at variance with the authors' data in their previous work, in which endogenous Nox1 (intestine) and Nox4 (kidney) showed no changes in expression in genetically silenced EROS mice.

      We thank the reviewer for this comment and acknowledge that this introduces some ambiguity. In showing the augmentation of NOX1, NOX4 but not p22phox or NOX5 we are demonstrating that it is likely that EROS can bind and stabilise NOX proteins that also require p22phox. In the case of NOX4, this is also supported by our yeast 2 hybrid data. Thus, these data suggest that EROS can bind p22phox-dependent NOX proteins. The key question is whether EROS has a physiological role in controlling the expression of other NOX proteins. Although we addressed this in our previous study, we have done so in a more extensive way in this manuscript. In particular, we note the subsequent publication by Diebold et al. (Methods Mol Biol 2019; PMID: 31172474) which points out that many commercially available antibodies are non-specific. Detailed examination showed this to be the case for the antibody we used in Thomas et al., (J Exp Med, 2017; PMID: 28351984). We therefore undertook specific analysis with the anti-mouse NOX1 antibody clone from Dr C. Yabe-Nishimura and Dr. Misaki Matsumoto.

      Similarly, our work on NOX4 in Thomas et al. (J Exp Med, 2017; PMID: 28351984) showed that NOX4 is present in the kidneys of EROS-/- mice. However, this was a limited analysis, as it was not the main focus of that paper, and we concluded only that EROS deficiency did not affect NOX4 expression as drastically as it affects NOX2. For this revision, we examined a cohort of 4 control and 4 EROS-/- mice and showed that EROS does not physiologically regulate NOX4 in the kidney.

      Thus, using HEK293 cells, which do not express NOX proteins, as a reductionist system may accentuate the effect of EROS on NOX1 and NOX4 abundance upon transfection of the constructs. One possible explanation is that EROS binds a conserved motif present in NOX1, NOX2 and NOX4 that is readily accessible in the system we are using.

      6) The article is conceptually divided into two parts. However, there is no clear cross-fertilization between them, and they are essentially not integrated.

      The reviewer notes that there appear to be two separate stories. This reflects the fact that we have extensively characterised the function of EROS and found that it specifically and profoundly affects only two distinct pathways in immunity, which is significant in itself. A strength of our manuscript is our extensive, granular mass spectrometry approach, which shows the specificity of EROS in two different cell types in which up to 8000 proteins were detected. We have therefore placed the control of P2X7 and gp91phox-p22phox in the context of the entire proteome. Our paper defines just how specific EROS is in its physiological effects, and we therefore focus on the two major pathways affected by EROS deficiency. We integrate these in the final figure by showing how the combined lack of gp91phox and P2X7 leads to resistance to influenza A, in contrast to the susceptibility to certain bacterial infections.

      7) While the authors claim that "the loss of both ROS and P2X7 signalling leads to resistance to influenza infection", this was not in fact shown in this work. It is known that P2X7 deficiency protects against influenza infection. Thus, it follows naturally that EROS deficiency, which essentially eliminates the expression of P2X7, would have the same effect. However, the role of ROS and gp91phox, i.e. whether or not they add to this equation, remains unclear.

      We thank the reviewer for this comment. The role of phagocyte NADPH oxidase-derived ROS has been explored in gp91phox-deficient mice, and we apologise if this was not made clear in our manuscript. We have now added the following text to the Discussion section of the manuscript:

      “A particular strength of our study is that we show marked in vivo sequelae of the lack of P2X7. EROS deficiency leads to profound susceptibility to bacterial infection but protects mice from infection with influenza A. This is likely to reflect the fact that mice that are (i) deficient in gp91phox, (ii) deficient in P2X7, or (iii) treated with P2X7 inhibitors have improved outcomes following infection with influenza A, and it raises intriguing questions about the physiological role of EROS. Snelgrove et al. showed that gp91phox deficiency improved outcomes in influenza A. gp91phox knockout mice exhibited a reduced influenza titre in the lung parenchyma. Inflammatory infiltrate into the lung parenchyma was markedly reduced and lung function significantly improved (Snelgrove et al., 2006). To et al. then showed that the phagocyte NADPH oxidase is activated by single-stranded RNA and DNA viruses in endocytic compartments. This causes endosomal hydrogen peroxide generation, which suppresses antiviral and humoral signalling networks via modification of a highly conserved cysteine residue (Cys98) on Toll-like receptor 7. In this study, targeted inhibition of endosomal reactive oxygen species production using cholestanol-conjugated gp91dsTAT (Cgp91ds-TAT) abrogated influenza A virus pathogenicity (To et al., 2017). This group went on to explore infection with a more pathogenic influenza A strain, PR8. Using the same specific inhibitor, they showed that Cgp91ds-TAT reduced airway inflammation, including neutrophil influx and alveolitis, and enhanced the clearance of lung viral mRNA following PR8 infection (To et al., 2019). This group has also shown that NOX1 (Selemidis et al., 2013) and NOX4 (Hendricks et al., 2022) can drive pathogenic inflammation in influenza A, emphasising the importance of clarifying the roles of EROS in controlling the expression of these proteins.

      In studies on P2X7, Rosli et al. showed that mice infected with 10⁵ PFU of influenza A HKx31 had improved outcomes if they were treated with a P2X7 inhibitor at day 3 post infection and every two days thereafter. Survival was also improved even when the inhibitor was given on day 7 post infection following a lethal dose of the mouse-adapted PR8. This was associated with reduced cellular infiltration and pro-inflammatory cytokine secretion in bronchoalveolar lavage fluid, but viral titres were not measured (Rosli et al., 2019). Leyva-Grado et al. examined influenza A infection in P2X7 knockout mice. They infected mice with both influenza A/Puerto Rico/08/1934 virus and influenza A/Netherlands/604/2009 H1N1pdm virus. They showed that P2X7 receptor deficiency led to improved survival and less weight loss after infection with both viruses (Leyva-Grado et al., 2017). Production of proinflammatory cytokines and chemokines was impaired, and there were fewer cellular hallmarks of severe infection, such as infiltration of neutrophils and depletion of CD11b+ macrophages. It is worth noting that the P2X7 knockout strain used in this study was the Pfizer strain, in which some splice variants of P2X7 are still expressed (Bartlett et al., 2014). Hence, the dual loss of the phagocyte NADPH oxidase and P2X7 in EROS-/- mice likely confers protection from IAV infection. By reducing the expression of both NOX2 and P2X7, EROS regulates two pathways that may be detrimental in influenza A, and we speculate that EROS may physiologically act as a rheostat controlling certain types of immune response.”

    1. Author Response

      Reviewer #1 (Public Review):

      This well-written paper combines a novel method for assaying ubiquitin-proteasome system (UPS) activity with a yeast genetic cross to study genetic variation in this system. Many loci are mapped, and a few genes and causal polymorphisms are identified. A connection between UPS variation and protein abundance is made for one gene, demonstrating that variation in this system may affect phenotypic variation.

      The major strength of the study is the power of yeast genetics, which makes it possible to dissect quantitative traits down to the nucleotide level. The weakness is that it is not clear whether the observed UPS variation matters at any level; however, the claims are suitably moderate and generally supported.

      We agree with the reviewer that understanding how causal variants for ubiquitin-proteasome system (UPS) activity affect other molecular, cellular, and organismal phenotypes is an important area of future research.

      The paper provides a nice example of how it is possible to genetically dissect an "endo-phenotype", and learn some new biology. It also represents a welcome attempt to put the function of a mechanism that is heavily studied in molecular cell biology in a broader context.

      We thank the reviewer for these kind words.

      Reviewer #2 (Public Review):

      In this manuscript, the authors developed an elegant quantitative reporter assay to identify quantitative trait loci that regulate the N-end rule pathway, a major quality control mechanism in eukaryotes. By crossing two yeast strains with divergent proteostasis activity, they generated a population that showed broad variation in proteostasis activity. By sequencing and mapping the underlying loci, they identified several genes that regulate N-end rule activity. They then verified them using precise genetic tools, validating the power of their approach.

      Overall, it is a very solid manuscript that would be highly interesting for the quality control field.

      In general, I really liked this manuscript for these reasons:

      • Uses fluorescent timers elegantly to quantitatively measure protein degradation.

      • Validates the approach in depth, showing the readers how the tool works.

      • Uses the power of yeast genetics and bulk segregant analysis to map loci that may have small effects.

      • Validates the mapped loci using precise genetic tools.

      In a field that is dominated by biochemistry, this manuscript will be a breath of fresh air…

      We thank the reviewer for their thoughtful evaluation of our work and these kind words.

      Reviewer #3 (Public Review):

      This manuscript, "Variation in Ubiquitin System Genes Creates Substrate-Specific Effects on Proteasomal Protein Degradation" studies the genetic basis of differences in protein degradation. The authors do so by screening natural genetic variation in two yeast strains, finding several genes and often several variants within each gene that can affect protein degradation efficiency by the Ubiquitin-Proteasome system (UPS). Many of these variants have "substrate-specific effects" meaning they only affect the degradation of specific proteins (those with specific degrons). Also, many variants located within the same genes have conflicting effects, some of which are larger than others and can mask others. Overall, this study reveals a complex genetic basis for protein degradation.

      Strengths: Revealing the genetic basis for any complex trait, such as protein degradation, is a major goal of biology. The results of this paper make a significant step towards the goal of mapping the genes and variants involved in this specific trait. Fine mapping methods are used to home in on the specific variants involved and to measure their effects. This is very nicely done and provides a detailed view of the genetic basis of protein degradation. Further, the GFP/RFP system used to quantify the efficiency of the protein degradation system is a very elegant system. Also, the completeness of the analysis, meaning that all 20 N-degrons were studied, is impressive and leads to very detailed findings. It is interesting that some genetic variants have larger and opposite effects on the degradation of different N-degrons.

      We thank the reviewer for these positive comments.

      Weaknesses: Some of the results discussed in this paper are not surprising. For example, the finding that both large effect and small effect genetic variants contribute to this complex trait is not at all surprising. This is true of many complex traits.

      We agree with the reviewer that the number and patterns of QTLs we observe are perhaps not unexpected given that most traits are genetically complex. However, we also note that our results stand in stark contrast to previous efforts to understand how natural genetic variation affects the UPS, which have focused almost exclusively on large-effect mutations in UPS genes that cause rare Mendelian disorders. We have therefore chosen to retain our discussion of the complex genetic architecture of the UPS.

      The discussion of human disease is also a bit extensive given this study was performed on yeast. It might be more productive to use these findings to understand the UPS better on a mechanistic level. Why does the same genetic variant have opposite effects on the degradation of different degrons, even in cases where those degrons are of the same type?

      Following the reviewer’s suggestion, we have removed multiple references to human disease from the introduction. We retained paragraph 3 of the introduction (previously lines 43-55; now pg. 2, para. 2 in the revised manuscript), which discusses disease-causing mutations in UPS genes, because the examples presented highlight two important motivations for our work: (1) individual genetic differences create variation in UPS activity and (2) much of our knowledge of how natural genetic variation affects the UPS comes from these rare, limited examples. However, we have rewritten the paragraph to focus on these points and removed descriptions of the clinical manifestations of the disorders mentioned.

      We agree with the reviewer that understanding the mechanistic basis of substrate-specific variant effects on distinct N-degrons is important. However, doing so would require additional experiments that we argue are outside the scope of the current study.

      Overall, this manuscript excels at mapping the genetic basis of variation in the UPS system. It demonstrates a very complex mapping from genotype to phenotype that begs for further mechanistic explanation. These results are important to the UPS field because they may help researchers interrogate this highly conserved essential system. The manuscript is weaker when it comes to the broader conclusions drawn about the relative importance of large vs. small effects variants on complex traits, the amount of heritability explained, and the effects of genetic variation on protein abundance vs transcript abundance. Though in the case of protein vs transcript, I feel the cursory examination of the trends is perhaps at an appropriate level for the study, as it is mainly meant to show these things differ rather than to show exactly how and why they differ.

      We state that the distribution of QTL effect sizes for UPS activity consists of many QTLs with small effects and few QTLs of large effects. While this result is similar to patterns observed for other complex traits, it differs dramatically from the results of previous studies of genetic influences on the UPS, which have been largely confined to large-effect variants. Given these differences, we think it is appropriate and worthwhile to emphasize the complex genetic architecture of UPS activity.

      We agree that estimating the fraction of heritability explained by our QTLs and variants would be valuable. However, as noted in our response to Reviewer 1, the QTL mapping method we used does not permit ready calculation of heritability estimates due to its pooled nature.

      The reviewer is correct in noting that the primary goal of our RNA-seq and proteomics experiments was to provide an initial exploration of the effects of causal variants for UPS activity on global gene expression at the protein and mRNA levels. While a comprehensive dissection of the effects of this and other causal variants is an important area of future work, our results here show broad changes in global gene expression and establish that the causal UBR1 variant affects gene expression at the protein and mRNA levels.

      Reviewer #4 (Public Review):

      Overall the paper is clear and well-written. The experimental design is elegant and powerful, and it's a stimulating read. Most QTL mapping has focused on directly measurable phenotypes such as expression or drug response; I really like this paper's distinctive approach of placing bespoke functional assays for a specific molecular mechanism into the classical QTL framework.

      We thank the reviewer for their thoughtful evaluation of the work and positive comments.

    1. Author Response

      Reviewer #2 (Public Review):

      Recent advances in the investigation of functional brain connectivity have allowed the identification of the main connectivity gradient between unimodal to transmodal brain regions. Gao et al. aimed to test whether this connectivity gradient is changing according to task demands and if so, whether this change was also related to the complexity of brain signals evoked by events of various task demands. Their results are three-fold. 1) They first compared the gradient of connectivity obtained during a semantic relatedness judgment task to a purely visual detection task and to a resting state. a) They found that the same main gradient could be extracted from the three conditions, making it suitable for investigating the effect of word relatedness. b) Additionally, they showed that the word relatedness modulates the main gradient: when words are close, the gradient was strengthened, i.e., the dissociation between unimodal and transmodal areas was sharpened. 2) The authors found that the strength of word associations modulates the complexity of brain signals: the closer the words, the more convergent brain signals across participants and trials were, particularly in the transmodal areas of the main gradient. 3) They found that transmodal brain regions in the gradient were similarly activated in participants with similar relatedness judgments. Finally, they tested the link between the three results above using mediation analysis. They showed that the dimensionality difference (result 2) mediated the link between the gradient in the semantic task (result 1a) and the interindividual similarities in semantic judgment and brain activation (result 3). Altogether, this study demonstrates that the main gradient state is predictive of both task variations and inter-individual similarities of task responses. 
      These results suggest that gradients are a relevant measure of functional connectivity for investigating the variation of connectivity within a task and between individuals. Overall, the results support the conclusions.

      • Strengths:

      1. The main strength of the article is the methods used to obtain the results. Gradients of functional connectivity are a new measure that goes beyond classical brain network functional connectivity. Investigating the dynamics of gradients during a semantic task allows us to better understand how different brain regions (unimodal, transmodal, belonging to some specific networks, etc.) adapt to variability in a task.

      The second strength is the topic: the question is relevant to researchers interested in semantic memory or processing and to any researcher interested in brain dynamics within and between individuals. The demonstration is elegant, and the behavioral task is simple; it compensates for the complexity of the methods.

      • Weaknesses:

      1. The main weakness of the article is the lack of details about the performed analyses, which prevents a clear understanding of the results. The complexity of their methods calls for a crystal-clear description of them. The reader is not informed about how statistics are computed. New terms are sometimes used to describe already mentioned results, making reading the article particularly difficult.

      Thanks very much for the suggestions on statistics. We have now substantially updated our manuscript; please see our detailed reply to the Essential Revisions.

      1. Conceptually, the authors assumed that during the task, participants generated a word linking the pair of words displayed on the screen and that the neural and cognitive processes vary solely with the distance between the two words of the pair. However, when words are close, it is not obvious that individuals will generate a third word to link them, and it might be even more challenging to find a linking word in that case than when words are quite distant from each other. Considering these potential confounds, the interpretation of the results could be different. Because the authors always contrast very high versus very low distance, the observed results could also be interpreted as "observing a link" versus "generating a word link"; the first scenario is cognitively much simpler, and this could also explain the differences they observed.

      We apologise for not explaining the task instructions clearly in our initial submission. The participants were not instructed to generate a linking word specifically, and the link was typically expressed in multiple words and could involve imagery as well as words. For this reason, we are not sure that a simple recognition/generation distinction will capture the different neural effects that relate to high and low associations. However, the text now acknowledges that multiple cognitive processes could contribute to the differences we observe, including recognition vs. generation, more automatic vs. more controlled retrieval, and processes associated with creativity. We acknowledge multiple ways in which the neural patterns could be interpreted in the Discussion. Please see page 29:

      ‘Although our results are broadly in line with the controlled semantic cognition framework, multiple cognitive processes could contribute to the differences we observe between strong and weak associations, including observing vs. generating semantic links, more automatic vs. more controlled retrieval, imagery, and processes associated with creativity.’

      Reviewer #3 (Public Review):

      With resting-state fMRI data, recent work has mapped the organisation of the cortex along a continuous gradient, and regions that share similar patterns of functional connectivity are located at similar points on the gradient (Margulies et al., 2016). In the current study, the authors investigate how this dimension of connectivity changes during conceptual retrieval with different levels of semantic association strength. Specifically, they perform gradient analysis on task-fMRI informational connectivity data and reveal a similar principal gradient to the previous study, which captures the separation of heteromodal memory regions from the unimodal cortex. More importantly, by comparing the gradient generated with data from different experimental conditions (i.e., strong vs. weak association), the authors find the separation of the regions at the two ends of the gradient can be regulated by the association strength, with more separation for stronger association. They also examine the relationships between the gradient values and dimensionality and brain-semantic alignment measures, to explore the nature of this shifting gradient as well as the corresponding brain areas.

      Strengths:

      1. The aim of this study is clear and the relevant background literature is covered at an appropriate level of detail. With the cortical gradient analysis approach, this study has the potential to make a contribution to the understanding of the topographical neural basis of semantics in a fine-grained manner.

      2. The methodology in the current study is novel. This study validates the feasibility of performing gradient analysis on task-fMRI data, which is enlightening for future research. Using the number of PCs generated by PCA as a measure of dimensionality is also an interesting approach.

      3. The authors have conducted multiple control analyses, which tested the validity of their results. Specifically, a control task without engaging semantic processing was built in the experimental design (i.e., the chevron task), and the authors conducted multiple parallel control analyses with the data from this control task as a comparison with their main results. Other control analyses were also performed to validate the robustness of their methodological choices. For example, varied thresholds were used during the calculation of dimensionality and similar results were obtained.

      Weaknesses:

      1. As a major manipulation in the experiment, it is not very clear how the authors split/defined their stimuli into strong and weak semantic association conditions. If I understood correctly, word2vec was used to measure the association strength in each pair of words. The authors then grouped the trials with the top 1/3 association strength as the "strong association" condition and the bottom 1/3 as the "weak association" condition (Line 689), and all analyses comparing the effect of "strong vs. weak association" were conducted with data from these two subsets of stimuli. However, in multiple places, the authors indicate that the association strength of their stimuli ranges from completely unrelated to weakly related to highly related (Line 612, Line 147, Line 690, and the examples in Figure 1B). This makes me wonder whether the trials with the bottom 1/3 association strength (i.e., those used in the current study) are actually "unrelated/no association" trials (more like a baseline condition), rather than "weak association" trials as the authors claimed. These two situations could differ in how they engage semantic knowledge and control processing. I would also be very interested in what the authors would find if they compared all three conditions (i.e., unrelated vs. weak association vs. strong association).

      Thanks very much for bringing up this point. We have conducted an additional analysis of the intermediate bin, comparing it against the bottom 1/3 for the gradient analysis and against the top 1/3 for the dimensionality analysis (i.e., against the baseline condition for each analysis). This showed a pattern similar to the contrast between strong and weak association but with a smaller effect, representing an intermediate profile as expected. The correlation between the principal gradient difference (middle vs. weak association) and the principal gradient value derived from resting state was also significant (see Figure S10C), but its magnitude was smaller than the one reported in the main body of the manuscript (r = 0.235 vs. r = 0.369). Because the strongest effect is expected between the top and bottom thirds, we have included these results in the supplementary materials. Please see Figure S10 on page 7.

      1. Following the previous point, because the comparison between weak vs. strong association conditions is the key to the current study, I feel it might be better to describe the stimuli in these two conditions in more detail. Specifically, the authors only state that the word pairs in these two conditions varied in their association strength, but what about other psycholinguistic properties that could potentially confound their manipulation? For example, words with higher frequency and concreteness may engage more automatic/richer long-term semantic information, while words with lower frequency and concreteness need more semantic control. The effect of semantic association may therefore be partly driven by differences in these measures between conditions.

      Thanks for raising this point. Following the reviewer’s suggestion, we have performed an additional control analysis to examine the relationship between association strength and psycholinguistic features. Association strength did not correlate significantly with word frequency (r = -0.010, p = 0.392), concreteness (r = -0.092, p = 0.285) or imageability (r = 0.074, p = 0.377). Direct comparison of these psycholinguistic features between strongly and weakly associated word pairs also did not reveal any significant difference: frequency (t = 0.912, p = 0.364), concreteness (t = 1.576, p = 0.119), imageability (t = 1.451, p = 0.153). Please see page 32:

      ‘The association strength did not show significant correlation with word frequency (r = -0.010, p = 0.392), concreteness (r = -0.092, p = 0.285) or imageability (r = 0.074, p = 0.377).’

      1. The dimensionality analysis in the current study is novel and interesting. In this section, the authors linked decreasing dimensionality with more abstract and less variable representations. However, most results here were built based on the comparison between the dimensionality effects for strong and weak association conditions. I wonder if these conclusions can be generalised to results within each condition and across different regions (i.e., regions having lower dimensionality are doing more abstract and cross-modal processing). If so, I am curious why the ATL (a semantic "hub") in Figure 3A has higher dimensionality than the sensory-motor cortices (quite experiences related) and AG (another semantic "hub").

      The dimensionality and its relationship to the cortical gradient were also examined for each condition. We assessed whether this relationship was influenced by associative strength by averaging dimensionality estimates for sets of four trials with similar word2vec values using a ‘sliding window’ approach. There was a negative correlation between overall dimensionality (averaged across all trials) and the principal gradient, and the magnitude of this negative relationship increased with association strength. We therefore believe our conclusion generalizes across conditions. In our results, we observed higher dimensionality in the ATL/frontal orbital cortex than in sensory-motor cortices, which seems to contradict our conclusion. However, these areas are subject to severe distortion and signal loss in functional MRI, and their lower tSNR leads to higher dimensionality estimates in PCA. We therefore conducted a control analysis in which regions in the limbic network were removed because of their low tSNR; the negative relationship remained significant (r = -0.346, p = 0.038).

      Please see the Discussion on page 30:

      ‘It is worth noting that not all brain regions showed the expected pattern in the dimensionality analysis – especially when considering the global dimensionality of all semantic trials, as opposed to the influence of strength of association in the semantic task. In particular, the limbic network, including regions of ventral ATL thought to support a heteromodal semantic hub, showed significantly higher dimensionality than sensory-motor areas – these higher-order regions are expected to show lower dimensionality corresponding to more abstract representations. However, this analysis does not assess the psychological significance of data dimensionality differences (unlike our contrast of strong and weak associations, which are more interpretable in terms of semantic cognition). Limbic regions are subject to severe distortion and signal loss in functional MRI, which might strongly influence this metric. Future studies using data acquisition and analysis techniques that are less susceptible to this problem are required to fully characterize global dimensionality and its relation to the principal gradient.’
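      To make the dimensionality measure discussed above concrete, the sketch below counts the principal components needed to reach a cumulative-variance threshold and then averages per-trial estimates in a sliding window of trials sorted by association strength. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the 90% default threshold, and the data layout (rows = observations, columns = features) are hypothetical choices; only the window of four trials comes from the response above.

```python
import numpy as np


def pca_dimensionality(data, threshold=0.9):
    """Number of principal components needed to explain `threshold` of the variance.

    Hypothetical helper: `data` is assumed to be (n_observations, n_features),
    and the 0.9 default threshold is an illustrative assumption.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Squared singular values are proportional to the variance of each component.
    var = np.linalg.svd(centered, compute_uv=False) ** 2
    explained = np.cumsum(var) / var.sum()
    # Index of the first component at which cumulative variance reaches the threshold.
    return int(np.searchsorted(explained, threshold) + 1)


def sliding_window_dimensionality(trial_strength, trial_dims, window=4):
    """Average per-trial dimensionality within windows of trials sorted by strength.

    Hypothetical helper: `trial_strength` could hold word2vec association values,
    `trial_dims` per-trial dimensionality estimates; window=4 mirrors "sets of four trials".
    """
    order = np.argsort(trial_strength)
    s = np.asarray(trial_strength, dtype=float)[order]
    d = np.asarray(trial_dims, dtype=float)[order]
    n = len(s) - window + 1
    mean_strength = np.array([s[i:i + window].mean() for i in range(n)])
    mean_dim = np.array([d[i:i + window].mean() for i in range(n)])
    return mean_strength, mean_dim
```

      The window-averaged dimensionality could then be correlated with gradient values across regions at each strength level; raising or lowering `threshold` corresponds to the varied-threshold robustness check mentioned by Reviewer #3.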

      1. I am not sure about the meaning/representational content underlying the semantic similarity matrix in the semantic-brain alignment analysis. According to the authors, this matrix was built based on the correlation of participants' ratings of associative strength (0, no link; 1~4, weak to strong) across trials. The authors indicate that this matrix reflects the global similarity of semantic knowledge between participants (Line 403). However, even though two participants share very similar ratings of association strength across trials, they could still interpret the meaning/knowledge underlying the associations very differently. For example, one participant may interpret the link between "man" and "car" as a man owns a car but another participant may interpret it as a man is hit by a car, although both associations could be rated as strong for this trial. This situation may be even more obvious for those pairs with weak association. Therefore, I am not confident this is a measure of similarity of semantic knowledge.

      Thanks very much for bringing up this point. Our experimenter carefully evaluated the links generated for each trial by each participant and found that the weaker the association, the less consistent the links formed were. We therefore agree with the reviewer that even when two participants give similar ratings of association strength, they could still interpret those word pairs quite differently, especially for weakly associated trials. Although the retrieved content or meaning might differ (e.g., a man owns a car vs. a man is hit by a car), both scenarios are internally consistent, without strong semantic conflict. Therefore, we argue that the semantic-brain alignment might reflect the similarity of neural states of retrieval rather than of general semantic content. We have now updated this point in the manuscript. Please see page 20: ‘A semantic similarity matrix, based on the correlation of participants’ ratings of associative strength across trials (reflecting the global similarity of neural states of retrieval between participants; left-hand panel of Figure 4A), was positively associated with neural pattern similarity in inferior frontal gyrus, posterior middle temporal gyrus, right anterior temporal lobe, bilateral lateral and medial parietal cortex, pre-supplementary motor area, and middle and superior frontal cortex (right-hand panel of Figure 4A).’

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Vides et al. performed a functional analysis of the Parkinson's disease-associated leucine-rich repeat kinase 2 (LRRK2). In particular, the authors sought to address how membrane recruitment of LRRK2 leads to an increase in its kinase activity. Briefly, the authors showed that LRRK2 utilizes two distinct binding sites for Rab GTPases within its N-terminal Armadillo domain (site #1 within residues 350-550; site #2 at residues 17/18) to achieve membrane association. Intriguingly, these two sites differ substantially in their preference for binding phosphorylated (Rab8a, Rab10) and non-phosphorylated (Rab8a, Rab10, Rab29, Rab32, Rab39) substrates. In cells, an LRRK2 site #2 mutant showed significantly reduced colocalization with phosphorylated Rab10. Using LRRK2 inhibitor washout experiments, the authors demonstrate that disrupting site #2 led to slower re-phosphorylation kinetics. Lastly, the authors employed an elegant in vitro system to demonstrate that LRRK2 membrane association and Rab phosphorylation are coupled in a feed-forward reaction. Overall, the work of Vides and colleagues provides compelling mechanistic insights into the spatial regulation of LRRK2.

      Nevertheless, a few critical points remain.

      Major points:

      1) Since LRRK2 is reported to form dimers and multimers, the authors should perform their colocalization studies (Figure 6) in cells lacking endogenous LRRK2.

      Co-localization with wild type LRRK2 is not seen with the mutant in question, so dimerization/oligomerization with endogenous protein appears not to be an issue for this construct.

      2) To what extent does modification of K17 and/or K18 (e.g., acetylation or ubiquitylation) play a role in regulating LRRK2 pRab binding?

      Phosphosite indicates LRRK2 ubiquitylation at K1118, K1129, K1833, K1963, K2091, with none in the ARM domain. We have not looked at either acetylation or ubiquitylation directly but now mention that this could regulate interaction with pRabs.

      3) In their lipid bilayer-based in vitro assay, the authors should also examine the effect of an LRRK2 variant that lacks site #1.

      We have included the opposite mutant with similar impact on the model: we show that lack of pRab binding site at the N-terminus removes the cooperativity of the otherwise wild type protein.

      Reviewer #2 (Public Review):

      Vides and colleagues describe a novel feed-forward mechanism of LRRK2-mediated phosphorylation of Rab8a and Rab10. The work underlines the importance of the N-terminal armadillo domain in the binding of different Rabs. They further characterized the Rab29 binding epitope, which is involved in the membrane targeting of LRRK2 mediated by Rab29 (site #1). Going beyond previous work, the authors demonstrate that a single point mutation (K499E) is sufficient to abolish Rab29 binding. Furthermore, they show that this binding site also binds the substrate Rabs Rab8a and Rab10. In addition to this binding site (#1), the authors identified an additional site (site #2) particularly involved in the specific binding of Rab8a and Rab10 but not of Rab29 or the non-LRRK2 substrate Rab7, providing an explanation for the LRRK2 substrate specificity observed in vivo. While the Rab29 binding site binds non-phosphorylated Rabs, the newly identified site around the N-terminal lysine 18 shows increased binding to phosphorylated Rabs and provides support for a feed-forward mechanism in substrate phosphorylation.

      The authors provide a sound biochemical characterization of critical steps of LRRK2 activation, which is of broad interest to the field. Beyond scientific interest, a well-characterized activation mechanism might guide future drug development strategies.

      We thank the reviewer for noting that we should document the identity of the bound nucleotide. Rab8 and Rab10 are not easy to work with: it is much harder to retain their full nucleotide exchange capacity than for other Rabs, and preparations show, at best, 50% active molecules in terms of ability to exchange nucleotide. We maintain Mg-GTP throughout all purification steps and assays and use Q mutants in vitro to stabilize GTP binding. Even so, we have now monitored the nucleotide state of purified Rabs by mass spectrometry and found that our routine preparations of Rab8A-Q and Rab10-Q each show a 50:50 ratio of bound GTP to GDP. We have noted this caveat in the text: our work will underestimate affinities, since GTP-bound forms likely predominate in these interactions.

      Major concerns:

      • The nucleotide states of the different Rabs (after nucleotide exchange) need to be experimentally confirmed, e.g., by HPLC.

      • It is not always clear which Rab variants (i.e. WT or Q63L) have been used for a particular experiment (information provided in the main text vs material and methods). While irrelevant for in vitro experiments, for studies in cells it should be considered that the use of Rab Q63L constructs (Q60L in Ras) does not necessarily imply that GAP-catalyzed GTP hydrolysis is completely abolished. In contrast to Ras GAPs, some Rab GAPs can provide the water-coordinating glutamine residue critical for hydrolysis (see: Müller and Goody, 2018; PMID: 28055292).

      All studies within cells were done with endogenous Rab GTPases (WT). We have also clarified the text throughout as to which Rab form is used.

      Reviewer #3 (Public Review):

      Vides et al. present new insights into the interactions between LRRK2 and Rab GTPases. They identified two distinct Rab-binding sites in the N-terminal Armadillo (ARM) domain of LRRK2, which they named Site #1 and Site #2. One of the main findings is the striking effect of Rab GTPase phosphorylation on LRRK2's recruitment to and activation on membranes; both unmodified and phosphorylated Rabs (pRab) bind to the N-terminus of LRRK2, but to different regions. Site #1, located closer to the C-terminus of the ARM domain, binds unmodified Rab8A, Rab10, and Rab29, with Rab29 showing the highest affinity. Site #2, located at the extreme N-terminus of LRRK2, binds to the modified pRab8A and pRab10. Combining structure prediction and conservation analysis, they identified the potential interaction interfaces of Site #1 and Site #2, including two conserved lysine residues (K17 and K18) in Site #2 that are critical for pRab binding. The authors propose a model where initial membrane association is mediated by binding unphosphorylated Rab8A, 10, or 29 to the lower-affinity Site #1. Membrane-associated LRRK2 then phosphorylates one of its substrates, which can now engage the higher-affinity Site #2, starting a cascade of phosphorylation events (the feed-forward mechanism).

      Overall, the authors present clear and convincing data showing the interaction between LRRK2's N-terminal ARM domain and Rab/pRab, and supporting their feed-forward mechanism. The main shortcoming in the manuscript is the absence of data directly addressing two important features of their feed-forward model: (1) The proposal that the increased activity of LRRK2 upon recruitment to membranes is only the result of its increased local concentration (without any contributions from a potential Rab-dependent activation); and (2) The ability of LRRK2 to simultaneously bind Rab and pRab. Despite this shortcoming, this manuscript presents an important contribution to our understanding of LRRK2 function, providing an elegant model for LRRK2's recruitment to and activation on membranes. This paper will be of much interest to a broad readership.

      We have fully addressed the “shortcoming”: we now demonstrate that phosphoRab10 can bind LRRK2 Armadillo domain simultaneously with Rab8 and also that pRab8 can activate kinase activity on Rab10. We thank the reviewer for these terrific suggestions.

    1. Author Response

      Reviewer #2 (Public Review):

      This study evaluates the causal relationship between childhood obesity on the one hand and childhood emotional and behavioral problems on the other. It applies Mendelian Randomization (MR), a family of methods in statistical genetics that uses genetic markers to break the symmetry between correlated traits, allowing inference of causation rather than mere correlation. The authors argue convincingly that previous studies of these traits, both those using non-genetic observational epidemiology methods and those using standard MR methods, may be confounded by demographic effects and familial effects. One possible example of this kind of confounding is the idea that obesity in parents may contribute to emotional and behavioral problems in children; another is the idea that adults with emotional and behavioral issues may be more likely to have children with partners who are obese, and vice versa. They then make use of a recently proposed "within-family" MR method, which should effectively control for these confounders, at the cost of higher uncertainty in the estimated effect size, and therefore lower power to detect small effects. They report that none of the previously reported associations of childhood BMI with anxiety, depression, or ADHD are replicated using the within-family MR method, and that in the case of depression the primary association appears to be with maternal BMI rather than the child's own BMI.

      The argument that these confounders may affect these phenotypes is fairly sound, and within-family MR should indeed do a good job of controlling for them. I do not see any major issues with the cohort itself or the choice of genetic instruments. I also do not see any major issues with the definitions or ascertainment of the phenotypes studied, though I am not an expert on any of these phenotypes in particular. I am especially satisfied with the series of analyses demonstrating that the results are robust to many variations of MR methodology. Overall, I think the positive result this study reports is very credible: that the known association between childhood BMI and depression is likely primarily due to an effect of maternal BMI rather than the child's own BMI (though given that paternal BMI has a similar effect size with only a slightly wider confidence interval, I would instead say that the effect is from parental BMI generally, not specifically maternal.)

      In the updated results based on the larger genetic data release, the estimates for the association of maternal BMI and paternal BMI with the child’s depressive symptoms are more clearly different than they were in the smaller dataset (for maternal BMI, beta = 0.11, CI: 0.02, 0.19, p = 0.01; for paternal BMI, beta = 0.02, CI: -0.09, 0.12, p = 0.71). Therefore, in this version, it makes sense to note an association with maternal BMI specifically.

      The main weakness of the study comes from its negative results, which the authors emphasize as their primary conclusion: that previously reported associations of childhood BMI with anxiety, depression, and ADHD are not replicated using within-family MR methods. These claims do not seem justified by the evidence presented in this study. In fact, in every panel of figures 2 and 3, the error bars for the within-family MR analysis encompass the estimates for both the regression analysis and the traditional MR analysis, suggesting that the within-family analysis provides no evidence one way or another about which of these analyses is more accurate. More generally, in order to convincingly claim that there is no causal relationship between two traits, an MR study must argue that the study would be powered to detect a relationship if one existed. Within-family MR methods are known to have less power to detect associations and less precision to estimate effect sizes than traditional MR methods or traditional observational epidemiology methods, so it is not sufficient to show that these other methods have power to detect the association. To make this kind of claim, it is necessary to include some kind of power analysis, such as a simulation study or analytic power calculations, and likely also a positive control to show that this method does have power to detect known effects in this cohort.

      We agree that it is imperative that negative (i.e. “non-significant”) results are correctly interpreted; it is just as important to discover what is unlikely to affect emotional and behavioural outcomes as what does affect them. Negative results (non-significant estimates) are neither a weakness nor a strength of the study, but simply reflect the estimation error in our analysis of the data. The key question is whether our within-family MR estimates are sufficiently powered to detect effect sizes of interest or rule out clinically meaningful effect sizes, or whether they are simply too imprecise to draw any conclusions. As the reviewer suggests, one way to address this is via a post-hoc power calculation. We consider post-hoc power calculations redundant, since all the information about the power of our analysis is reflected in the standard errors and reported confidence intervals. Moreover, any post-hoc power calculation will necessarily be approximate compared to using the standard errors and confidence intervals which we report.
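      As a rough guide to why the standard error already carries the power information, the usual normal approximation for a two-sided test at level α links the two directly. This is a generic textbook relation, not a calculation reported in the manuscript, and it ignores the negligible probability of reaching significance in the wrong direction (β is the true effect, Φ the standard normal CDF):

$$
\text{Power}(\beta) \approx \Phi\!\left(\frac{|\beta|}{\mathrm{SE}} - z_{1-\alpha/2}\right),
\qquad
\beta_{\text{80\%}} \approx \left(z_{0.975} + z_{0.80}\right)\mathrm{SE} \approx (1.96 + 0.84)\,\mathrm{SE} = 2.8\,\mathrm{SE}.
$$

      Because a 95% confidence interval has a half-width of 1.96 SE, an estimate has roughly 80% power to detect true effects about 1.4 times that half-width (2.8/1.96 ≈ 1.43), so an approximate minimum detectable effect can be read directly off a reported interval.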

      Despite these methodological reservations, we have conducted simulations to estimate the power of our within-family models (the R code is included at the end of this document). These simulations indicate that we do have sufficient power to detect the size of effects seen for depressive symptoms and ADHD in models using the adult BMI PGS. They also indicate that we cannot rule out smaller effects for non-significant associations (e.g., for the impact of the child’s BMI on anxiety). Naturally, this is entirely consistent with the width of the confidence intervals reported in results tables and in Figures 1 and 2. However, although power calculations are important when planning a study, they make little contribution to interpretation once a study has been conducted and confidence intervals are available (e.g., https://psyarxiv.com/tcqrn/). For this reason, we comment on these simulations in this response to reviewers but do not include them in the manuscript or supplementary materials. At the same time, we have changed the language used in the manuscript to be clearer that the results were imprecise and that values contained within the confidence limits cannot be ruled out.
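      The R simulation code itself is not reproduced here. As a hedged illustration of the general approach only, the Python sketch below estimates power for a within-family MR design by simulation: the child's PGS instruments the child's BMI, the mother's and father's PGS are included as covariates, and power is the share of replicates in which |estimate/SE| > 1.96. The data-generating model (segregation of the child's PGS from the midparent value, an unmeasured confounder, and a simple dynastic effect of parental PGS on the outcome), all parameter values, and all function names are assumptions for illustration, not the authors' simulation.

```python
import numpy as np

def simulate_trios(n, gamma, beta, dynastic, rng):
    """Simulate n parent-offspring trios under a hypothetical data-generating model."""
    pgs_m = rng.standard_normal(n)  # mother's PGS
    pgs_f = rng.standard_normal(n)  # father's PGS
    # Child PGS = midparent value + segregation noise (total variance ~1)
    pgs_c = 0.5 * (pgs_m + pgs_f) + np.sqrt(0.5) * rng.standard_normal(n)
    u = rng.standard_normal(n)  # unmeasured confounder of BMI and outcome
    bmi_c = gamma * pgs_c + u + rng.standard_normal(n)
    # Outcome: causal effect of child BMI + dynastic effect of parental genotype
    y = beta * bmi_c + dynastic * (pgs_m + pgs_f) + u + rng.standard_normal(n)
    return pgs_m, pgs_f, pgs_c, bmi_c, y

def within_family_iv(pgs_m, pgs_f, pgs_c, bmi_c, y):
    """Just-identified 2SLS: child PGS instruments child BMI; parental PGS are covariates."""
    ones = np.ones_like(y)
    X = np.column_stack([ones, bmi_c, pgs_m, pgs_f])  # regressors
    Z = np.column_stack([ones, pgs_c, pgs_m, pgs_f])  # instruments + exogenous covariates
    zx_inv = np.linalg.inv(Z.T @ X)
    coef = zx_inv @ (Z.T @ y)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    # Homoskedastic 2SLS covariance: sigma^2 (Z'X)^-1 (Z'Z) (X'Z)^-1
    cov = sigma2 * zx_inv @ (Z.T @ Z) @ zx_inv.T
    return coef[1], np.sqrt(cov[1, 1])

def simulated_power(n, gamma, beta, dynastic=0.1, n_reps=500, seed=1):
    """Share of replicates in which the within-family estimate is significant at 5%."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_reps):
        est, se = within_family_iv(*simulate_trios(n, gamma, beta, dynastic, rng))
        hits += int(abs(est / se) > 1.96)
    return hits / n_reps

print(simulated_power(n=5000, gamma=0.3, beta=0.1))
```

      Conditioning on both parents' PGS leaves only the random segregation component of the child's PGS as the instrument, which blocks the dynastic path and is why within-family estimates are less precise than population MR. In this toy model the confounder `u` would bias an ordinary regression of `y` on `bmi_c`, but not the IV estimate. Scanning `beta` over a grid and finding where power crosses 0.8 gives the kind of detectable-effect estimate described above.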

      For example, the discussion now includes the following:

      ‘However, within-family MR estimates using the childhood body size PGS are still consistent with small effects of the child’s BMI on all outcomes, with upper confidence limits around a 0.2 standard-deviation increase in the outcome per 5kg/m2 increase in BMI.’

      And the conclusion of the paper now reads:

      ‘Our results suggest that genetic variation associated with BMI in adulthood affects a child’s depressive and ADHD symptoms, but genetic variation associated with recalled childhood body size does not substantially affect these outcomes. There was little evidence that BMI affects anxiety. However, our estimates were imprecise, and these differences may be due to estimation error. There was little evidence that parental BMI affects a child’s ADHD or anxiety symptoms, but factors associated with maternal BMI may independently influence a child’s depressive symptoms. Genetic studies using unrelated individuals, or polygenic scores for adult BMI, may have overestimated the causal effects of a child’s own BMI.’

      Regarding a positive control: for analyses of BMI in adults, suitable positive controls would include directly measured biomarkers such as fat mass or blood pressure or reported medical outcomes like type 2 diabetes. In adolescents and younger adults, age at menarche or other measures of puberty can be used, as these are reliably influenced by BMI. However, the age of the participants for whom within-family effects are being estimated (8 years), together with the lack of any biomarkers such as fat mass (due to the questionnaire-based survey design) mean no suitable measures are available.

      Reviewer #3 (Public Review):

      Higher BMI in childhood is correlated with behavioral problems (e.g. depression and ADHD) and some studies have shown that this relationship may be causal using Mendelian Randomization (MR). However, traditional MR is susceptible to bias due to population stratification, assortative mating, and indirect effects (dynastic effects). To address this issue, Hughes et al. use within-family MR, which should be immune to the above-listed problems. They were unable to find a causal relationship between children's BMI and depression, anxiety, or ADHD. They do, however, report a causal effect of mother's BMI on depression in their children. They conclude that the causal effect of children's BMI on behavioral phenotypes such as depression and anxiety, if present, is very small, and may have been overestimated in previous studies. The analyses have been carried out carefully in a large sample and the paper is presented clearly. Overall, their assertions are justified but given that the conclusions mostly rest on an absence of an effect, I would like to see more discussion on statistical power.

      1) The authors show that the estimates of within-family MR are imprecise. It would be helpful to know how much power they have for estimating effect sizes reported previously given their sample size.

      As discussed in our response to a comment from Reviewer 2, the power of our results is already indicated by our standard errors and confidence intervals. Nevertheless, we conducted simulations to estimate the effect sizes we had 80% power to detect. The results, presented below, are consistent with our main results. Because we consider post-hoc power calculations redundant when standard errors and confidence intervals are reported, we include this information in the response to reviewers but not in the manuscript itself.

      2) They used the correlation between PGS and BMI to support the assertion that the former is a strong instrument. Were the reported correlations calculated across all individuals? Since we know that stratification, assortative mating, and indirect effects can inflate these correlations, perhaps a more unbiased estimate would be the proportion of children's BMI variance explained by their PGS conditioned on the parents' PGS. This should also be the estimate used in power calculations.

      The manuscript has been updated to quote Sanderson-Windmeijer conditional R2 values: the proportion of BMI variance explained by the BMI PGS for each member of a trio, conditional on the PGS of the other members of the trio, and all genetic covariates included in within-family models. Similarly, we now show Sanderson-Windmeijer conditional F-statistics for a model including the child, mother, and father’s BMI instrumented by the child, mother, and father’s PGS.

      3) In testing the association of mothers' and fathers' BMI with children's symptoms, the authors used a multivariable linear regression conditioning on the child's own BMI. Was the other parent's BMI (either by itself or using the polygenic score) included as a covariate in the multivariable and MR models? This was not entirely clear from the text or from Fig. 2. I suspect that if there were assortative mating on BMI in the parent's generation, the effect of any one parent's BMI on the child's symptoms might be inflated unless the other parent's BMI was included as a covariate (assuming both mother's and father's BMI affect the child's symptoms).

      Non-genetic models include both the mother and father’s phenotypic BMI as well as the child’s, allowing estimation of conditional effects of all three. This controls for assortative mating as noted by the reviewer. This was not previously clear - all relevant text and figure captions have been updated to clarify this.

      4) They report no evidence of cross-trait assortative mating in the parents generation. The power to detect cross-trait assortative mating in the parents' generation using PGS would depend on the actual strength of assortative mating and the respective proportions of trait variance explained by PGS. Could the authors provide an estimate of the power for this test in their sample?

      We have updated the discussion of assortative mating (in both the results and the discussion section) to note possible limitations of power and to clarify that this approach to examining assortment may not capture its full extent.

      The relevant part of the results section now reads:

      “In the parents’ generation, phenotypes were associated within parental pairs, consistent with assortative mating on these traits (Appendix 1 - Table 5). Adjusted for ancestry and other genetic covariates, maternal and paternal BMI were positively associated (beta: 0.23, 95%CI: 0.22,0.25, p<0.001), as were maternal and paternal depressive symptoms (beta: 0.18, 95%CI: 0.16,0.20, p<0.001), and maternal and paternal ADHD symptoms (beta: 0.11, 95%CI: 0.09,0.13, p<0.001). Consistent with cross-trait assortative mating, there was an association of mother’s BMI with father’s ADHD symptoms (beta: 0.03, 95%CI: 0.02,0.05, p<0.001) and mother’s ADHD symptoms with father’s depressive symptoms (beta: 0.05, 95%CI: 0.05,0.06, p<0.001). Phenotypic associations can reflect the influence of one partner on another as well as selection into partnerships, but regression models of paternal polygenic scores on maternal polygenic scores also pointed to a degree of assortative mating. Adjusted for ancestry and genotyping covariates, there were small associations between parents’ BMI polygenic scores (beta: 0.01, 95%CI: 0.00,0.02, p=0.02 for the adult BMI PGS, and beta: 0.01, 95%CI: 0.00,0.02, p=0.008 for the childhood body size PGS), and of the mother’s childhood body size PGS with the father’s ADHD PGS (beta: 0.01, 95%CI: 0.00,0.02, p=0.03). We did not detect associations with pairs of other polygenic scores, which may be due to insufficient statistical power.”

      And the relevant part of the discussion section now reads:

      “We found some genomic evidence of assortative mating for BMI, and cross-trait assortative mating between BMI and ADHD, but not between other traits. However, associations between polygenic scores, which only capture some of the genetic variation associated with these phenotypes, may not capture the full extent of genetic assortment on these traits.”

      5) Are the actual phenotypes (BMI, depression or ADHD) correlated between the parents? If so, would this not suffice as evidence of cross-trait assortative mating? It is known that the genetic correlation between parents as a result of assortative mating is a function of the correlation in their phenotypes and the heritabilities underlying the two traits (e.g., see Yengo and Visscher 2018). An alternative way to estimate the genetic correlation between parents without using PGS (which is noisy and therefore underpowered) would be to use the phenotypic correlation and heritability estimated using GREML or LDSC. Perhaps this is outside the scope of the paper but I would like to hear the author's thoughts on this.

      Associations between maternal and paternal phenotypes are consistent with a degree of assortative mating (shown below). These results have been added to Appendix 1 - Table 5, which also shows associations between maternal and paternal polygenic scores, and the methods and results have been updated accordingly (see quoted text in response to the comment above). For comparability, both sets of results are based on regression models adjusting for the mother’s and father’s ancestry PCs and genotyping covariates. We agree that analysis of assortative mating using GREML or LDSC is out of scope for this paper. As noted above, we have updated the discussion to acknowledge the limitations of the approach taken:

      ‘We found some genomic evidence of assortative mating for BMI, and cross-trait assortative mating between BMI and ADHD, but not between other traits. However, associations between polygenic scores, which only capture some of the genetic variation associated with these phenotypes, may not capture the full extent of genetic assortment on these traits.’
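      The relation the reviewer refers to can be sketched with path tracing, under the simplifying assumption of primary phenotypic assortment (partners select on phenotypes, and their genotypes are correlated only through those phenotypes). Writing $h$ for the square root of a trait's heritability, $R^2$ for the phenotypic variance explained by a polygenic score, and $r_P$ for the phenotypic correlation between the mother's trait $x$ and the father's trait $y$ (the same trait or, for cross-trait assortment, different traits):

$$
r\!\left(A_{x,m},\,A_{y,f}\right) = h_x\, r_P\, h_y,
\qquad
r\!\left(\mathrm{PGS}_{x,m},\,\mathrm{PGS}_{y,f}\right) = \sqrt{R^2_x}\; r_P\, \sqrt{R^2_y}.
$$

      For example, with a hypothetical $r_P = 0.2$ and $R^2 = 0.05$ for a single trait, the expected correlation between parents' polygenic scores is only $0.05 \times 0.2 = 0.01$. This illustrates why associations between polygenic scores understate genetic assortment and need large samples to detect. These values are illustrative assumptions, not estimates from this study.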

      6) It would be helpful to include power calculations for the MR-Egger intercept estimates.

      As with our responses to the comments above, post-hoc power calculations are redundant, as all the information about the power of our analysis, including that of the MR-Egger analyses, is indicated by the standard errors and confidence intervals. MR-Egger is less precise than other estimators, as is made clear from the wide confidence intervals reported in the relevant tables (Appendix 1 - Tables 8 and 9). However, we have now updated the discussion to give more weight to this as a limitation. The discussion of pleiotropy in the final paragraph of the discussion now reads:

      ‘While robustness checks found little evidence of pleiotropy, these methods rely on assumptions. Moreover, MR-Egger is known to give imprecise estimates (Burgess and Thompson 2017), and confidence intervals from MR-Egger models were wide. Thus, pleiotropy cannot be ruled out.’

      Similarly, we have updated the relevant line of the results section, which now reads:

      ‘MR-Egger models found little evidence of horizontal pleiotropy, although MR-Egger estimates were imprecise (Appendix 1 - Tables 8 and 9).’
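      For context on why the Egger intercept is imprecise: it is the intercept of an inverse-variance-weighted regression of variant-outcome associations on variant-exposure associations (with each variant oriented so that its exposure association is positive), so it extrapolates to a hypothetical variant with no effect on the exposure, and it becomes imprecise when those associations cluster away from zero. Below is a minimal sketch of this standard summary-data estimator; it is not the authors' code, and flooring the residual standard error at 1 is an assumption that mirrors a common convention.

```python
import numpy as np

def mr_egger(beta_x, beta_y, se_y):
    """MR-Egger on per-variant summary statistics (inverse-variance weighted).

    Returns (intercept, slope, se_intercept, se_slope).
    """
    # Orient each variant so its exposure association is positive
    # (a variant with beta_x == 0 would be zeroed out by np.sign)
    sign = np.sign(beta_x)
    bx, by = beta_x * sign, beta_y * sign
    w = 1.0 / se_y**2                      # inverse-variance weights
    X = np.column_stack([np.ones_like(bx), bx])
    XtW = X.T * w                          # X' W
    xtwx_inv = np.linalg.inv(XtW @ X)
    coef = xtwx_inv @ (XtW @ by)           # weighted least squares fit
    resid = by - X @ coef
    # Residual variance, floored at 1 (assumed convention for under-dispersed data)
    phi = max(1.0, np.sum(w * resid**2) / (len(bx) - 2))
    se = np.sqrt(np.diag(xtwx_inv) * phi)
    return coef[0], coef[1], se[0], se[1]

# Hypothetical per-variant estimates generated from intercept 0.02 and slope 0.5
bx = np.array([0.1, 0.2, -0.3, 0.4])
by = np.sign(bx) * (0.02 + 0.5 * np.abs(bx))
se_y = np.full(4, 0.01)
a, b, se_a, se_b = mr_egger(bx, by, se_y)
print(a, b, se_a, se_b)
```

      A nonzero intercept is read as evidence of directional pleiotropy, but its standard error grows as the exposure associations become more similar to one another, which is one reason wide MR-Egger intervals cannot rule pleiotropy out.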

      7) Finally, what is the correlation between PGS and genetic PCs/geography in their sample? A correlation might provide evidence to support the point that classic MR effects are inflated due to stratification.

      Figures presenting the association of the child’s BMI polygenic scores and their PCs have been added to the supplementary information as Appendix 1 - Figure 2 and Appendix 1 - Figure 3. Consistent with an influence of residual stratification, a regression of the child’s BMI polygenic scores against their ancestry PCs (adjusting for genotyping centre and chip) found that 7 of the 20 PCs were associated at p<0.05 with the adult BMI PGS, and 8 of 20 with the childhood body size PGS (under the null hypothesis, we would expect one association in each case). When parental polygenic scores were added to the models, these associations attenuated towards the null.
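      The expectation of one chance association follows from 20 tests × 0.05 = 1. A quick binomial calculation shows how unlikely 7 or 8 hits would be under the null. This sketch assumes the 20 tests are independent; that is roughly reasonable because ancestry PCs are orthogonal, but it remains an assumption.

```python
from math import comb

def prob_at_least(k, n=20, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more 'significant' PCs under the null."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(20 * 0.05)            # expected number of chance associations
print(prob_at_least(7))     # adult BMI PGS: 7 of 20 PCs
print(prob_at_least(8))     # childhood body size PGS: 8 of 20 PCs
```

      Both tail probabilities are far below 0.001, which is consistent with the interpretation that the PC associations reflect residual stratification rather than chance.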

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript shows that bone is resorbed during the early steps of limb regeneration in urodeles, and osteoclasts are required for this process. In case of impaired resorption, integration of newly-formed tissue with the original bone shaft is compromised. The manuscript further shows that wound epithelium is required for bone resorption and suggests that it induces osteoclastogenesis or migration of osteoclasts. Furthermore, the authors showed that the formation of novel skeletal elements is initiated while the resorption of the old one is still actively ongoing.

      The study is well designed, conclusions are relatively well supported, and data are presented in a clear way. Two new models of transgenic axolotls have been created. The strongest and most important finding is that partial bone resorption is required for tissue reintegration. My main concern is the novelty of this study, which is quite limited in my opinion.

      Specifically, resorption of bone stump during limb regeneration has been shown before in various model organisms.

      The role of osteoclasts in this process has not been well characterized in urodeles but has been shown during the regeneration of a mouse digit.

      It is reasonable to anticipate that similarly, osteoclasts are resorbing bone in salamanders, especially since this is the only cell type known for bone resorption.

      Thus, this observation, despite being nicely and thoroughly done, is of limited interest.

      The role of wound epithelium in bone histolysis is well demonstrated via skin flap experiments in this manuscript. However, upon skin flap surgery no limb regeneration occurs, implying wound epithelium is a key tissue triggering all the processes of limb regeneration. Accordingly, the absence of bone histolysis in such conditions can be secondary to the absence of any other part of the regenerative process, e.g., blastema formation, macrophage M1 to M2 transition, reinnervation, etc. The proposed link between wound epithelium and osteoclastogenesis (i.e., Sphk1, Ccl4, Mdka) is very superficial and very suggestive.

      No functional evidence was provided to confirm these connections. Finally, the authors showed that new bone formation occurs while resorption of the bone stump is still ongoing. This is a nice observation, but again, rather indirect as it is based on the dynamics of bone resorption and bone formation in different animals. Due to high variability among animals, direct evidence, like double staining for osteoclasts and blastema markers would address this point more precisely.

      We consider that our work provides evidence, for the first time, that skeletal resorption in early stages of regeneration has a durable impact by affecting tissue integration. We show that this process occurs in a short and conserved time window, which provides an opportunity for comparative research with other models and for interventional therapies. To our knowledge, limb regeneration is studied mainly in amphibians, as they are the only established lab model with this ability. Some lizards (geckos and possibly iguanas) have been reported to regrow an appendage, albeit without the regenerative fidelity of amphibians. In established regeneration lab models such as the axolotl, studies of regeneration-induced resorption have been scarce.

      During murine digit tip regeneration, osteoclasts are recruited to the amputation site and resorb the bone in a similar time frame to the one we show here in the axolotl. Ablating osteoclasts delays regeneration; however, no study has examined the impact on tissue integration. Additionally, a key difference between mouse digit and adult axolotl limb regeneration is that the new skeletal elements are built in fundamentally different ways: direct ossification (bone on top of bone) in the mouse, versus endochondral ossification (cartilage on top of osteo-cartilage elements) in the axolotl limb. Tissue integration in the latter may present different challenges that are worth exploring to understand its regulation. What this work adds is a characterization of the temporal and cellular dynamics of regeneration-induced resorption, the interaction of osteoclasts with skeletal cells and, lastly, the impact on tissue integration.

      Based on previous studies in mammals, it is reasonable to anticipate the presence and role of osteoclasts in salamanders. However, the growing body of work in the field, as well as our own work in the axolotl, have shown that extrapolations of mammalian skeletal biology to other species come with their risks.

      We agree that the role of the wound epithelium (WE) in skeletal histolysis will require further and extensive work. The evidence shown here, provides a glimpse of the complex response and crosstalk of the WE with the tissue underneath, and we hypothesize this response is tailored to the tissue composition exposed during the injury.

      Finally, following the reviewer’s advice, we have conducted new experiments to prove the temporal connection between skeletal resorption and regeneration, showing that these processes occur simultaneously.

      Reviewer #3 (Public Review):

      This study outlines the role of osteoclast-mediated resorption in integrating the skeletal elements during limb regeneration, using axolotls that can regenerate the entire limb upon amputation. Using calcium-binding vital dyes (calcein and alizarin red), the authors first demonstrated that a large portion of amputated skeletal elements is resorbed prior to blastema formation. They further show that 1) inhibiting bone resorption by zoledronic acid impairs proper integration of the pre-existing and regenerating skeletal elements, 2) removing the wound epithelium using the full skin flap surgery inhibits bone resorption, and 3) bone resorption and blastema formation are correlated. The authors reached the major conclusion that bone resorption is essential for successful skeletal regeneration. Notably, this study applies a well-established and elegant axolotl limb regeneration model and transgenic reporter strains to reveal the potential roles of resorption in limb regeneration.

      Strengths:

      1. The authors utilized a well-established axolotl limb regeneration model and applied elegant vital mineral dyes and transgenic reporter lines for sequential in vivo imaging. The authors also provided quantitative assessment by examining multiple animals, particularly in the early sections, ensuring the rigor and the reproducibility of the study.

      2. The authors further performed important interventions that can impinge upon successful limb regeneration, including inhibition of bone resorption by zoledronic acid and impairment of the wound epithelium by full skin flap surgery. These procedures gave rise to useful insights into the relationship between bone resorption and successful limb regeneration.

      3. The imaging presented in this manuscript is of exceptionally high quality.

      Weaknesses:

      1. Despite the high quality of the work, many analyses in this study are incomplete, making it insufficient to support the major conclusion. For example, in Figure 4, the authors did not provide any quantitative assessment to show how zol affects the integration of the skeletal elements (angulation?), which seems to be essential for supporting the conclusion. Likewise in Figure 7, the analyses of EdU+ cells and Sox9 reporter expression were not included in zol-treated animals. Similarly in Figure 5, quantification of osteoclasts was not performed with the full skin flap surgery group. Analyses of only normally regenerated animals are not sufficient to support many of the conclusions.

      2. The phenotype of zol-treated animals in limb regeneration is somewhat disappointing. Although zol-treated animals show decreased blastema formation and unresorbed pre-existing skeletal elements, limb regeneration still occurs and the only phenotype is a relatively minor defect in skeletal integration. It is possible that zol-induced defect in blastema formation is not directly linked to the failure of integration at a later stage. I find this “weakness” a bit subjective.

      3. As an integration failure of the newly formed skeleton still occurs in untreated animals, it is not entirely clear how the authors can attribute this defect to a lack of bone resorption. More quantitative analyses would be necessary to demonstrate the correlation between zol treatment and lack of integration.

      Taking into consideration the reviewer’s concerns, we have improved our analysis of the integration phenotype. The assessment of integration success was carried out using a score matrix, which allowed us to correlate the extent of resorption with integration efficiency more accurately. We believe our results provide sufficient evidence to support this correlation.

      When we first saw the phenotype of zol-treated animals, we were far from disappointed; we were actually intrigued that we could observe a significant failure in tissue integration after removing the function of osteoclasts in an early phase of regeneration. All-or-nothing results are exciting; subtle results, on the other hand, can prove more informative, and we think this is the case here. Our treatment does not inhibit regeneration but disrupts tissue integration, opening up another fascinating aspect of regeneration: how is old tissue able to functionally integrate newly formed tissue?

      The integration phenotypes observed in the unresorbed limbs do not resemble anything reported in the field so far. Moreover, the range of phenotypes observed allowed us to better determine their correlation with resorption. Importantly, the presence of integration failures in untreated animals allowed us to examine ECM organization at this old-new tissue interface, while highlighting the normal occurrence of imperfect regeneration in the axolotl limb.

      Finally, we have included new results to complement the conclusions presented at the end of our work. Although we observed differences in blastema size in zol-treated animals, we did not observe a difference in the number of EdU+ cells, which reveals that the skeleton cannot be used as a reference for assessing blastema location. This conclusion is complemented by our in vivo assays, in which we observed condensation of cartilage despite resorption still occurring. We consider our conclusions to be justified and supported by the assays presented in our work.

    1. Author Response

      Reviewer #1 (Public Review):

      Khan et al describe how two important transcription factors functionally cooperate to activate a few of the CRP-dependent genes in Mycobacterium tuberculosis. CRP is a global regulator in eubacteria needed to activate a number of genes, while PhoP is an acid stress response regulator required for expression of a specific set of genes. The authors delineate the interaction between these two key regulators of the bacterial pathogen and show that in a subset of CRP-dependent promoters, PhoP binding recruits CRP to activate transcription.

      The experiments are well designed and executed, with a coherent presentation of the manuscript. While the data are well organized and presented with clean phosphorimages and blots that facilitate their understanding, the interpretation could have been more robust (see comments below).

      We thank the reviewer for these extremely encouraging comments. We have now included substantial changes throughout the ‘Results’ section to improve interpretation of the results (please see below our responses).

      Obviously, the strength of the paper is the description of a hitherto unknown stress-specific cooperation between two well-studied transcription factors, with most evidence supporting the claims. In E. coli (and in other bacteria), studies of CRP-mediated gene control have led to the identification of different classes of CRP-dependent promoters with their own specific regulators. Such a description was lacking in M. tuberculosis, and the PhoP-CRP collaboration described is likely to have implications for pathogenesis. The weakness (or possibly what remains to be explored) is that the precise mechanism of the cooperative transcription regulation is yet to be understood.

      We agree with the reviewer’s comment that the precise mechanism of cooperative transcription regulation is yet to be fully understood. While we briefly mention it as the future scope of work in the concluding part of the ‘Discussion’ section, we have now included a new paragraph on the schematic model summarizing a possible mechanism of cooperative transcription regulation.

      From the data presented, it is apparent that PhoP binds efficiently to the whiB1 upstream promoter on its own. It is also evident that CRP is recruited to its site as a result of PhoP binding. This is reminiscent of the bacteriophage lambda paradigm of positive cooperativity. Thus, it is not reciprocal synergy (as stated in one place in the paper); it is PhoP-mediated recruitment, as claimed elsewhere. Indeed, PhoP null mutants nicely support the latter interpretation.

      The reviewer raises an important and interesting point on positive cooperativity resembling bacteriophage lambda paradigm. We agree. We have now modified text of the ‘Results’ section to establish clarity on this matter.

      A discussion on why and how CRP binds on its own in other CRP-dependent promoters would help better appreciate the need for PhoP sites next to CRP sites for their cooperative interaction in these promoter subsets. CRP sites could be at a varied distance with respect to the promoter as seen in E. coli.

      Again, this is an interesting point. We thank the reviewer for bringing this point to our attention. As recommended by the reviewer, we have now included the following text in the ‘Discussion’ section of the revised manuscript.

      “Notably, the subset of genes that undergo differential expression in Δcrp-H37Rv conforms to a pattern largely resembling the canonical CRP regulon of E. coli, with CRP binding sites either proximal to transcription start sites, leading to repression, or distal to transcription start sites, leading to promoter activation (Kahramanoglou et al., 2014). It is noteworthy that CRP has been suggested to function as a general chromosomal organizer (Grainger et al., 2005). In this study, we uncover that, strikingly, PhoP binding sites are present next to CRP binding sites located only distally upstream of promoters, and are therefore associated with activation. We propose that, in the case of these co-regulated promoters, the additional stability of the transcription initiation complex is derived from protein-protein interaction between CRP and PhoP. These two interacting proteins remain bound to their cognate sites away from the start site and contribute to the stability of the transcription initiation complex, providing access for mycobacterial RNA polymerase (RNAP) to bind and transcribe genes. A schematic model is shown in Fig. 6C. Together, these molecular events mitigate stress by controlling expression of numerous genes and perhaps contribute to better survival of the bacilli in cellular and animal models.”

      Reviewer #2 (Public Review):

      In this manuscript by Khan et al., the authors set out to characterize how the cAMP receptor protein, CRP, and PhoP function to coregulate a subset of virulence genes in Mycobacterium tuberculosis. To this end, the authors use a wide variety of molecular techniques to monitor gene regulation, DNA-binding activity, and protein-protein interactions between phosphorylated PhoP and CRP. The authors conclude that phosphorylated PhoP functions to recruit CRP to promoter regions, where together the two regulators function synergistically to control gene expression. In general, the conclusions of the manuscript appear to be justified by the data, however, the text is difficult to follow. The current version of the paper is likely of interest to scientists within the field of mycobacterial signal transduction.

      The major strength of the paper is that the authors test their hypothesis using a variety of complementary approaches. The authors demonstrate a genetic interaction between CRP and PhoP in vivo and reconstitute the phenomenon in vitro, providing compelling evidence that the coregulation by these well-studied regulators does take place. The major weakness is that the logic of the manuscript is difficult to follow as a reader, at times making an evaluation of results and interpretations difficult. The majority of the experimentation involves the whiB1 promoter while conclusions are extrapolated broadly.

      We would like to thank the reviewer for her/his constructive comments and suggestions. In the revised manuscript, we have now included numerous changes throughout the ‘Results’ and ‘Discussion’ sections to improve logic of the manuscript and interpretation of the results (please see below our responses). Also, we have included experiments as requested by the reviewers and provided additional data and explanations that address their concerns.

    1. Author Response

      Reviewer #3 (Public Review):

      1) Information is missing about the regions of interest in which calcium responses were measured. Judging from Fig. 1E, calcium signals were measured in the somata, and this should be specified. Also judging from this figure, calcium signals seem to be largely confined to the somata and virtually absent from dendritic arbors. Fig. 6a shows very faint signals in the dendrites, yet those signals seem to have been measured rather far from the point of force application (a scale bar is shown but undefined) and, for some unknown reason, not between the soma and the force application point. Should there be detectable calcium signals in the somata, the respective image gains should be adjusted so that those signals can be appreciated by the reader. If there are no clear signals in the dendrites, this would affect interpretations concerning e.g. Ca-α1D.

      Calcium responses can be observed in both the soma and the dendrites, as presented in the original manuscript (Figure 6). Inspired by the second suggestion from this reviewer, we went through our data and refined our measurement of the dendritic signal in the revised manuscript (see revised Figure 6). In addition, we showed that the dendritic response was dependent on Ca-α1D (see revised Figure 6 and Figure 6-figure supplement 1). Finally, in the revised manuscript, we made it clear that all F/F0 values were measured from the soma unless otherwise stated (see Figure 2 legend).

      2) Along this line, also analyzing the spatial distribution of dendritic calcium responses to the pokes would provide a much more detailed picture of how the dendritic tree responds to the various pokes. The beauty of the imaging approach chosen here is that it provides such information. Rather than ignoring this possibility, it should be exploited in this study, especially as the respective data might provide much deeper insights into the relation between the mechanosensory function of the cell and its dendritic tree (and bolster the modelling results in Fig. 4 experimentally).

      In the original manuscript, we included the data on the dendritic calcium signal and showed that the dendritic signal was reduced when the activity of VGCCs was inhibited or in the Ca-α1D knockdown mutant (see Fig. 6 A-B in the original manuscript). Inspired by the reviewers' suggestions, we took a closer look at our data and performed additional experiments. In the revised Figure 6 A-B, we showed that the mechanical stimuli could evoke calcium responses not only in the soma, but also in the homolateral (i.e. between the soma and the force probe) and contralateral (i.e. on the opposite side of the force probe) dendrites, suggesting that the dendritic signals propagate within the dendritic arbors. Moreover, in the revised Figure 6 A-B and Figure 6-figure supplement 1, we showed that these dendritic signals were reduced in the Ca-α1D mutant strains or when the fillet preparation was treated with nimodipine, demonstrating a clear dependence on VGCC activity. However, because our imaging speed is not fast enough to capture the propagation of calcium signals along the dendrites, the dynamics of signal propagation remain undefined. This would be an interesting issue to study in the future. Along with the revised Figure 6, we also revised the text and legends accordingly.

      3) When showing response functions as in e.g. Figs. 2C, G, H, 3D, 5C-E, etc., the y-axis should have a logarithmic scaling; receptor potentials of receptor cells usually scale proportionally to the logarithm of the stimulus amplitude. Only then, the reader will be able to fully appreciate the sensitivity differences. This will also alter interpretation of response function slopes.

      We thank the reviewer for the suggestion. However, the stimulation force is actually a distal stimulus for the cell, while the proximal stimuli (e.g. local deformation) are difficult to measure or estimate. Therefore, we are not sure that the cellular responses necessarily scale with the logarithm of macroscopic forces (i.e. the distal stimuli). Moreover, on inspection of the data, we found that the response was proportional to the force; for conciseness, we therefore fitted the plot with a linear function.

      4) The knockdown and mutant data are interesting, yet important controls are missing. For the RNAi lines used, qPCR data on the knockdown efficiency should be added. For the channel mutations, available genetic rescue lines should be used as controls. Data on protein localization are presented for the mechanosensitive channels, but not for the voltage-gated calcium channel subunit. Should antibodies be available, respective stainings should be included. If not, the authors should at least check whether Ca-α1D is expressed in the cell using e.g. Mi{ET1}Ca-α1D[MB06807] that is available at Bloomington.

      First, we did not use an RNAi line for Piezo. The PiezoKO line is a genomic mutant strain.

      Second, for Ca-α1D, because there are only a small number of c4da neurons in each animal and Ca-α1D is quite broadly expressed in various types of neurons (see our revised Figure 6-figure supplement 2), we expected that a reduction in the expression level of Ca-α1D in c4da neurons would be very difficult to detect. Therefore, we knocked down the expression of Ca-α1D in the whole animal using the same uas-Ca-α1Di strain and the tub-gal4 strain. Using RT-PCR, we showed that the expression level of Ca-α1D was significantly reduced (revised Figure 6-figure supplement 2). In fact, the same RNAi strain has also been used in other functional studies.

      5) The statistics used are not entirely convincing. T-tests are used throughout, though I do not feel that all the data are normally distributed. Moreover, some figures include multiple comparisons, apparently without statistical correction. The data should be re-analyzed using appropriate statistical procedures.

      We thank the reviewer for this suggestion. We have now used Mann-Whitney U test or Kruskal Wallis test for all the data that were not proven to follow a normal distribution. For multiple comparisons, we used One-way ANOVA. We have now included the relevant information in the revised figure legends.

    1. Author Response

      Reviewer #3 (Public Review):

      1) Validation of reagents: The authors generated a pY1230 Afadin antibody claiming that (page 6) "this new antibody is specific to tyrosine phosphorylated Afadin, and that pY1230 is targeted for dephosphorylation by PTPRK, in a D2-domain dependent manner". The WB in Fig 1B shows a lot of background; two main bands are visible, both of which diminish in intensity in ICD WT pervanadate-treated MCF10A cell lysates. The claim that the developed peptide antibody is selective for pY1230 in Afadin would need to be substantiated, for instance by pull-down studies analysed by pY-MS. However, for the current study it would be sufficient to demonstrate that pY1230 is indeed the dephosphorylated site. I therefore suggest including a site-directed mutant (Y1230F) that would confirm dephosphorylation at this site and the ability of the antibody to recognize the phosphorylation state at this position.

      We would like this antibody to be a useful and freely accessible tool in the field and have taken on board the request for additional validation. To this end, we have significantly expanded Supplementary Figure 2 (now Figure 1 - figure supplement 2) and included a dedicated section of the results as follows:

      1. We have now included information about all of the Afadin antibodies used in this study, since Afadin(BD) appears to be sensitive to phosphorylation (Figure 1 - figure supplement 2A).

      2. We have demonstrated that the Afadin pY1230 antibody detects an upregulated band in PTPRK KO MCF10A cells, consistent with our previous tyrosine phosphoproteomics (Figure 1 - figure supplement 2B). This indicates that the antibody can be used to detect endogenous Afadin phosphorylation.

      3. We have included two new knockdown experiments demonstrating the recognition of Afadin by our antibody (Figure 1 - figure supplement 2C). Two Afadin isoforms appear to be recognised in HEK293T cells by both the BD and pY1230 antibodies, consistent with previous reports (Umeda et al. MBoC, 2015). We have highlighted these in the figure.

      4. We have performed mutagenesis to demonstrate the specificity of the antibody. We tagged Afadin with a fluorescent protein tag, reasoning that it would cause a shift in molecular weight that could be resolved by SDS-PAGE, as is the case. We noted that the phosphopeptide used spans an additional tyrosine, Y1226, which has been detected as phosphorylated (although to a much lower extent than Y1230) on PhosphoSitePlus. The data clearly show that Afadin cannot be phosphorylated when Y1230 is mutated to a phenylalanine (compared to the CIP control), indicating that this is the predominant site recognised by the antibody. In addition, the endogenous pervanadate-stimulated signal is completely abolished by CIP treatment (Figure 1 - figure supplement 2D).

      5. We have included densitometric quantification of the dephosphorylation assay shown in Figure 1B, which was part of a time course and shows preferential dephosphorylation by the PTPRK ICD compared to the PTPRK D1. The signal stops declining over time, which could indicate antibody background or an inaccessible pool of Afadin-pY1230 (Figure 1 - figure supplement 2E).

      6. To further demonstrate that this site is modulated by PTPRK in post-confluent cells, we have used doxycycline (dox)-inducible cell lines generated in Fearnley et al., 2019. Upon treatment with 500 ng/ml dox for 48 hours, PTPRK is induced to lower levels than wild type; however, normalized quantification of the Afadin pY1230 signal against the Afadin (CST) signal clearly indicates downregulation by PTPRK WT, but not by the catalytically inactive mutant (Figure 1 - figure supplement 2F and 2G).

      Together, these data strengthen our assertion that this antibody recognises endogenously phosphorylated Afadin at site Y1230, which is modulated in vitro and in cells by PTPRK phosphatase activity. For clarity, we have highlighted and annotated the relevant bands in the figures. We have also indicated which total Afadin antibody was used in each experiment.

      2) The authors claim that a short, 63-residue predicted coiled coil (CC) region, is both necessary and sufficient for binding to the PTPRK-ICD. The region is predicted to have alpha-helical structure and as a consequence, a helical structure has been used in the docking model. Considering that the authors recombinantly expressed this region in bacteria, it would be experimentally simple confirming the alpha-helical structure of the segment by CD or NMR spectroscopy.

      To clarify, the helical structure in the docking model was independently predicted by several sequence and structural analysis programmes, including AlphaFold2, RoseTTAFold and NetSurfP, and is annotated in UniProt (as a coiled coil). We did not stipulate prior to the AF2 prediction that it was helical. Isolated short peptides frequently adopt helical structure; therefore, prediction of a helix within the context of the full Afadin sequence is, in our opinion, stronger evidence than CD of an isolated fragment.

      3) Only two mutants have been introduced into the PTPRK-ICD to map the Afadin interaction site. One of the mutations changes a possibly structurally important residue (glycine) into a histidine. Even though this residue is present in PTPRM, this does not exclude the possibility that the D2 domain no longer folds functionally. Also, the second mutation represents a large change in chemical properties, and the other 2 predicted residues have not been investigated.

      The residues that were selected for mutation are all localised to the protein surface and therefore are unlikely to be involved in stable folding of PTPRK. In support of the correct folding of the mutated PTPRK, we include in Figure 1 below SEC elution traces for wild-type and mutant D2 showing that they elute as single symmetric peaks at the same elution volume as the WT protein. This is consistent with them having a similar shape and size, and not being aggregated or unfolded.

      Figure 1. PTPRK-D2 wild-type and mutant preparative SEC elution profiles. A280nm has been normalised to help illustrate that the different proteins elute at the same volume. The main peak from these samples was used for binding assays in the main paper.

      Furthermore, the yield for the double mutant was very high (4 mg of pure protein from a 2 L culture, see A280 value in graph below), whereas poorly folded proteins tend to have significantly reduced yields. This protein was also very stable over time whereas unfolded proteins tend to degrade during or following purification.

      Figure 2. Analytical SEC elution profile for the PTPRK-D2 DM construct showing the very high yield consistent with a well-folded, stable protein.

      Finally, we have carried out thermal melt curves of the WT and mutant PTPRK D2 domains showing that they all possess melting temperatures between 39.3°C and 41.7°C, supporting that they are all equivalently folded. We include these data as an additional Supplementary Figure (Figure 4 - figure supplement 3) in the paper.

      4) The interface on the Afadin substrate has not been investigated apart from deleting the entire CC or a central charge cluster. Based on the docking model the authors must have identified key positions of this interaction that could be mutated to confirm the proposed interaction site.

      We have now made and tested several additional mutations within both the Afadin-CC and PTPRK-D2 domains to further validate the AF2 predicted model of the complex.

      For Afadin-CC we introduced several single and double mutations along the helix including residues predicted to be in the interface and residues distal from the interface. These mutations and the pulldown with PTPRK are described in the text and are included as additional panels to a modified Figure 3. All mutations have the expected effect on the interaction based on the predicted complex structure. To help illustrate the positions of these mutations we have also included a figure of the interface with the residues highlighted.

      For the PTPRK-D2, we have also introduced two new mutations, one buried in the interface (F1225A) and one on the edge of the interface encompassing a loop that is different in PTPRM (labelled the M-loop). GST-Afadin WT protein was bound to GSH beads and tested for its ability to pull down WT and mutated PTPRK. These new mutations (illustrated in the new Figure 4 – figure supplement 2) further support the model prediction. F1225A almost completely abolishes binding, as predicted, while the M-loop mutant retains binding. These mutations and their effects are now described in the main text, and the pull-down data, including controls and retesting of the original DM mutant, are included as panel H in a newly modified Figure 4 focussed solely on the PTPRK interface.

      5) A minor point is that the ITC experiments have not been run long enough to determine the baseline of interaction heats. In addition, as large and polar proteins were used in this experiment, a blank titration would be required to rule out that dilution heats affect the determined affinities.

      All control experiments including buffer into buffer, Afadin into buffer and buffer into PTPRK were carried out at the same time as the main binding experiment and are shown below overlaid with the binding curve. These demonstrate the very small dilution heats consistent with excellent buffer matching of the samples.

      We were able to obtain excellent fits to the titration curves by fitting 1:1 binding with a calculated linear baseline (see Figure 2B,D). Very similar results were obtained by fitting to the sum (‘composite’) of fitted linear baselines obtained for the three control experiments for each titration.

    1. Author Response

      Public Evaluation Summary:

      This work presents a series of enhancements to the PhIP-seq method of autoantibody discovery, with the goal of improving scaling to larger cohorts and increasing disease specificity. The strength of the paper is the validation of the high throughput format, although results from screening patient samples confirm or only modestly extend previous data.

      We thank the reviewers for their feedback and agree that the validation of our high throughput, easily accessible approach is a strength of this work. We appreciate that the reviewers expressed uncertainty about whether there were sufficient advances to qualify this paper as a Research Advance. In addition to a point-by-point rebuttal, we quantify and enumerate the advances, improvements, and novel findings disclosed in this manuscript, relative to our original eLife paper.

      1. Demonstration of the importance of adequate healthy control cohorts in PhIP-Seq design. Using scaled protocols, we demonstrate the importance of using large control cohorts to filter out non-specific hits, as well as to detect rare but specific disease-associated antigens such as PDYN. To our knowledge, we are the first to demonstrate and discuss the consequences of PhIP-Seq dataset interpretation in the absence of sufficient controls. These findings are especially important in light of recent, high-impact papers using few to no controls (Mina et al. Science 2019, Gruber et al. Cell 2020, among others) to make conclusions about novel autoantibodies in the context of specific diseases.

      2. Design, validation and documentation of accessible, benchtop protocols for scaled PhIP-Seq. These protocols enable parallel testing of 600-800 samples without contamination or batch effects. Using a substantially expanded, multi-cohort set of patients with APS1, we validate the quality of the protocol and apply this protocol to numerous other disease contexts. Importantly, our protocols are documented (protocols.io) with each step tested for optimal quality, and are easily accessible without the need for robotics or specialized equipment.

      3. Machine Learning for disease classification using phage-based immunoprofiling. We show that large, well-controlled PhIP-Seq datasets lend themselves well to machine learning approaches and enable unsupervised classification of disease status. To our knowledge, this is the first successful application of an unsupervised machine learning approach to phage-based immunoprofiling data. We demonstrate that PhIP-Seq data enable APS1 disease classification in 97% of cases (compared with the 95% sensitivity of current testing for anti-IFN antibodies in the setting of suspected APS1). This finding, while applied to only one large cohort, demonstrates that PhIP-Seq data, when appropriately controlled, can have substantial value beyond a single-antigen discovery platform. The combination of machine learning and phage-based immunoprofiling will likely have extensive applications beyond APS1, including the discovery of novel diagnostic tests and biomarkers.

      4. Novel IPEX antigen BTNL8. We discovered and validated anti-BTNL8 antibodies in 42% of IPEX patients, suggesting that this may be a major autoantigen in IPEX. BTNL8 is a cell surface-expressed protein in intestinal gamma-delta T-cells, raising the novel question of a possible role for autoantibodies in directly regulating gut epithelial immune homeostasis (see discussion, lines 540-551). This is the first report not only of BTNL8, but of any antigen discovery by PhIP-Seq immunoprofiling in IPEX patients. Given the importance of this discovery, we sought to validate the presence of these autoantibodies in an additional validation cohort. We were successful and present these findings in the new Figure 5, highlighting the generalizability of our findings to IPEX patients.

      5. BEST4 autoantibodies in IPEX and RAG-hypomorphic patients. We discovered anti-BEST4 antibodies in 15% of patients with IPEX, as well as in 2 patients with RAG1/2 mutations, demonstrating a connection between the intestinal autoimmunity seen in both IPEX and RAG1/2 deficiency. Of note, one of the 2 RAG1/2-deficient patients with anti-BEST4 antibodies is known to have very-early-onset IBD (VEO-IBD), a rare sub-phenotype in RAG-hypomorphs (and other primary immune deficiencies). Given the severity of VEO-IBD and how little is known about why certain patients with immune dysregulation develop this phenotype, these findings mark an important scientific advance and provide an essential clue into etiology. Furthermore, given that IPEX is driven by dysfunctional Treg cells, the commonality of these findings in both IPEX and hypomorphic RAG indicates a potential role for Treg dysfunction in hypomorphic RAG.

      6. Expansion of scaled PhIP-Seq to interrogate severe COVID-19 pneumonia, Kawasaki disease (KD), and Multisystem Inflammatory Syndrome in Children (MIS-C). Importantly, in MIS-C we find no evidence for any of the previously reported autoantigens described in Gruber et al. (Cell, 2020), a study that drew strong conclusions about autoantibodies despite including only 4 PhIP-Seq control samples. Our results highlight the importance of scale and appropriate control groups, and caution against overinterpreting disease-specific autoantigens reported from PhIP-Seq (or other expanded antigen screening technologies, such as near-proteome-wide fixed protein arrays) when studies use smaller control cohorts, often without orthogonal validation experiments.

      7. Anti-CGNL1 antibodies in KD/MIS-C. We discovered and validated autoantibodies to CGNL1 in KD and MIS-C. These antibodies may represent a subset of specificities within anti-endothelial cell antibodies, given the endothelial expression of CGNL1 and its reported involvement in cardiovascular disease.

      Reviewer #2 (Public Review):

      The authors update PhIP-seq into a high-throughput format with the goal of accommodating screening of large numbers of human patient sera for novel autoantibodies, and screening of more control sera to better define standards for positivity of experimental samples. The high-throughput protocol is detailed in an associated web-based format and validated in the paper using sera from patients with inherited immunodeficiencies and patients with MIS-C, Kawasaki syndrome, and COVID-19. These are strengths of the work, and the high-throughput PhIP-seq format will be useful to other investigators doing similar screenings. Yet the findings do not significantly extend our knowledge of the range of autoantibodies in these illnesses, and many of the autoantibodies detected using PhIP-seq linear epitopes are not validated with other strategies, limiting the significance of the results. The data from the MIS-C and Kawasaki cohorts are confounded by an undetermined number of IVIG-treated subjects and by limited numbers of control samples, including sera from patients with febrile illnesses that contain autoantibodies that are not discussed in the context of findings from the experimental groups.

      In summary, the paper is solid technically, with the high throughput strategy seemingly well validated; however, the advance here is primarily a technical one.

      We thank the reviewer and agree that the technical advance here is substantial and will be of value to other investigators doing similar screenings, as well as to investigators who previously lacked access to this technology because earlier iterations of the protocol required robotics and specialized equipment. We therefore feel that this, combined with the demonstration of how to appropriately control PhIP-Seq experiments, should be considered a valuable research advance on its own, even without the extensive validation and novel findings in 5 additional disease contexts summarized in greater detail above.

      IVIG status is discussed in lines 417-423. Briefly, the large majority of MIS-C samples are confirmed to have been IVIG-free at the time of blood draw, and all of our KD samples are confirmed IVIG-free.

      While pediatric febrile illness samples could conceivably contain autoantibodies, we believe that this is the best comparison group, because these samples come from age-matched, acutely ill patients and thus provide a control group that is as clinically similar to MIS-C as possible. In addition, we included adult healthy sera and adult COVID-19 sera as secondary control groups. Of note, this matching is much more extensive (and the control group substantially larger) than in the recent study in Cell (Gruber et al., 2020), which for PhIP-Seq used only 4 healthy, COVID-19-negative samples for comparison with 9 MIS-C samples.

      Reviewer #3 (Public Review):

      This paper presents a rigorously performed series of studies to improve the ability of the PhIP-seq method to discover autoantibodies against peptide antigens that span the whole peptidome at scale, and to increase the ease of validation and definition of disease specificity. The paper is an extension of a recent paper from the DeRisi and Anderson groups on APS1 patients, which defined and validated a novel series of tissue-specific autoantigens in APS1. The current studies show that the authors can find the antibodies they previously defined and, using larger numbers of disease and control samples, can expand somewhat on what they detect. They then use the new method to look at multiple additional processes in which autoimmunity has been demonstrated or postulated.

      The dataset may be of use to others interested in defining novel autoantibodies. The findings, however, did not provide significant new insights into the processes studied. As the authors note, they were unable to detect the antibodies recognizing type I IFNs in severe COVID-19 (present in ~10% of patients), which had previously been demonstrated effectively using ELISA. Unlike in APS1, where findings about uncommon tissue-specific autoantibody responses across a population with a known genetic deficiency and heterogeneous phenotypes could truly illustrate the power of the method and approach, that elegance and a powerful, novel conclusion are not as evident here.

      The trade-off between sensitivity, specificity, and screening power of antigen discovery tools is present in every assay. We do not feel that comparing our assay with a single-protein ELISA is appropriate (or particularly relevant to the conclusions drawn in this manuscript), given the inherent differences in the nature and goals of the two assays. It has long been understood that PhIP-Seq is not sensitive to all protein antigens, including post-translationally modified and conformational antigens, as we state for readers in lines 190-193 of the discussion section, as well as in our previous work.

    1. Author Response

      Reviewer #2 (Public Review):

      Silberberg et al. present a series of cryo-EM structures of the ATP-dependent bacterial potassium importer KdpFABC, a protein that is inhibited by phosphorylation under high environmental K+ conditions. The aim of the study was to sample the protein's conformational landscape under active, non-phosphorylated and inhibited, phosphorylated (Ser162) conditions.

      Overall, the study presents 5 structures of phosphorylated wildtype protein (S162-P), 3 structures of phosphorylated 'dead' mutant (D307N, S162-P), and 2 structures of constitutively active, non-phosphorylatable protein (S162A).

      The true novelty and strength of this work is that 8 of the presented structures were obtained either under "turnover" or at least 'native' conditions without ATP, i.e., in the absence of any non-physiological substrate analogues or stabilising inhibitors. The remaining 2 were obtained in the presence of orthovanadate.

      Compared with previously published KdpFABC structures, 5 of the presented structural states have not been reported before, namely an E1-P·ADP state, an E1-P tight state captured in the autoinhibited WT protein (with and without vanadate), two different nucleotide-free 'apo' states, and an E1·ATP early state.

      Of these new states, the 'tight' states are of particular interest, because they appear to be 'off-cycle', dead end states. A novelty lies in the finding that this tight conformation can exist both in nucleotide-free E1 (as seen in the published first KdpFABC crystal structure), and also in the phosphorylated E1-P intermediate.

      By EPR spectroscopy, the authors show that the nucleotide-free 'tight' state readily converts into an active E1·ATP conformation when provided with nucleotide, leading to the conclusion that the E1-P·ADP state must be the true inhibitory species. This claim is backed by structural analysis supporting the hypothesis that phosphorylation at Ser162 could stall the KdpB subunit in an E1P state unable to convert into E2P. It is further supported by the observation that the phosphorylated sample does not readily convert into an E2P state when exposed to vanadate, as would otherwise be expected.

      The structures are of medium resolution (3.1 - 7.4 Å), but the key sites of nucleotide binding and/or phosphorylation are reasonably well supported by the EM maps, with one exception: in the 'E1·ATP early' state determined under turnover conditions, I find the map for the gamma phosphate of ATP not overly convincing, leaving the question whether this could instead be a product-inhibited, Mg-ADP bound E1 state resulting from an accumulation of MgADP under the turnover conditions used. Overall, the manuscript is well written and carefully phrased, and it presents interesting novel findings, which expand our knowledge about the conformational landscape and regulatory mechanisms of the P-type ATPase family.

      We thank the reviewer for their comments and helpful insights. We have addressed the points as follows:

      However in my opinion there are the following weaknesses in the current version of the manuscript:

      1) A lack of quantification. The heart of this study is the comparison of the newly determined KdpFABC structures with previously published ones (of which there are already 10). Yet, there are no RMSD calculations to illustrate the magnitude of any structural deviations. Instead, the authors use phrases like 'similar but not identical to', 'has some similarities', 'virtually identical', 'significant differences'. This makes it very hard to appreciate the true level of novelty/deviation from known structures.

      This is a very valid point, and we thank the reviewers for raising it. To provide a better overview of conformational similarities and significant differences, we have calculated RMSDs between all available structures of KdpFABC. They are summarised in the new Table 1 – Table Supplement 2. We have included individual RMSD values, whenever applicable and relevant, in the respective sections of the text and figures. We note that the RMSDs were calculated only between the cytosolic domains (KdpB N, A and P domains) after superimposition of the full-length protein on KdpA, which is rigid across all conformations of KdpFABC (see the description in Materials and Methods, lines 1184-1191, or the caption to Table 1 – Table Supplement 2). We opted not to report RMSDs calculated between the full-length proteins: most of the complex does not undergo large structural changes (see Figure 1 – Figure Supplement 1; the transmembrane region of KdpB as well as KdpA, KdpC and KdpF show small to no rearrangements compared with the cytosolic domains), so including it would obscure the relevant RMSD differences discussed here.
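      The RMSD procedure described above (superimpose on one rigid subunit, then measure deviations only over a different set of atoms) can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes NumPy, structures with matched atom ordering, and a Kabsch least-squares superposition; the function names and the synthetic coordinates standing in for KdpA and the KdpB cytosolic domains are hypothetical.

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Least-squares rigid fit (Kabsch): return R, t such that mobile @ R.T + t ~ target."""
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    p = mobile - mobile_centroid
    q = target - target_centroid
    h = p.T @ q  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = target_centroid - mobile_centroid @ r.T
    return r, t


def rmsd_after_reference_fit(mobile, target, fit_idx, eval_idx):
    """Superimpose using only the fit_idx atoms, then report RMSD over the eval_idx atoms."""
    r, t = kabsch_fit(mobile[fit_idx], target[fit_idx])
    moved = mobile @ r.T + t
    diff = moved[eval_idx] - target[eval_idx]
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(scale=10.0, size=(80, 3))  # hypothetical C-alpha coordinates
    fit_idx = np.arange(0, 50)    # stands in for the rigid KdpA atoms
    eval_idx = np.arange(50, 80)  # stands in for the KdpB N/A/P domain atoms

    theta = np.deg2rad(30.0)
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta), np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    mobile = coords @ rot.T + np.array([5.0, -3.0, 2.0])
    mobile[eval_idx] += np.array([3.0, 4.0, 0.0])  # rigid 5 A shift of the "cytosolic" atoms

    print(round(rmsd_after_reference_fit(mobile, coords, fit_idx, eval_idx), 6))
```

      Because the superposition uses only the reference atoms, a rigid displacement of the evaluated atoms (here the synthetic 5 Å shift) is reported in full; fitting on all atoms would instead spread part of that displacement over the whole structure.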

      Also the decrease in EPR peak height of the E1 apo tight state between phosphorylated and non-phosphorylated sample - a key piece of supporting data - is not quantified.

      EPR distance distributions have been quantified by fitting and integrating a Gaussian distribution curve; the results have been added to the corresponding Results section (lines 523-542) and the Methods section (lines 1230-1232).
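      As an illustration of this kind of quantification, the sketch below fits a Gaussian to one peak of a distance distribution and integrates it analytically. It assumes NumPy and uses a quadratic least-squares fit to the logarithm of the peak (Caruana's method) rather than whatever fitting routine the authors used; the function name `fit_gaussian_peak`, the fitting window and the two synthetic distributions are hypothetical. For a Gaussian of amplitude A and width σ, the integral is A·σ·√(2π), so the ratio of peak areas between two samples gives a relative change in the population of that state.

```python
import numpy as np


def fit_gaussian_peak(r, p, r_min, r_max):
    """Fit A*exp(-(r - mu)^2 / (2*sigma^2)) to p(r) within [r_min, r_max].

    Fits a quadratic to ln p(r) by least squares (Caruana's method) and
    returns (amplitude, mu, sigma, area), with area = A * sigma * sqrt(2*pi).
    """
    mask = (r >= r_min) & (r <= r_max) & (p > 0)
    c2, c1, c0 = np.polyfit(r[mask], np.log(p[mask]), 2)  # highest degree first
    if c2 >= 0:
        raise ValueError("ln p(r) is not concave in this window; no Gaussian peak")
    sigma = np.sqrt(-1.0 / (2.0 * c2))
    mu = -c1 / (2.0 * c2)
    amplitude = np.exp(c0 - c1 ** 2 / (4.0 * c2))
    area = amplitude * sigma * np.sqrt(2.0 * np.pi)
    return amplitude, mu, sigma, area


if __name__ == "__main__":
    r = np.linspace(2.0, 5.0, 301)  # hypothetical distance axis in nm

    def gauss(a, mu, sigma):
        return a * np.exp(-((r - mu) ** 2) / (2.0 * sigma ** 2))

    sample_1 = gauss(1.0, 3.2, 0.25)  # hypothetical peak of the state of interest
    sample_2 = gauss(0.4, 3.2, 0.25)  # same peak with a smaller population
    area_1 = fit_gaussian_peak(r, sample_1, 2.6, 3.8)[3]
    area_2 = fit_gaussian_peak(r, sample_2, 2.6, 3.8)[3]
    print(f"relative peak area: {area_2 / area_1:.2f}")
```

      Restricting the fit to a window around the peak keeps neighbouring peaks from biasing the result; on noisy experimental data a nonlinear least-squares fit of the Gaussian itself would usually be preferred, since the log transform amplifies noise in the tails.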

      2) Perhaps as a consequence of the above, there seems to be a slight tendency towards overstatement regarding the novelty of the findings in the context of previous structural studies. The E1-P·ATP tight structure is extremely similar to the previously published crystal structure (5MRW), but it took me three reads through the paper and a structural superposition (overall RMSD less than 2 Å) to realise that. While I do see that the existing differences, the two helix shifts in the P and A domains, are important and probably do permit the use of the term 'novel conformation' (I don't think there is a clear consensus on what level of change defines a novel conformation), it could have been made clearer that the 'tight' arrangement of domains has actually been reported before, only it was not termed 'tight'.

      As indicated above, we have now included an extensive RMSD table covering all available KdpFABC structures. To ensure a meaningful comparison, the RMSDs are calculated only between the cytosolic domains after superimposition of the full-length protein on KdpA, as the transmembrane region of KdpFABC is largely rigid (see figure below, panel B). However, we note that in the X-ray structure the transmembrane region of KdpB is displaced relative to the rest of the complex when compared with the arrangement found in any of the other 18 cryo-EM structures, which all align well in the TMD (see figure below, panel C). These deviations make the crystal structure somewhat of an outlier and might be a consequence of crystal packing (see figure below, panel A). For completeness in our comparison with the X-ray structure, we have included both an RMSD calculated after superimposition on KdpA and an additional RMSD calculated after aligning the structures on the TMD of KdpB (see figure below, panels D and E). The RMSD of less than 2 Å mentioned by the reviewer was probably obtained by superimposing the entire complexes on each other (see figure below, panel F). However, we do not believe that this is a reasonable comparison, as the TMD of the complex is significantly displaced in the X-ray structure, in strong contrast to all other structure pairs, whose TMDs align well (see figure below, panel B).

      From these comparisons, we conclude that the E1P tight and the X-ray structures share a certain similarity but are not identical, in particular in the relative orientation of the cytosolic domains to the rest of the complex. We hope that including the RMSDs in the text and separately highlighting the important features of the E1P tight state in the section “E1P tight is the consequence of an impaired E1P/E2P transition” makes the argument more conclusive.

      Likewise, the authors claim that they have covered the entire conformational cycle with their 10 structures, but this is actually not correct, as there is no representative of an E2 state or functional E1P state after ADP release.

      This is correct, and we have adjusted the phrasing to “close to the entire conformational cycle” or “the entire KdpFABC conformational cycle except the highly transient E1P state after ADP release and E2 state after dephosphorylation.”

      3) A key hypothesis this paper suggests is that KdpFABC cannot undergo the transition from E1P tight to E2P and hence gets stuck in this dead-end 'off-cycle' state. To test this, the authors analysed an S162-P sample supplied with the E2P-inducing inhibitor orthovanadate and found about 11% of particles in an E2P conformation. This is rationalised as a residual fraction of unphosphorylated, non-inhibited protein in the sample, but the sample was not actually tested for a residual unphosphorylated fraction or residual activity. Instead, there is a reference to Sweet et al., 2020. So the claim that the 11% E2P particles in the vanadate sample are irrelevant, whereas the 14% E1P tight particles from the turnover dataset are of key importance, would strongly benefit from some additional validation.

      We have added an ATPase assay showing the residual ATPase activity of WT KdpFABC compared with KdpFABS162AC, both purified from E. coli LB2003 cells using the same protein production and purification protocol as for the cryo-EM samples (see Figure 2 – Figure Supplement 5). The residual ATPase activity is ca. 14% of that of the uninhibited sample, which is consistent with the ~11% E2P fraction in the orthovanadate sample.

      Reviewer #3 (Public Review):

      The authors have determined a range of conformations of the high-affinity prokaryotic K+ uptake system KdpFABC, and demonstrate at least two novel states that shed further light on the structure and function of these elusive protein complexes.

      The manuscript is well written and easy to follow. The introduction puts the work in a proper context and highlights gaps in the field. I am, however, missing an overview of the currently available structures/states of KdpFABC. This could also be implemented in Fig. 6 (highlighting new vs. available data). This is also connected to one of my main remarks: the lack of comparisons and RMSD estimates relative to available structures. Similarity/resemblance to available structures is indicated several times throughout the manuscript, but this is not quantified or shown in detail, and hence it is difficult for the reader to grasp how unique or similar the structures are. Linked to this, I am somewhat surprised by the lack of considerable changes within the TM domain and by the overlapping connectivity of the K+ ions indicated in Table 1 - Figure Supplement 1. According to Fig. 6, the uptake pathway should be open in early E1 states but not in E2 states, in contrast to Table 1 - Figure Supplement 1, which shows connectivity in all structures. Furthermore, the release pathway (to the inside) should be open in the E2-P conformation, but no K+ ions along a release pathway are shown in any of the structures in Table 1 - Figure Supplement 1. Overall, there seem to be rather small shifts between the structures shown (are the structures changing from closed to inward-open?). Or is only KdpA shown?

      We thank the reviewer for their positive response and constructive criticisms. We have addressed these comments as follows:

      1. The overview of the available structures has been implemented in Fig. 6, with the new structures from this study highlighted in bold.

      2. RMSD values have been added to all comparisons, with a focus on the deviations of the cytosolic domains, which are most relevant to our conformational assignments and discussions.

      3. To highlight the (comparatively small) changes in the TMD, we have expanded Table 1 - Figure Supplement 1 to include panels showing the outward-open half-channel in the E1 states, with a constriction at the KdpA/KdpB interface, and the inward-open half-channel in the E2 states. The largest observable rearrangements do, however, take place in the cytosolic domains. This is in full agreement with previous studies, which focused more on the transitions occurring within the transmembrane region during the transport cycle (Stock et al., Nature Communications 2018; Silberberg et al., Nature Communications 2021; Sweet et al., PNAS 2021).

      4. The ions observed in the intersubunit tunnel are all before the point at which the tunnel closes, explaining why there is no difference in this region between E1 and E2 structures. Moreover, as we discussed in our last publication (Silberberg, Corey, Hielkema et al., 2021, Nat. Comms.), the assignment of non-protein densities along the entire length of the tunnel is contentious and can only be certain in the selectivity filter of KdpA and the CBS of KdpB.

      5. The release pathway from the CBS does not feature any defined K+ coordination sites, so ions are not expected to stay bound along this inward-open half-channel.

      My second key remark concerns the "E1-P tight is the consequence of an impaired E1-P/E2-P transition" section, and the associated discussion, which is very interesting. I am not convinced though that the nucleotide and phosphate mimic-stabilized states (such as E1-P:ADP) represent the high-energy E1P state, as I believe is indicated in the text. Supportive of this, in SERCA, the shifts from the E1:ATP to the E1P:ADP structures are modest, while the following high-energy Ca-bound E1P and E2P states remain elusive (see Fig. 1 in PMID: 32219166, from 3N8G to 3BA6). Or maybe this is not what the authors claim, or the situation is different for KdpFABC? Associated, while I agree with the statement in rows 234-237 (that the authors likely have caught an off-cycle state), I wonder if the tight E1-P configuration could relate to the elusive high-energy states (although initially counter-intuitive as it has been caught in the structure)? The claims on rows 358-360 and 420-422 are not in conflict with such an idea, and the authors touch on this subject on rows 436-450. Can it be excluded that it is the proper elusive E1P state? If the state is related to the E1P conformation it may well have bearing also on other P-type ATPases and this could be expanded upon.

      This is a good point, particularly since the E1P·ADP state is the most populated state in our sample, which seems at odds with it being a “high-energy, unstable state”. One possible explanation is that this state already carries some of the E1P strain (visible in the clash of D307-P with D518/D522), but that the ADP, and its associated Mg2+ in particular, help to stabilize it. Once ADP dissociates and takes the Mg2+ with it, the full destabilization takes effect in the actual high-energy E1P state. Nonetheless, we consider it fair to compare the E1P tight state with the E1P·ADP state to look for electrostatic relaxation. We have clarified the sequence of events and the role we hypothesize for ADP/Mg2+ in stabilizing the observed E1P·ADP state (lines 609-619): “Moreover, a comparison of the E1P tight structure with the E1P·ADP structure, its most immediate precursor in the conformational cycle obtained, reveals a number of significant rearrangements within the P domain (Figure 5B,C). First, Helix 6 (KdpB538-545) is partially unwound and has moved away from helix 5 towards the A domain, alongside the tilting of helix 4 of the A domain (Figure 5B,C – arrow 2). Second, and of particular interest, are the additional local changes that occur in the immediate vicinity of the phosphorylated KdpBD307. In the E1P·ADP structure, the catalytic aspartyl phosphate, located in the D307KTG signature motif, points towards the negatively charged KdpBD518/D522. This strain is likely to become even more unfavorable once ADP dissociates in the E1P state, as the Mg2+ associated with the ADP partially shields these clashes. The ensuing repulsion might serve as a driving force for the system to relax into the E2 state in the catalytic cycle.”

      We believe it is highly unlikely that the reported E1-P tight state represents an on-cycle, high-energy E1P intermediate. First, we observe a relaxation of electrostatic strain in this structure, particularly when compared with the E1P·ADP state we obtained. By contrast, the E1P state should be highly energetically unfavourable to ensure a rapid transition to the E2P state. As such, it should be a transient state, making it unlikely to accumulate sufficiently to be captured structurally. Additionally, the association of the N domain with the A domain in the tight conformation, which would have to be reverted, would be a surprising intermediary step in the transition from E1P to E2P. Altogether, the E1P tight state reported here most likely represents an off-cycle state.

    1. Author Response

      Reviewer #1 (Public Review):

      A novel approach is introduced for targeting protein-RNA interactions. The approach (presented in Figure 1) integrates computational techniques with cellular assays and is applicable, in principle, whenever the protein-RNA complex has a druggable binding pocket. It is demonstrated with the discovery of inhibitors of YB-1's interaction with its mRNA target. Of 22 putative hits identified by the virtual screen, 11 emerge as very strong hits, far beyond the 5-10 percent success rate often seen in drug discovery. The main strength here is the proof of concept that protein-RNA interactions are targetable.

      We agree with the reviewer that large computational screens to identify potential inhibitors generally lead to dead ends. This is why we have rationally designed this integrative approach where predictions are experimentally validated with different tools and the obtained results feed/orient the computational approach. The workflow illustrated in Figure 1 creates a vivid exchange between computational and experimental data and allows a back-and-forth between both to enhance and refine the computational screen. We have also put in place a refined physics-based computational approach to increase our chances in avoiding these dead-end screens (details are in Computational Methods and in Appendix 2). The high predictive power of our computational approach comes from a rationally designed workflow combining the following:

      1- Understanding the dynamic behavior of the target, the binding pocket, and identification of key residues using MD simulations.

      2- The starting 3D structures, refined using MD simulations.

      3- The prior identification and validation of the binding site and the identification of F1 and F4 as hits by NMR spectroscopy. F1 was then used in the pharmacophore screen.

      4- The statistical mechanics-based filter played an important role in orienting and refining this selection. For example, the use of ligand-water interactions to qualitatively estimate the residence of the ligand in the binding site.

      Nevertheless, the high success rate also stems from human intervention: visual inspection and rational (sometimes intuition-driven) selection of structurally promising candidates also played an important role in selecting the 111 molecules resulting from the static virtual screen (pharmacophore screens). We now clarify this point on pages 5 and 6 of the revised manuscript and give more details on the selection criteria used. We also specify that the large computational screen we implemented was mandatory to validate the MT bench.

      Reviewer #2 (Public Review):

      In the manuscript "Targeting RNA-Protein Interactions with an Integrative Approach Leads to the Identification of Potent YB-1 Inhibitors" the authors have tried to integrate computational, structural, and cellular imaging approaches to identify small molecule inhibitors of RNA-protein interactions. They take up as their target YB-1, an abundant RNA-binding protein (RBP) involved in regulating the translation and/or processing of multiple mRNAs, many of which encode genes involved in tumorigenesis and tumor progression. Firstly, the authors find a binding pocket in the cold shock domain (CSD) of YB-1, for the flavonoid fisetin, and more so for the analog quercetin, by NMR spectroscopy, which they name the "quercetin pocket". They then delineate and refine the RNA-binding characteristics of this pocket by MD simulations. Further, they conduct a computational screen of a large library of small molecules to find candidates which bind to this pocket. They then check the selected candidates as inhibitors of YB1-mRNA interaction using the microtubule bench (MT-bench) method. They find 11 molecules as significant hits with this approach, including one FDA-approved PARP-inhibitor drug (P1). P1 is shown to bind YB-1 by MD-simulation and NMR spectroscopy and was also shown to interfere with YB-1-mRNA interaction by NMR and in cells by the MT-bench assay. Finally, they showed that the molecule P1 reduced cellular translation by a puromycin incorporation assay and this effect was not observed in cells depleted of YB-1.

      Together, these multifarious approaches appear to establish a workflow useful for scoring for inhibitors of RNA-protein interactions. The workflow is rationally designed, moving from the identification of a binding pocket to the identification of binding molecules and then selecting molecules that inhibit protein-mRNA interactions. This workflow may be useful for other researchers attempting to screen libraries of compounds targeting RNA interactions by other RNA-binding proteins. However, as many RNA-binding proteins have large intrinsically disordered regions or no recognizable RNA-binding domains, it is to be seen whether such a structural "binding-pocket"-based approach can be generalizable to all RNA-binding proteins.

      We agree with the reviewer that this is not sufficient to generalize to all RBPs; a complete study of other RBPs would require a separate paper. In the current work, we did show that we can detect mRNA-RBP interactions with two other RBPs, HuR and FUS, and used them as controls to show the specificity of the tested small molecules for YB-1 (Figures 3d and 4b,c). We have now toned down the statements about the generality of the method (page 20).

      In the discussion, we now also explain that YB-1, because it has a single cold-shock domain and a druggable pocket, is an “ideal” target. We also explain that many RBPs harbor multiple RNA-binding domains, which may reduce the sensitivity of our method when a specific domain is targeted by small molecules, because the other domains would still contribute to mRNA binding. However, a single RNA-binding domain may be isolated and used as bait in the MT bench assay to overcome this obstacle. Developing molecules that target a specific domain may be sufficient to modulate the biological function exerted by the full-length protein.

      While the data presented in the paper is coherent and generally supports the demonstration of an inhibition of RNA-binding by YB-1, what appears to be lacking is evidence that the observed effect is specific to inhibition of YB-1-mediated regulation of translation and whether the expression of transcripts specifically regulated by YB-1 is affected. Secondly, it is not clear what is the effect of the putative inhibitor on cellular activity and behaviour, which is important to judge both specific phenotypic effects as well as non-specific cytotoxic effects.

      Overall the work is interesting and instructive, but the lack of the above observations detracts from its significance.

      We thank the reviewer for this feedback and for raising these interesting points. As indicated in the manuscript, it is very difficult to find functional cellular assays that would reveal a phenotype specific to a general RBP such as YB-1. This is even more difficult with YB-1, since it binds nonspecifically to most mRNAs, as shown by CLIP analysis [1]. This was one of the reasons we developed a specific cellular assay, the MT bench assay. YB-1 is related to bacterial cold shock proteins, which preserve global mRNA translation during cold stress, presumably by removing secondary structures. In contrast with many RBPs, YB-1 has only a single structured RNA-binding domain, which does not favor specific binding to particular mRNA sequences/structures. As noted by the reviewers, YB-1 is not a general translation factor but a general protein that binds to most non-polysomal mRNAs [2]. mRNAs, even highly translated ones, switch from time to time between a polysomal (active) state and a non-polysomal (dormant) state. In a recent work, we showed that YB-1 prepares non-polysomal mRNAs in a way that facilitates their transition from the dormant to the active state. Accordingly, we also showed that decreasing the expression of YB-1 reduces global mRNA translation rates in HeLa cells [3]. The global decrease in mRNA translation observed with Niraparib (P1), which targets YB-1, is therefore consistent with this model. We are not aware of any established 3′UTRs that are highly specific to YB-1; YB-1 binds nonspecifically to both mRNA coding sequences and 3′UTRs (YBX1 data [1], YBX3 data [4]). A large-scale, in-depth analysis would be needed to determine whether specific structures/sequences significantly increase the YB-1 dependency of mRNA translation. However, the expression of some malignancy-associated proteins, notably Vimentin and E-cadherin, has been linked to YB-1 expression levels [3].
      To test this, we performed a new experiment in which we measured the expression levels of these two proteins after silencing YB-1 expression in HeLa cells, in the absence and presence of Niraparib (P1) and Olaparib (P2, used as a negative control). The results show that P1, but not P2, decreases the YB-1 dependence of Vimentin expression (significant) and of E-cadherin expression (not significant). Other proteins, such as eIF5a and RPL36, used here as negative controls, did not show similar behavior. These results are thus consistent with a specific effect of Niraparib on YB-1-mediated translation. In line with these results, we now also cite a recent report showing the downregulation of Vimentin expression in ovarian cancer cells treated with Niraparib [5]. This is now discussed on pages 16 and 17 of the revised manuscript, and the new data are included as the new Figure 8-Figure supplement 3.

      1. Wu, S.-L. et al. Genome-wide analysis of YB-1-RNA interactions reveals a novel role of YB-1 in miRNA processing in glioblastoma multiforme. Nucleic Acids Research 43, 8516-8528 (2015).

      2. Singh, G., Pratt, G., Yeo, G.W. & Moore, M.J. The clothes make the mRNA: past and present trends in mRNP fashion. Annual Review of Biochemistry 84, 325 (2015).

      3. Budkina, K. et al. YB-1 unwinds mRNA secondary structures in vitro and negatively regulates stress granule assembly in HeLa cells. Nucleic Acids Research 49, 10061-10081 (2021).

      4. Van Nostrand, E.L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711-719 (2020).

      5. Zeng, Z., Yu, J., Jiang, Z. & Zhao, N. Oleanolic acid (OA) targeting UNC5B inhibits proliferation and EMT of ovarian cancer cell and increases chemotherapy sensitivity of Niraparib. Journal of Oncology 2022 (2022). https://doi.org/10.1155/2022/5887671

      Regarding the effect of the putative inhibitor on cellular activity and behaviour, which is important for judging both specific phenotypic effects and non-specific cytotoxic effects, we agree with the reviewer. YB-1 is associated with the high proliferation rate of cancer cells (and silencing YB-1 does not induce apoptosis). Therefore, we performed cell proliferation assays using cells treated with siRNA or siNEG, which allowed us to manipulate the endogenous YB-1 expression level rather than relying on a more artificial rescue experiment. These assays were performed in the presence of three PARP-1 inhibitors at low concentrations: our hit, Niraparib (P1), and two negative controls, Olaparib (P2) and Talazoparib (P3). We used a 48 h incubation time, which allows effects to be observed at lower compound concentrations. All PARP-1 inhibitors decreased cell proliferation, P3 most strongly. However, P2 and P3 decreased proliferation further in siRNA-treated cells than in siNEG-treated cells (significant differences at 5 µM). In contrast, Niraparib decreased proliferation further in siNEG-treated cells, in which YB-1 levels are high (non-significant variations, but opposite to those observed with P2 and P3). This new result is now presented as new Figure 8a. In addition, we show that the separation distance between cells increases significantly in YB-1-rich cells treated with P1, in contrast to P2 and P3 (significant differences) (new Figure 8-Figure supplement 1). A short separation distance between cells may be due to colony formation when cells are plated at low density and allowed to grow for 48 h. Again, this means that Niraparib inhibits cell proliferation more effectively in YB-1-rich cells than the two other PARP inhibitors, Talazoparib and Olaparib. The text on page 17 was rewritten to include and highlight these new results.

      Reviewer #3 (Public Review):

      The authors introduce an integrative platform for identifying small molecule ligands that can disrupt RNA-protein interactions (RPIs) in vitro and in cells. The screening assay is based on prior work establishing the MT bench assay (Boca et al. 2015) for evaluating protein-protein interactions in cells by utilizing microtubules as a platform to recruit and detect PPIs in cells. In the current manuscript, the authors adapted this methodology to evaluate small molecules targeting the interactions of RNA-binding proteins (RBPs) with mRNA in cells. By combining the MT bench assay with computational docking/screening and ligand-binding evaluations by NMR, the authors discover inhibitors of the RBP YB-1, which included FDA-approved PARP-1 inhibitors. The impact of this work could be high given the critical roles of RNA-binding proteins in regulating the function and fate of coding and non-coding RNA. While the presented data are promising, the ability to apply this method beyond YB-1 and to RBPs in general remains to be addressed.

      We agree with the reviewer's comments. In the revised version of the manuscript, we have toned down the statements about the generality of the method. In addition, we elaborate on the potential of our assays and on how to deal with RBPs that often have more than one RNA-binding domain. If several RNA-binding domains participate in the binding of a given RBP to mRNA, the MT bench assay may lose sensitivity. However, one option is to use an isolated RNA-binding domain as bait, since targeting it could be enough to impair or correct the function of the full-length RBP. A statement has been added on page 20 of the revised manuscript to discuss this point.

    1. Author Response

      Reviewer #1 (Public Review):

      GCaMP indicators have become common, almost ubiquitous tools used by many neuroscientists. As calcium buffers, calcium indicators have the potential to perturb calcium dynamics and thereby alter neuronal physiology. With so many labs using GCaMPs across a variety of applications and brain regions, it's remarkable how few have documented GCaMP-related perturbations of physiology, but there are two main contexts in which perturbations have been observed: after prolonged expression of a high GCaMP concentration (common several weeks after infection with a virus using a strong promoter); and when cytoplasmic GCaMP is present during neuronal development. As a result, GCaMP studies are often designed to avoid these two conditions.

      Here, Xiaodong Liu and colleagues ask whether GCaMP-X series indicators are less toxic than GCaMPs. GCaMP-X indicators are modified GCaMPs with an additional N-terminal calmodulin binding domain that reduces interactions of the calmodulin moiety of GCaMP with other cellular proteins. Xiaodong Liu and colleagues document effects of GCaMP expression on neuronal morphology in vitro, calcium oscillations in vitro, and sensory responses in vivo, in each case showing that GCaMP-X indicators are less toxic. Their results are compelling.

      Unfortunately, the paper suffers two main weaknesses. Firstly, the results demonstrate that GCaMP is toxic during development, after prolonged expression via viruses in vivo, and in cell culture where maturation of the culture likely recapitulates key steps in development. GCaMPs are known to be toxic in these circumstances, such toxicity is readily circumvented by driving expression in the adult, and there are countless examples of studies in which adequate GCaMP expression was achieved without toxicity. These new results are of little relevance to the majority of GCaMP experiments. That GCaMP-X indicators are less toxic during development is a new result and may be of interest to those who wish to deploy calcium indicators during development, but this is a relatively small number of neuroscientists.

      We thank the reviewer for providing valuable opinions on these critical matters. Here, we would like to clarify:

      1. In our work, the status of neurites (length, branching, etc.) is indeed one main aspect to monitor, and neuritogenesis during the early stages of development is known to have temporal trajectories with ample dynamic range thus helpful to quantitatively compare GCaMP-X versus GCaMP. However, the key factor is the actual time and level of probe expression in neurons, and the starting timepoint of expression could vary. We have conducted additional experiments using virus-infected neurons (Figure 5—figure supplement 1) and transgenic neurons with inducible expression (Figure 7—figure supplement 3), both starting to express the probes at the mature stage. Thus, GCaMP-X imaging is not necessarily limited to developing neurons. As in the original reports of GCaMP probes with toxicity, virus injection was performed for both immature (2-3 weeks, Tian 2009 PMID: 19898485) and mature mice (~2 months, Chen 2013 PMID: 23868258). According to the protocol (Huber 2012 PMID: 22538608), GCaMP virus injection was done for adult mice (>2 months), which exhibited functional and morphological deficits in nucleus-filled neurons beyond OTW (Figure 2, Figure 5 and Figure 6). Collectively, the central principles of GCaMP-X versus GCaMP are applicable to both immature and mature neurons.

      2. Chronic GCaMP-X imaging has a broad spectrum of potential applications, not limited to neural development (Resendez 2016 PMID: 26914316). As mentioned, GCaMP-X resolves the problem of longitudinal expression thus making chronic imaging more feasible. We agree with the reviewer that a large body of our data in the original version focused on the characteristics of calcium signals during the early stage of neuronal development, which served as an exemplary scenario to compare GCaMP-X with GCaMP. Indeed, the importance of Ca2+ oscillation in neural development is commonly accepted (Kamijo 2018 PMID: 29773754; Gomez 2006 PMID: 16429121). In vivo Ca2+ imaging (Figure 2 and Figure 5) and morphological analyses (e.g., Figure 6) have extended the major conclusions onto mature neurons where dysregulations of Ca2+ oscillations are also tightly coupled with neuronal health or death/damage. Importantly, GCaMP-X paves the way to unexplored directions previously impeded or discouraged due to GCaMP perturbations, e.g., chronic imaging of cultured neurons to concurrently monitor Ca2+ activities and cell morphology as in this study.

      Circumventing GCaMP toxicity after viral infection is not a trivial procedure. The expression levels need to be carefully adjusted experimentally, e.g., by dilution studies (Resendez 2016 PMID: 26914316). A delicate balance of GCaMP expression is critical: low level (or short time) of expression would result in weak signals and poor SNR whereas high level (or long time) of expression would cause nuclear filling and neural toxicity. Even under work-around conditions of time window and dilution dosage, nucleus-filled neurons are not uncommon, as judged by the expression/fluorescence patterns, e.g., in the original reports of GCaMP6 (Supplementary Figure 7, Chen 2013 PMID: 23868258), and GCaMP3 (Supplementary Figure 11, Tian 2009 PMID: 19898485). Under particular conditions (subtypes of neurons, time window of imaging, dosage of virus injection, etc.), many neurons could be found without apparent perturbation/nuclear-filling to proceed with calcium imaging. Using GCaMP-X, dosage is less restricted (10-fold higher concentration for GCaMP-X with improved SNR and overall performance in Figure 2, Figure 5 and Figure 6). Practically, GCaMP-X is a simple solution for the issues related to excessive/prolonged expression. Also, GCaMP-X is expected to help maintain the total number of healthy neurons and thus the general health of the brain. Reportedly, some GCaMP lines of transgenic mice exhibit epileptic activities (Steinmetz 2017 PMID: 28932809); future studies will need to explore whether GCaMP-X could help.

      4. As the reviewer pointed out, the key of GCaMP-X is to resolve the unwanted (apo)GCaMP binding to endogenous proteins in neurons. We agree with the reviewer that according to the empirical observations the following factors appear to increase the severity of GCaMP perturbations: prolonged time, high concentration and nuclear accumulation. GCaMP-X is able to protect GCaMP from unwanted binding and the consequent damage to neurons, validated by various tests thus far (in vitro and in vivo). In this context, the prolonged time would result in higher GCaMP concentration, meanwhile accumulating the effects due to GCaMP interactions; higher GCaMP concentration would interfere with more binding events and targets of endogenous CaM; and enhanced/prolonged expression of GCaMP is directly correlated with nuclear accumulation, a hallmark of neuronal damage.

      Secondly, the authors extend their claims to conclude that GCaMP indicators are toxic under other circumstances, claims supported by neither their results nor the literature. To provide one example, at the end of the introduction is the statement, 'chronic GCaMP-X imaging has been successfully implemented in vitro and in vivo, featured with long-term overexpression (free of CaM-interference), high spatiotemporal contents (multiple weeks and intact neuronal network) and subcellular resolution (cytosolic versus nuclear), all of which are nearly infeasible if using conventional GCaMP.' The statement's inaccurate: there are many chronic imaging studies in vitro and in vivo using GCaMP indicators without nuclear accumulation of GCaMP or perturbed sensory responses. There are more examples throughout the paper where the conclusions overreach the results and are inaccurate. The results are simply insufficient to support many of the strong statements in the paper.

      Overall, the criticisms and suggestions of the reviewer have been well taken and we have revised the text accordingly. Regarding the particular paragraph mentioned by the reviewer, we want to clarify that it summarized the results of the whole manuscript, with each claim referring to the data and analyses shown in the corresponding figures. In detail, these figures were: 'free of CaM-interference' (Figure 1), 'multiple weeks and intact neuronal network' (in vitro: Figure 3 and Figure 4; in vivo: Figure 2, Figure 5 and Figure 6; transgenic neurons: Figure 7) and 'cytosolic versus nuclear' (Figure 1 and the previous Figure 8). The last sentence, 'all of which are nearly infeasible if using conventional GCaMP', was meant to summarize the results comparing GCaMP versus GCaMP-X in our experimental settings of chronic imaging with prolonged/excessive probe expression. Again, we agree that for particular experimental settings and purposes the toxicity of GCaMP can be circumvented empirically. To avoid miscommunication, we have revised this paragraph by moving it to the Discussion (after all the data) and ensured that the statements on GCaMP are backed up with data or literature. Please also see Essential Revisions, Item 3.

      Reviewer #2 (Public Review):

      Geng and colleagues provide further evidence for the lower neuronal toxicity of their improved GECI, GCaMP-X, which allows improved recordings of Ca2+ signals in neurons. As reported previously and studied in more detail here, the improved properties are primarily due to a lower tendency of GCaMP-Xc (reporting cytosolic Ca2+) to enter the nucleus. They present a systematic comparison of their cytosolic or nucleus-targeted GCaMP-Xc (and Xn) with the corresponding "conventional" GCaMPs (jGCaMP7b, GCaMP6m). They again confirm the absence of apoGCaMP-X binding to the CaM-binding domain of Cav1.3 L-type Ca2+ channels, suggesting that this is the main, or one of several, GCaMP interactions leading to altered intracellular signaling affecting neuronal survival, development and architecture. Evidence for more (likely) physiological Ca2+ responses was obtained from a battery of experiments, including in vivo recordings of acute sensory responses after viral expression of GCaMPs, monitoring of long-term calcium oscillations in cultured neurons, correlations of measured Ca2+ oscillations with hallmarks of neuronal development (soma size, neurite outgrowth/arborization), and long-term recordings of spontaneous Ca2+ activity in vivo in the primary somatosensory cortex (S1). The latter experiments also showed that much higher doses of AAV-GCaMP6m-Xc could be administered than of GCaMP6m. They also show unfavorable effects of GCaMPs on neurons of adult GCaMP-expressing transgenic mice, both in slices and in cultured neurons. While most experiments aim at demonstrating the improved performance of GCaMP-X, one finding also provides potential novel insight into the role of neuronal activity patterns during neuronal development in culture. Assuming more undisturbed physiological Ca2+ signaling even over longer time periods, they can follow different Ca2+ activity patterns during neuronal development. Oscillation amplitudes and the level of synchrony correlated with neurite length, and frequency inversely correlated with neurite outgrowth.

      They provide convincing experimental evidence for the improvements claimed for their novel GCaMP-X constructs. Some aspects should be clarified.

      A key finding explaining the construct differences is the nuclear localization. The authors should also provide numbers for the N/C ratio for Ca2+ imaging of sensory-evoked responses in vivo (Fig. 2; pg 6: nuclear accumulation was barely noticeable from GCaMP6m-Xc even beyond OTW). Also, for chronic experiments in brain slices they state for GCaMP6m-Xc in the text that (pg 12) "meanwhile the N/C ratio remained ultra-low", yet Fig. 6 shows an N/C ratio of 0.2. This does not appear to be "ultra low".

      We appreciate the reviewer bringing up the matter of the N/C ratio (indicative of nuclear accumulation). We have appended the values of the N/C ratio for the in vivo experiments (revised Figure 2). Following the previous report, the N/C ratio threshold was set to 0.8 to divide the neurons into two subpopulations. A significant fraction of GCaMP neurons were nucleus-filled (N/C ratio > 0.8); meanwhile, nearly no neuron expressing GCaMP-XC was found with an N/C ratio greater than 0.8 when examined 8-13 weeks post injection. Generally, owing to its imaging resolution, confocal microscopy provided a more precise evaluation of the N/C ratio than two-photon in vivo images. In Figure 6, an even clearer difference in nuclear distribution was observed between GCaMP and GCaMP-X, which was described as “ultralow” (GCaMP-X). Of note, the N/C ratio of YFP itself was ~1.3. The N/C ratio for GCaMP-XC was not close to zero, consistent with measurements from other NES-tagged peptides (Yang 2022 PMID: 35589958); GCaMP-XC was not completely excluded from cell nuclei and thus produced some fluorescence there. In light of this comment, we have revised the relevant text, including the phrase “ultralow” (Page 14, Line 393). Figure 5 was also revised accordingly.
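
      The nucleus-filled classification described above can be sketched as follows. This is a minimal illustration, assuming that the N/C ratio is the mean background-subtracted fluorescence in a nuclear ROI divided by that in a cytosolic ROI; the function names and example intensities are hypothetical and are not taken from the manuscript, and only the 0.8 cutoff comes from the response.

```python
from statistics import mean

# Cutoff stated in the response: N/C ratio > 0.8 is scored as nucleus-filled.
NUCLEUS_FILLED_CUTOFF = 0.8


def nc_ratio(nuclear_pixels, cytosolic_pixels, background=0.0):
    """Return mean nuclear / mean cytosolic intensity after background subtraction.

    Hypothetical helper: assumes ROIs have already been drawn and their pixel
    intensities extracted as sequences of numbers.
    """
    nuc = mean(nuclear_pixels) - background
    cyto = mean(cytosolic_pixels) - background
    if cyto <= 0:
        raise ValueError("cytosolic signal must exceed background")
    return nuc / cyto


def is_nucleus_filled(ratio, cutoff=NUCLEUS_FILLED_CUTOFF):
    """Score a neuron as nucleus-filled when its N/C ratio exceeds the cutoff."""
    return ratio > cutoff


def fraction_nucleus_filled(ratios, cutoff=NUCLEUS_FILLED_CUTOFF):
    """Fraction of neurons in a population scored as nucleus-filled."""
    if not ratios:
        raise ValueError("need at least one neuron")
    return sum(is_nucleus_filled(r, cutoff) for r in ratios) / len(ratios)


# Illustrative (made-up) intensities for one neuron with nuclear exclusion.
example = nc_ratio([30, 30, 30], [110, 110, 110], background=10)
print(round(example, 2))  # 0.2
print(is_nucleus_filled(example))  # False
```

      Under this cutoff, a neuron with an N/C ratio of about 1.3, the value reported for YFP, would be scored as nucleus-filled.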

      Along these lines, since nucleus-filled neurons were observed in their experiments with GCaMP-Xc, the authors should comment on whether altered Ca2+ signals were also seen for the few neurons expressing GCaMP-Xc in the nucleus.

      During 2-photon imaging experiments in vivo, GCaMP-XC neurons occasionally appeared to have some level of nuclear expression, especially in blurred, low-quality images. Judged by the N/C ratio criterion (0.8), these neurons rarely fell into the nucleus-filled group (Figure 2B and Figure 5C, also see confocal imaging Figure 1B). On the other hand, a small fraction of GCaMP-XC could be “leaked” into the nucleus. GCaMP-XN also eliminated toxic (apo)GCaMP interactions in neurons, sharing the same design principle with GCaMP-XC (Figure 1). Therefore, nuclear GCaMP-XC is expected to resemble GCaMP-XN. Experimentally, with GCaMP-XC or GCaMP-XN present in the nucleus, no significant change in neuronal Ca2+ or neurite morphology has been observed. Meanwhile, this comment points to one important direction of future research, i.e., to more precisely confine GCaMP-X within the targeted organelles, e.g., by improving or replacing localization tags.

      Since they performed a systematic comparison of two constructs to demonstrate an (expected) superiority of one of them, the experiments, or at least the analysis, should ideally be performed in a blinded way. The authors should clarify how they avoided experimental bias.

      For in vitro experiments, multiple independent trials of experiments with analyses were performed by two (or more) researchers to ensure reproducibility and to minimize any bias. The results and conclusions were highly consistent among different trials and researchers. Following the suggestion, we have ensured that in vivo experiments and data analyses were conducted separately by researchers from two different labs. For long-term expression/imaging, the differences between GCaMP-X and GCaMP were often discernible directly in the images even without further calculations or statistics (e.g., Figure 3B). Related information can be found in the Methods (Page 32, Line 799).

      In their chronic Ca2+ fluorescence imaging for autonomous Ca2+ oscillations in cultured cortical neurons ultralong lasting signals (Fig. 3B, DIV 17, GCaMP6m) could be observed. It would be helpful to further describe the nature of these transients, ideally by adding it to their video collection.

      As suggested by the reviewer, the video for Figure 3B (DIV 17, GCaMP6m) has been included in this revision (Figure 3—video supplement 2). In contrast to the oscillatory signals normally observed from healthy neurons, the pronounced and sustained Ca2+ signals are associated with apoptosis and other pathological conditions in neurons (Khan 2020 PMID: 32989314; Nicotera 1998 PMID: 9601613; Harr 2010 PMID: 20826549). The Ca2+ wave with broadened width (FWHM) was indicative of neurons damaged by GCaMP (Figure 3F), rather than of (altered) sensing characteristics of GCaMP. We agree that this observation is a notable and interesting phenomenon, worth following up in future studies.
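
      The broadened transient width (FWHM) mentioned above can be illustrated with a short calculation. As a hedged sketch only, the function below computes the full width at half maximum of a single baseline-subtracted Ca2+ transient sampled at a fixed interval, using linear interpolation at the half-maximum crossings; the function name, sampling assumptions and example traces are hypothetical and do not reproduce the authors' analysis pipeline.

```python
def fwhm(trace, dt):
    """Full width at half maximum of the largest transient in a trace.

    Assumes `trace` is baseline-subtracted (e.g. dF/F0), uniformly sampled
    every `dt` seconds, and falls below half maximum on both sides of the peak.
    """
    if not trace:
        raise ValueError("empty trace")
    peak = max(range(len(trace)), key=lambda k: trace[k])
    half = trace[peak] / 2.0
    if half <= 0:
        raise ValueError("trace needs a positive peak above baseline")

    # Rising edge: walk left until the previous sample falls below half max.
    i = peak
    while i > 0 and trace[i - 1] >= half:
        i -= 1
    if i == 0:
        raise ValueError("transient starts above half maximum")
    lo, hi = trace[i - 1], trace[i]
    left = (i - 1) + (half - lo) / (hi - lo)  # interpolated crossing index

    # Falling edge: walk right until the next sample falls below half max.
    j = peak
    while j < len(trace) - 1 and trace[j + 1] >= half:
        j += 1
    if j == len(trace) - 1:
        raise ValueError("transient does not decay below half maximum")
    hi, lo = trace[j], trace[j + 1]
    right = j + (hi - half) / (hi - lo)  # interpolated crossing index

    return (right - left) * dt


# Illustrative traces (made-up values, 1 s sampling): brief vs sustained transient.
brief = [0, 1, 2, 3, 2, 1, 0]
sustained = [0, 1, 3, 3, 3, 3, 3, 1, 0]
print(fwhm(brief, 1.0))  # 3.0
print(fwhm(sustained, 1.0))  # 5.5
```

      A sustained signal of the kind seen in the DIV 17 GCaMP6m recording would show up in this measure as a larger FWHM than a brief oscillatory transient of the same peak amplitude.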

      The discussion is very long. In my opinion it would benefit from shortening, avoiding redundancies and focusing only on the key findings of this paper. This includes the chapter on design and application guidelines for CaM-based GECIs. The main message about the advantage of their GCaMP-X modifications has already been made earlier in the discussion. A more detailed discussion of this appears more suitable for a review article.

      In response to this suggestion, we have made the discussion as concise as possible, by simplifying or removing several topics, including the design and application guidelines for CaM-based GECIs.

      It may be worthwhile to include another aspect in the discussion: does the improved GCaMP-Xc cause no change in neuronal function or morphology, or is it just less damaging than other GCaMPs? How can this issue be addressed experimentally?

      We have revised the discussion accordingly (Page 21, Line 588). We agree that additional experiments would help evaluate how close GCaMP-X data are to reality, considering the Ca2+-buffering effect intrinsic to Ca2+ probes as well as other factors. In light of this suggestion and those from Reviewer #1, we have incorporated more experimental controls, including Ai140 mice (GFP, Figure 7—figure supplement 2) and Fluo-4 AM (Ca2+ dye, Figure 3—figure supplement 4). The results have been encouraging in that GCaMP-X neurons were nearly indistinguishable in morphological and functional aspects from GFP or Fluo-4 AM controls. Feedback from GCaMP-X users should continue to help clarify this matter, which we would like to follow up on.

    1. Author Response

      Reviewer #1 (Public Review):

      This study uses the mouse calyx of Held synapse as a model to explore the presynaptic role of rac1, a regulator of actin signaling in the brain. Many of the now-classical methods and theory pioneered by Neher and colleagues are brought to bear on this problem. Additionally, the authors were able to make a cell-specific knockout of rac1 by developing a novel viral construct to express cre in the globular bushy cells of the cochlear nucleus; by doing this in a rac1 floxed mouse, they were able to KO rac1 in these neurons starting at around P14. The authors found that KO of rac1 enhanced EPSC amplitude, vesicle release probability, quantal release rates, EPSC onset time and jitter during high-frequency activity, and fast recovery rates from depression. Because the calyx synapses are the largest and most reliable of central nerve terminals, all these various effects had no effect on suprathreshold transmission during 'in vivo-like' stimulus protocols. Moreover, there was no effect morphologically on the synapse. Through some unavoidably serpentine reasoning, the authors suggest that loss of rac1 affects the so-called molecular priming of vesicles, possibly due to a restructuring of actin barriers at the active zone. The experimental analysis is at a very high level, and the work is definitely an important contribution to the field of presynaptic physiology and biophysics. It will be important to test the effects of the KO on other synapses that are not such high-performers as the calyx, and this direction might reveal significant effects on information processing by altered rac1 expression.

      We thank the reviewer for their comments and view that our work is an important contribution to the field of presynaptic physiology and biophysics.

      Major points:

      1) The measurement of onset delay was used to test whether rac1-/- affects positional priming. While there is a clear effect of the KO on the latency to EPSC onset, there is no singular interpretation one can take, due to the ambiguity of the 'onset delay'. Note that in the Results the authors state (lines 201-203): "The time between presynaptic AP and EPSC onset (EPSC onset delay) is determined by the distance between SVs and VGCC which defines the time it takes for Ca2+ to bind to the Ca2+ sensor and trigger SV release (Fedchyshyn and Wang, 2007)." However, in Methods: "The duration between stimulus and EPSC onset was defined as EPSC onset delay." Thus the 'onset' measured is not from presynaptic spike to EPSC but from axonal stimulus to EPSC. KO of rac might also affect spike generation, spike conduction, calcium channel function, etc. Indeed some additional options are offered in the Discussion. Since the change in onset is ~100 µs at most, a number of small factors could all contribute here. Moreover, the authors conclude that the KO does NOT affect positional priming since they would have expected the onset to shorten, given the other enhancements observed in earlier sections.

      It seems to me that all the authors can really conclude is that the onset shifted and they do not know why. If onset is driven by multiple factors, and differentially affected in the KO, then all bets are off. Thus, data in this section might be removed, or at least the authors could further qualify their interpretations given this ambiguity.

      We have further qualified and clarified our interpretation of the EPSC onset measurements by adding text to the Discussion (see lines 475-491). We would like to emphasize that we do not see a statistically significant difference between Rac1+/+ and Rac1−/− synapses in the onset delay of EPSC1 or in EPSC onset delays during 50 Hz train stimulation, but rather an activity-dependent increase in EPSC onset delays in Rac1−/− synapses during 500 Hz stimulation. Based on these data, it is less likely that changes in spike generation, spike conduction or calcium channel function are responsible for the change in EPSC onset delay. If SVs were closer to CaV2.1 channels, we would expect a shorter initial EPSC onset delay or shorter EPSC onset delays during 50 Hz stimulation. However, changes in spike generation, spike conduction or calcium channel function could contribute to the increase in EPSC onset delay at 500 Hz. Finally, EPSC onset delays also increased during 50 Hz and 500 Hz stimulation in Rac1+/+ synapses, indicating activity-dependent regulation; however, this activity-dependent increase was more pronounced in Rac1−/− synapses during both 50 Hz and 500 Hz stimulation (Fig 4B1-B3).

      2) If the idea is that the loss of Rac1 leads to a reduced actin barrier at the active zone, is there an ultrastructural way to visualize this, labeling for actin for example? Authors conclude that new techniques are needed, but perhaps this is 'just' an EM question.

      We are not aware of a method for ultrastructural visualization of actin and SV distributions relative to the plasma membrane. This would require specific labeling and detection of actin filaments while visualizing SVs using EM. While EM on samples prepared by high-pressure freezing with freeze substitution allows detection of filamentous structures near the AZ, the molecular identity of these filamentous structures would remain uncertain. Super-resolution microscopy is amenable to immunohistochemical techniques to label actin, but visualizing SVs in 3D using super-resolution is a major technical challenge. Furthermore, changes in SV docking on the scale of 1-2 nanometers are correlated with severe changes in SV release; therefore, we would need to be able to quantify structural changes at this level of resolution. Currently, we are not aware of any study that has analyzed SV docking or reported changes on the scale of 1-2 nm using super-resolution light microscopy. It might be possible to use expansion microscopy to achieve such resolution, but the respective protocols would need to be established for the calyx synapse. In addition, the regulation of actin filaments has been proposed to be transient and to happen on very fast time scales, which complicates its investigation by conventional methods (O'Neil et al., 2021). Thus, even if we were able to solve all these technical hurdles and label actin, we could well miss potential differences. Therefore, while we agree that this type of ultrastructural data would considerably strengthen our hypothesis, developing the techniques and protocols needed to perform these experiments would likely require many months if not years.

      3) Authors use 1 mM kynurenic acid in the bath to avoid postsynaptic receptor saturation. But since this is a competitive antagonist and since the KO shows a large increase in release, could saturation or desensitization have been enhanced in the KO? This would affect the interpretation of recovery rates in the KO, which are quite fast.

      We agree with the reviewer that differences in saturation or desensitization could potentially affect the measured recovery time course in Rac1−/− synapses. However, we think this is unlikely for the following reasons. Desensitization and saturation of synaptic AMPARs are strongly reduced during calyx synapse maturation (Taschenberger et al., 2002; Taschenberger et al., 2005). We recorded from >P28 calyx synapses, which exhibit a claw-like, fenestrated terminal morphology offering many diffusional exits for released glutamate; this is expected to speed up transmitter clearance and therefore reduce postsynaptic effects (Taschenberger et al., 2005; Yang et al., 2021). We used 1 mM kynurenic acid in the external bath solution, which resulted in a ~90% reduction in EPSC amplitude in both Rac1+/+ and Rac1−/− synapses, comparable to previous reports (e.g. Lipstein et al., 2021). In our study, we performed all experiments in 1.2 mM Ca2+ and at body temperature, which further reduces EPSC amplitudes and minimizes potential receptor saturation and desensitization compared with 2 mM Ca2+ at room temperature. Time constants of recovery from desensitization at the calyx range from 30 ms at P14-P16 (Joshi et al., 2004) to 16 ms at P21 (Koike-Tani et al., 2008), both measured at room temperature. It is conceivable that recovery from desensitization at P30 and at physiological temperature is considerably faster. Since we observed the largest effect on recovery between 1 and 4 seconds, this time scale is likely at least two orders of magnitude slower than recovery from desensitization. Finally, our numerical simulations are consistent with the faster recovery rates observed in Rac1−/− synapses being a direct consequence of changes in SV priming. This faster pool replenishment likely also enabled the increased steady-state EPSC amplitudes at 50 Hz in Rac1−/− synapses. The fact that we were able to measure enhanced steady-state release in Rac1−/− synapses argues against steady-state EPSC amplitudes being limited by AMPAR desensitization.
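
      The time-scale argument above can be made concrete with a back-of-the-envelope calculation. This sketch assumes, for illustration only, that recovery from AMPAR desensitization follows a single exponential with the room-temperature time constants quoted above (16 and 30 ms), and estimates the fraction of receptors still desensitized at the 1 to 4 s intervals where the largest effect on recovery was observed. The single-exponential model and the function name are assumptions, not part of the authors' analysis.

```python
import math


def still_desensitized(t_ms, tau_ms):
    """Fraction of receptors not yet recovered after t_ms.

    Assumes single-exponential recovery from desensitization with time
    constant tau_ms (an illustrative simplification).
    """
    if tau_ms <= 0:
        raise ValueError("tau must be positive")
    return math.exp(-t_ms / tau_ms)


# Time constants quoted in the response (room temperature) and the
# recovery intervals at which the Rac1 effect was largest.
for tau in (16.0, 30.0):
    for t in (1000.0, 4000.0):
        print(f"tau={tau:.0f} ms, t={t:.0f} ms: t/tau={t / tau:.0f}, "
              f"residual={still_desensitized(t, tau):.1e}")
```

      Even with the slower 30 ms time constant, less than 1e-14 of the desensitization would remain after 1 s under this model, which is consistent with the argument that desensitization is unlikely to shape recovery on the seconds time scale.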

      Reviewer #2 (Public Review):

      The aim of the study is an improved understanding of the role of the RhoGTPase Rac1 in neurotransmitter release beyond the known roles in synaptogenesis and postsynaptic function. To this end, Rac1 is ablated at P12 (when synapse development has largely progressed to maturation) and transmission is studied at the adult stage (P28 onwards). The study reports a number of interesting findings, in particular, a large increase in synaptic strength, which is interpreted as an '... increased release probability, which results in faster SV replenishment'. It is not clear whether this statement is supposed to suggest a causal relationship or just a correlation between the two parameters. By and large, the discussion of results is somewhat fuzzy with respect to the distinction between release itself (as characterized by release probability) and priming steps, which precede release.

      In addition, the authors present valuable data on Rac1-dependent timing and synchronicity of neurotransmitter release, which point towards a role of Rac1 in 'positional priming', i.e., the proper localization of synaptic vesicles relative to Ca2+ channels.

      We thank the reviewer for pointing out that our study presents valuable data on Rac1-dependent timing and synchronicity of neurotransmitter release.

    1. Author Response

      Reviewer #1 (Public Review):

      Redox signaling is a dynamic and concerted orchestra of inter-connected cellular pathways. There is an ongoing debate as to whether ROS (reactive oxygen species) are friend or foe. Continued research is needed to dissect how ROS generation and progression could diverge in physiological versus pathophysiological states. Similarly, there are several paradoxical studies (both animal and human) wherein the health benefits of exercise were reported to be accompanied by increases in ROS generation. It is in this context that the present manuscript deserves attention.

      Using in vitro studies as well as animal model work, this manuscript illustrates the different regulatory mechanisms of exercise and antioxidant intervention on redox balance and blood glucose levels in diabetes. The manuscript does have some limitations and might need additional experiments and explanation.

      The authors should consider addressing the following comments with additional experiments.

      1) Although hepatic AMPK activation appears to be a central signaling element for the benefits of moderate exercise and glucose control, additional hepatic signals related to gluconeogenesis, such as Forkhead box O1 (FoxO1), phosphoenolpyruvate carboxykinase (PEPCK), and GLUT2, need to be profiled to present a holistic picture. The authors should consider this and revise the manuscript.

      We appreciate the constructive suggestion. Besides glycolysis, gluconeogenesis and glucose uptake are critical in maintaining liver and blood glucose homeostasis.

      FoxO1 has been tightly linked with hepatic gluconeogenesis through promoting the transcription of the gluconeogenic genes PEPCK and G6Pase (1, 2). Herein, we found that FoxO1 expression increased in the diabetic group but was reduced in the CE, IE and EE groups (Fig. X1A; Fig. 5E-F in the manuscript). Meanwhile, the mRNA levels of Pepck and G6PC (one of the three G6Pase catalytic-subunit-encoding genes) also decreased in the CE, IE and EE groups (Fig. X1B-1C; Fig. 5H-I in the manuscript). These results indicate that all three modes of exercise inhibited gluconeogenesis through down-regulating FoxO1.

      For glucose uptake, we measured GLUT2 protein expression in liver tissue. GLUT2 mediates glucose uptake by hepatocytes for glycolysis and glycogenesis. We found that GLUT2, a glucose sensor in the liver, was up-regulated in diabetic rats but down-regulated by the CE and IE interventions. However, GLUT2 did not decrease in the EE group, which is consistent with the lack of improvement in blood glucose after EE intervention (Fig. X1A; Fig. 5E and 5G in the manuscript).

      Taken together, moderate exercise could benefit glucose control by increasing glycolysis and decreasing gluconeogenesis. We added this part on page 9, lines 251-263, and in Figure 5E-5I of this version.

      Figure X1. A. Representative protein levels and quantitative analysis of FOXO1 (82 kDa), GLUT2 (60-70 kDa) and Actin (45 kDa) in rats of the Ctl, T2D, T2D + CE, T2D + IE and T2D + EE groups. B-C. Expression of hepatic Pepck and G6PC mRNA in the Ctl, T2D, T2D + CE, T2D + IE and T2D + EE groups, evaluated by real-time PCR. Values represent mean ratios of Pepck and G6PC transcripts normalized to GAPDH transcript levels.
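      The caption does not specify how the GAPDH-normalized ratios were computed. One widely used option, the Livak 2^-ΔΔCt method, is sketched below purely as an assumption (it presumes roughly equal, near-100% amplification efficiency for target and reference primers); the function name and the Ct values in the example are hypothetical, not data from the study.

```python
def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_ctl: float, ct_ref_ctl: float) -> float:
    """Fold change of a target transcript (e.g. Pepck or G6PC) versus the
    control group, normalized to a reference transcript (e.g. GAPDH),
    using 2^-ddCt. Assumes ~100% amplification efficiency for both genes."""
    d_ct_sample = ct_target - ct_ref           # normalize sample to reference gene
    d_ct_control = ct_target_ctl - ct_ref_ctl  # same normalization for the control group
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target amplifies one cycle later (relative to GAPDH)
# than in control, i.e. half as much starting transcript.
print(relative_expression(25.0, 18.0, 24.0, 18.0))  # 0.5
```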

      2) Sestrin2 signaling has very recently attracted significant attention in relation to exercise and antioxidant responses. Therefore, the authors should profile sestrin2 levels, as sestrin2 is linked to several targets such as mTOR, AMPK and Sirt1. Additionally, Nrf2 levels should be reported, as Nrf2 is the central regulator of the threshold mechanisms of oxidative stress and ROS generation.

      We appreciate the reviewer’s expert comments. Nrf2 is an important mediator of antioxidant signaling, playing a fundamental role in maintaining cellular redox homeostasis. Under unstressed conditions, Nrf2 activity is suppressed by its endogenous repressor Kelch-like ECH-associated protein 1 (Keap1) (3). As ROS levels increase during the development of diabetes, Nrf2 is activated to induce the transcription of several antioxidant enzymes (4, 5).

      Nrf2 expression has been reported to increase in HFD mice or diabetic patients (6, 7). In vitro studies have found that Nrf2 activation is achieved with acute exposure to high glucose, whereas longer incubation times or oscillating glucose concentrations fail to activate Nrf2 (8, 9). These findings suggest that the increase of ROS in diabetes can cause compensatory upregulation of Nrf2. In our study, Nrf2 increased in diabetic rats, which can further initiate the expression of antioxidant enzymes. As shown in Fig. X2A (Fig. 2H-2K in the manuscript), Grx and Trx, which are involved in glutaredoxin and thioredoxin metabolism, respectively, were up-regulated along with Nrf2. After CE intervention, the level of Nrf2 increased further (Fig. 2E-2F), suggesting that CE intervention could activate the antioxidant system to achieve a high-level redox balance. We have added these new results to Figure 2.

      On the other hand, Sestrin2 and Nrf2 expression decreased after antioxidant supplementation. Our results suggest that antioxidant treatment improved diabetes by lowering ROS levels to achieve a low-level redox balance, whereas moderate exercise enhanced ROS tolerance to achieve a high-level balance (Fig. X2D-F; Fig. 3E-3G in the manuscript).

      We added the new data on page 5, lines 147-153, and page 7, lines 183-186, and in Figures 2-3 of the current version.

      Figure X2. A-C. Representative protein level and quantitative analysis of Nrf2 (97 kDa), Sestrin2 (57 kDa) and Actin (45 kDa) in the rats in the Ctl, T2D and T2D + CE groups. D-F. Representative protein level and quantitative analysis of Nrf2 (97 kDa), Sestrin2 (57 kDa) and HSP90 (90 kDa) in the rats in the Ctl, T2D and T2D + APO groups.

      3) Authors should discuss the exercise-associated hormesis curve. They should discuss whether moderate exercise could decrease the sensitivity to oxidative stress by altering the bell-shaped dose-response curve.

      We thank the reviewer for these valuable comments. In the literature, Radak et al. proposed a bell-shaped dose-response curve relating normal physiological function to the level of ROS in healthy individuals, and suggested that moderate exercise can extend or stretch the range of ROS levels while increasing physiological function (10). Our results support this hypothesis and further suggest that, according to the bell-shaped curve, moderate exercise produces ROS while increasing antioxidant enzyme activity to maintain a high-level redox balance, whereas excessive exercise generates higher levels of ROS, leading to reduced physiological function. In this study, we found that the state of diabetic individuals is better described by an S-shaped curve, owing to the high level of oxidative stress and decreased reduction level in diabetic individuals (Fig. 8B). As ROS increase, the physiological function of diabetic individuals gradually decreases and enters a state of redox imbalance. Moderate exercise shifts the S-shaped curve into a bell-shaped dose-response curve, thus reducing the sensitivity of diabetic individuals to oxidative stress and restoring redox homeostasis. However, with excessive exercise, ROS production increases beyond the threshold range of redox balance, resulting in decreased physiological function (Fig. 8B; see the descending portion of the bell curve to the right of the apex).

      In contrast, antioxidant intervention increased physiological activity by reducing ROS levels in diabetic individuals, restoring a bell-shaped dose-response curve at low ROS levels (Fig. 8B). Redox balance could therefore be achieved either at low ROS levels, mediated by antioxidant intervention, or at high ROS levels, mediated by moderate exercise, both of which were regulated by AMPK activation. Both high and low levels of redox balance can thus lead to high physiological function, as long as they lie within the threshold range of redox balance. AMPK activation is therefore an important sign that exercise or antioxidant intervention has achieved the dynamic redox balance that helps restore physiological function. Accordingly, we speculate that antioxidant intervention combined with moderate exercise might offset the effect of exercise, whereas antioxidants could be beneficial during excessive exercise. A human study also supports the view that antioxidant supplementation may preclude the health-promoting effects of exercise (11). Personalized intervention with respect to redox balance will therefore be crucial for the effective treatment of patients with diabetes.

      We have added this part to the Discussion in this version (pages 13-14, lines 389-418).

      4) It would not be ideal to single out AMPK as the sole biomarker in this manuscript. Instead, the authors should consider AMPK activation and associated signaling in relation to redox balance. This should also be presented in Fig. 7.

      We thank the reviewer for these critical comments. Accordingly, we now discuss AMPK signaling in the Discussion (page 13, lines 373-384) and have added the AMPK signaling pathway to Fig. 8A.

      Reference:

      1. R. A. Haeusler, K. H. Kaestner, D. Accili, FoxOs function synergistically to promote glucose production. J Biol Chem 285, 35245-35248 (2010).
      2. J. Nakae, T. Kitamura, D. L. Silver, D. Accili, The forkhead transcription factor Foxo1 (Fkhr) confers insulin sensitivity onto glucose-6-phosphatase expression. J Clin Invest 108, 1359-1367 (2001).
      3. M. McMahon, K. Itoh, M. Yamamoto, J. D. Hayes, Keap1-dependent proteasomal degradation of transcription factor Nrf2 contributes to the negative regulation of antioxidant response element-driven gene expression. J Biol Chem 278, 21592-21600 (2003).
      4. R. S. Arnold et al., Hydrogen peroxide mediates the cell growth and transformation caused by the mitogenic oxidase Nox1. Proc Natl Acad Sci U S A 98, 5550-5555 (2001).
      5. J. M. Lee, M. J. Calkins, K. Chan, Y. W. Kan, J. A. Johnson, Identification of the NF-E2-related factor-2-dependent genes conferring protection against oxidative stress in primary cortical astrocytes using oligonucleotide microarray analysis. J Biol Chem 278, 12029-12038 (2003).
      6. T. Jiang et al., The protective role of Nrf2 in streptozotocin-induced diabetic nephropathy. Diabetes 59, 850-860 (2010).
      7. X. H. Wang et al., High Fat Diet-Induced Hepatic 18-Carbon Fatty Acids Accumulation Up-Regulates CYP2A5/CYP2A6 via NF-E2-Related Factor 2. Front Pharmacol 8, 233 (2017).
      8. T. S. Liu et al., Oscillating high glucose enhances oxidative stress and apoptosis in human coronary artery endothelial cells. J Endocrinol Invest 37, 645-651 (2014).
      9. Z. Ungvari et al., Adaptive induction of NF-E2-related factor-2-driven antioxidant genes in endothelial cells in response to hyperglycemia. Am J Physiol Heart Circ Physiol 300, H1133-1140 (2011).
      10. Z. Radak et al., Exercise, oxidants, and antioxidants change the shape of the bell-shaped hormesis curve. Redox Biol 12, 285-290 (2017).
      11. M. Ristow et al., Antioxidants prevent health-promoting effects of physical exercise in humans. Proc Natl Acad Sci U S A 106, 8665-8670 (2009).
    1. Author Response

      Reviewer #2 (Public Review):

      Klein et al. have developed a high-throughput tracker to evaluate operant conditioning in Drosophila larvae. Employing this device, they train larvae to prefer bending towards one specific side (left or right), by using as unconditioned stimulus (US) the optogenetic activation of dopaminergic and serotoninergic neurons, demonstrating that larvae are able to perform this behaviour. Furthermore, they show that serotoninergic neurons alone are sufficient to mediate the reward signal, and that specifically serotoninergic neurons in the VNC are required for this behaviour. However, they do not show whether serotoninergic VNC neurons are sufficient. The results are interesting and novel. Operant conditioning had previously been shown in adult Drosophila. Furthermore, the existence of VNC circuits sufficient for operant conditioning had been shown for other species, as the authors point out in the discussion. Nonetheless, the genetic dissection identifying serotonin-expressing neurons as mediators of operant conditioning in the Drosophila larva, and the identification of serotoninergic VNC cells as necessary, are new. Furthermore, given the experimental advantages of the Drosophila larva, including genetic accessibility and a full connectome, the findings open the door to future research into the circuit mechanisms of operant conditioning. I have some comments that I think would be important to address.

      The high-throughput tracker is impressive. However, there is insufficient documentation to ensure that an expert would be able to easily reproduce it. All of the hardware assembly files, the list of materials, the electronic circuit maps and all of the required software need to be appropriately documented and uploaded to a public repository. This is a basic requirement when publishing new hardware/software, particularly in an open journal such as eLife.

      We have now included all the documentation and CAD files for the high-throughput tracker. The software is publicly available in the following Github repository (https://github.com/ZlaticLab/multi-larva-tracker-scripts-public). The CAD files are available in the Supplementary materials of the paper.

      • The differences observed in the results of operant conditioning are very subtle (see, for example, Figure 3c), which means that it is extremely important that the statistical analyses are done correctly. The sample number (n) for these experiments is very high (n>100) and, as far as I understood, is not equivalent to the number of animals, because the same animal can contribute n>1, e.g. n = 2 or n = 3 if it collides once or twice, as each time it collides the larva is given a new identity. This means that the data points collected are not independent, and I think in that case a Wilcoxon rank-sum test is not the appropriate test to use. I recommend that the authors and eLife editors consult an expert in this type of statistics. Alternatively, the authors could, for each experiment, take into account only the data from larvae that did not collide, and for those that collided only the data before the collision. This can be calculated easily, as they just need to exclude from their analysis in each experiment all larval IDs larger than the initial number of larvae identified by the software.

      We apologise for not making it sufficiently clear that we only took into account (for each time bin) larvae that did not collide. In the Materials and methods, we describe how objects retained for analysis had to satisfy several criteria. The first criterion is that the object needed to be detected in every frame of the given 60 s bin. In this way, the object identity is stable throughout the bin, reflecting that the object did not collide with another object. In other words, within a single time bin, the same animal contributes only once. Text has been added to the Materials and methods to clarify that this first criterion selects for larvae that did not collide.
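      The first retention criterion amounts to a set intersection over the frames of a bin. The sketch below is hypothetical and is not the tracker's actual code: it assumes detections are available as a mapping from frame index to the set of object IDs present in that frame, and the function name is illustrative.

```python
from typing import Dict, List, Set


def retained_objects(detections: Dict[int, Set[int]], bin_start: int,
                     frames_per_bin: int) -> List[int]:
    """Return IDs of objects detected in every frame of one time bin.

    detections maps frame index -> set of object IDs present in that frame.
    An ID missing from any frame of the bin (e.g. because a collision made the
    tracker lose it or assign a new identity) is excluded, so each retained ID
    corresponds to one uninterrupted animal within the bin."""
    if frames_per_bin <= 0:
        return []
    present = [detections.get(f, set())
               for f in range(bin_start, bin_start + frames_per_bin)]
    return sorted(set.intersection(*present))


# Toy example: object 3 collides after frame 0 and reappears as object 4.
frames = {0: {1, 2, 3}, 1: {1, 2}, 2: {1, 2, 4}}
print(retained_objects(frames, bin_start=0, frames_per_bin=3))  # [1, 2]
```

For a 60 s bin, frames_per_bin would be 60 times the acquisition frame rate, which is not stated here and would have to be taken from the Materials and methods.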

      The reviewer points out that the Wilcoxon rank-sum test is not the appropriate nonparametric test for dependent samples. We agree. Accordingly, the test we used for within-bin comparisons was the Wilcoxon signed-rank test, which is also nonparametric but is designed for dependent samples. We therefore believe there is no need to reconsider the statistical tests used.
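      The difference between the two tests can be illustrated by computing the signed-rank sums directly: the signed-rank test ranks within-pair differences, whereas the rank-sum test would pool both samples as if they were independent. This is a minimal, hypothetical sketch that returns only the rank sums W+ and W- (no p-value, and zero differences are simply dropped); for an actual analysis a library routine such as scipy.stats.wilcoxon would be the usual choice. What each pair represents (e.g. one larva's measure under two conditions within a bin) is an assumption here.

```python
from typing import List, Tuple


def signed_rank_sums(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Wilcoxon signed-rank sums (W+, W-) for paired samples x[i], y[i].

    Zero differences are dropped; tied absolute differences share the
    average of the ranks they occupy."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(x, y) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over the run of tied absolute differences starting at i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


print(signed_rank_sums([0, 0, 0, 0], [1, -1, 2, 0]))  # (4.5, 1.5)
```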

      • The finding that serotoninergic neurons in the VNC, which in the line they used amount to only two neurons per VNC hemisegment, are required for operant conditioning is very interesting. It would be great if they could also test whether they are sufficient. It seems that they would just need to make two split-Gal4 lines, one for tsh and one for tph, so the experiment does not seem too difficult and would significantly add to their findings.

      Generating new intersections is beyond the scope of this already large study, which has been significantly impacted by the pandemic. We have therefore added the following text explaining that we have identified candidate serotonergic neurons that are required for operant learning, and that identifying specific single neuron types that may be sufficient would be an exciting avenue for future follow-up work.

      In the Results section entitled “Serotonergic VNC neurons may play a role in operant conditioning of bend direction”, we have added:

      “The Tph-Gal4 expression pattern contains two neurons per VNC hemisegment (with the exception of a single neuron in each A8 abdominal hemisegment; Huser et al., 2012). Future experiments exclusively targeting a single serotonergic neuron per VNC hemisegment could be valuable in determining whether these neurons are sufficient for operant learning.”

      In the Discussion section entitled “Automated operant conditioning of Drosophila larvae”, we have added:

      “Furthermore, developing sparser lines that target single serotonergic and dopaminergic neuron types will enable the identification of the smallest subsets of neurons that are sufficient for providing the operant learning signal. Behavioural experiments with these genetic lines may have the added benefit of mitigating conflicting or non-specific reinforcement signalling.”

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript is clear and well-written and provides a novel and interesting explanation of different illusions in visual numerosity perception. However, the model used in the manuscript is very similar to Dehaene and Changeux (1993) and the manuscript does not clearly identify novel computational principles underlying the number sense, as the title would suggest. Thus, while we were all enthusiastic about the topic and the overall findings, the paper currently reads as a bit of a replication of the influential Dehaene & Changeux (1993)-model, and the authors need to do more to compare/contrast to bring out the main results that they think are novel.

      Major concerns:

      1) The model presented in the current manuscript is very similar to the Dehaene and Changeux 1993 model. The main difference is in the implementation of lateral inhibition in the DoG layer where the 1993 model used a recurrent implementation, and the current model uses divisive normalization (see minor concern #1). The lateral inhibition was also identified as a critical component of numerosity estimation in the 1993 model, so the novelty in elucidating the computational principles underlying the number sense in the current manuscript is not evident.

      If the authors hypothesize that the particular implementation of lateral inhibition used here is more relevant and critical for the number sense than the forms used in previous work (e.g., the recurrent implementation of the 1993 model or the local response normalization of the more recent models), then a direct comparison of the effects of the different forms is necessary to show this. If not, then the focus of the manuscript should be shifted (e.g., changing the title) to the novel aspects of the manuscript such as the use of the model to explain various visual illusions and adaptation and context effects.

      Thank you for bringing up these issues. We acknowledge that our explanation of the key differences between the proposed model and that of Dehaene & Changeux (hereafter D&C) was unclear. Please see our revisions below, where we 1) explain the D&C model and its limitations in more detail; 2) describe our critical changes to the D&C model; and 3) show how those critical changes allow a novel way to explain numerosity perception.

      The paragraph in the Introduction where we first introduce D&C is modified to read:

      “The computational model of Dehaene and Changeux (1993) explains numerosity detection based on several neurocomputational principles. That model (hereafter D&C) assumes a one-dimensional linear retina (each dot is a line segment), and responses are normalized across dot size via a convolution layer that represents combinations of two attributes: 1) dot size, as captured by difference-of-Gaussian contrast filters of different widths; and 2) location, by centering filters at different positions. In the convolution layer, the filter that matches the size of each dot dominates the neuronal activity at the location of the dot owing to a winner-take-all lateral inhibition process. To indicate numerosity, a summation layer pools the total activity over all the units in the convolution layer. While the D&C model provided a proof of concept for numerosity detection, it has several limitations as outlined in the discussion. Of these, the most notable is that strong winner-take-all in the convolution layer discretizes visual information (e.g., discrete locations and discrete sizes yielding a literal count of dots), which is implausible for early vision. As a result, the output of the model is completely insensitive to anything other than number in all situations, which is inconsistent with empirical data (Park et al., 2021).”

      The revised Discussion describes our critical modifications to D&C and their consequences.

      “At first blush, the current model might be considered an extension of Dehaene and Changeux (1993). However, there are four ways in which the current model differs qualitatively from the D&C model. First, the D&C model is one-dimensional, simulating a linear retina, whereas we model a two-dimensional retina feeding into center-surround filters, allowing application to the two-dimensional images used in numerosity experiments (Fig. 1A). Second, extreme winner-take-all normalization in the convolution layer of the D&C model implausibly limits visual precision by discretizing the visual response. For example, the convolution layer in the D&C model only knows which of 9 possible sizes and 50 possible locations occurred. In contrast, by using divisive normalization in the current model, each dot produces activity at many locations and many filter sizes despite normalization, and a population could be used to determine exact location and size. Third, extreme winner-take-all normalization also eliminates all information other than dot size and location. By using divisive normalization, the current model represents other attributes such as edges and groupings of dots (Fig. 1B), and these other attributes provide a different explanation of number sensitivity compared to D&C. For example, the D&C model, as applied to the spacing effect between two small dots (Fig. 4A), would represent the dots as existing discretely at two close locations versus two far locations, with the total summed response being two in either case. In contrast, the current model gives the same total response for a different reason. Although the small filters are less active for closely spaced dots, the closely spaced dots look like a group as captured by a larger filter, and this addition for the larger filter offsets the loss for the smaller filter. Similarly, as applied to the dot size effect (Fig. 4B), the D&C model would only represent the larger dots using larger filters. In contrast, the current model represents larger dots with larger filters and with smaller filters that capture the edges of the larger dots, and yet the summed response remains the same in each case owing to divisive normalization (again, there are offsetting factors across different filter sizes). The final difference is that the D&C model does not include temporal normalization, which we show to be critical for explaining adaptation and context effects.”
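      The contrast between winner-take-all and divisive normalization can be illustrated on a toy vector of filter responses to a single dot across filter sizes. This is a simplified sketch, not the current model's implementation: the normalization pool here is all units rather than a local neighborhood, and the exponent n, the constant sigma, and the response values are illustrative assumptions.

```python
from typing import List


def winner_take_all(responses: List[float]) -> List[float]:
    """Extreme normalization: only the strongest unit stays active."""
    winner = responses.index(max(responses))
    return [1.0 if i == winner else 0.0 for i in range(len(responses))]


def divisive_normalization(responses: List[float], sigma: float = 1.0,
                           n: float = 2.0) -> List[float]:
    """Each (non-negative) response raised to n and divided by sigma**n plus
    the pooled activity of all units, so graded information is preserved."""
    driven = [r ** n for r in responses]
    pool = sigma ** n + sum(driven)
    return [d / pool for d in driven]


# Toy responses of small-to-large filters centred on one dot (hypothetical values).
responses = [0.2, 0.6, 1.0, 0.5]
print(winner_take_all(responses))         # [0.0, 0.0, 1.0, 0.0]
print(divisive_normalization(responses))  # graded: every filter size stays active
```

Under winner-take-all only the best-matching filter size survives, discarding the edge and grouping information carried by the other filters; divisive normalization rescales all responses but keeps their relative pattern.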

      In sum, the current model explains a wider range of effects by using representations and processes that more closely reflect early vision. The change to two-dimensions allows application to real images. The inclusion of temporal normalization allows application to temporal effects. The change from winner-take-all to divisive normalization might appear to be a parameter setting, but it’s one that produces qualitatively different results and explanations (e.g., representations of edges and groupings that are part of the explanation of selective sensitivity to number). These behaviors are consistent with empirical data and are qualitatively different from that of the D&C model. Now that we’ve highlighted the ways in which this model differs qualitatively from the D&C model, we hope that our original title still works.

      Reviewer #2 (Public Review):

      This is a very interesting and novel model of numerosity perception, based on known computational principles of the visual system: center-surround mechanisms at various scales, combined with divisive normalization (over space and time). The model explains, at least qualitatively, several of the important aspects of numerosity perception.

      Firstly, the model makes major and minor predictions. Major: the effect of adaptation (at least 30%), as well as independence of density and dot size; minor: tiny effects like irregularity (around 6%). I think it would make sense to separate these. To my knowledge, it is the first model to account for adaptation, which was the major effect that brought numerosity into the realm of psychophysics, and it explains it effortlessly, using an intrinsic component of the model (divisive normalization) rather than an ad-hoc add-on. This should be highlighted more. Perhaps the fit could also be more quantitative. Murphy and Burr (who they cite) showed that adaptation is rapid. How does this fit the model? Very well, I would have thought.

      Thanks for the positive evaluation of our work. In the revised manuscript, we followed the reviewer’s suggestion to highlight the novelty of the model in its explanation of numerosity adaptation. As the reviewer says, one significant aspect of our work is that the model can explain a relatively large effect of numerosity adaptation with minimal effort. To be clear, even though we call it “numerosity” adaptation, the model does not know number in any explicit way. One way to highlight this aspect, we thought, is to compare the current adaptation results to a simulation where the adaptor and target are defined along the dimensions of size or spacing. In such cases (which are now reported in Fig. S6 and S7), no reliable under- or over-estimation was observed. These results suggest that numerosity adaptation is a natural byproduct of divisive normalization working across space and time.

      The question about the rapidity of adaptation is indeed an interesting one. However, the current model is not designed to simulate the effect of exposure duration on neural activity. More specifically, the current model operates across trials and stimuli (i.e., one response per stimulus), using a single parameter that captures the temporal gradient of divisive normalization from prior trials (e.g., the influence of two trials ago as compared to one trial ago). As currently formulated, the model does not address adaptation at the level of milliseconds, as would be necessary to model adaptor duration. Modeling adaptation at the millisecond level requires a dynamic model that specifies not only the rate of adaptation but also the rate of recovery from adaptation, as in the visual orientation adaptation model of Jacob, Potter, and Huber (2021), which includes the dynamics of synaptic depression and synaptic recovery. In future work we hope to make such modifications to the model to expand the range of explained effects. Nevertheless, a dynamic version of the model should encompass this simpler trial-by-trial version as a special case. Our goal in this study was a clear demonstration of the neural mechanisms underlying numerosity in early vision, and so we have attempted to keep the model as simple as possible while still capturing neural behavior.
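      One way to picture a single temporal-gradient parameter is a normalization pool that includes earlier trials with geometrically decaying weights. The sketch below is hypothetical: the geometric form, the parameter name decay, and the scalar per-trial drive are assumptions for illustration, not the model's actual equations. It shows only the qualitative point that a strong adaptor on the previous trial reduces the response to the same target.

```python
from typing import List


def normalized_responses(drives: List[float], sigma: float = 1.0,
                         decay: float = 0.5) -> List[float]:
    """Trial-by-trial divisive normalization with a temporal pool.

    drives[t] is the summed input drive on trial t. The pool on trial t holds
    the current drive plus each earlier drive weighted by decay**k, where k is
    how many trials back it occurred (one trial ago: decay; two ago: decay**2)."""
    out = []
    for t, d in enumerate(drives):
        history = sum(decay ** k * drives[t - k] for k in range(1, t + 1))
        out.append(d / (sigma + d + history))
    return out


# Same target drive (4.0) preceded by a strong versus a matched adaptor.
after_strong = normalized_responses([16.0, 4.0])[-1]
after_matched = normalized_responses([4.0, 4.0])[-1]
print(after_strong < after_matched)  # True: response reduced after the strong adaptor
```

Setting decay to 0 removes all carry-over between trials, which recovers purely within-trial normalization.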

      We have elected not to fit data and instead explored the model's behavior qualitatively, asking whether the commonly observed numerosity effects emerge from the model in the qualitatively correct direction regardless of its parameter values (e.g., as reported in Fig. S2). This approach follows from our central aim, which is to explain the neurocomputational principles of the number sense rather than to produce a detailed model with specific parameter values fit to data. Our aim was to show that the correct qualitative behaviors naturally emerge from these principles without requiring specific parameter values (and, more importantly, to show how these behaviors emerge from these principles).

      Jacob, L. P., Potter, K. W., & Huber, D. E. (2021). A neural habituation account of the negative compatibility effect. Journal of Experimental Psychology: General, 150(12), 2567.

      Among the tiny predicted effects (visually indistinguishable bar graphs) is the connectedness effect. But this is in fact large, up to 20%. I would say they fail here, by predicting only 6%. And I would say this is to be expected, as the illusion relies on higher-order properties (grouping), which would not immediately result from normalization. Furthermore, the illusion varies with individual personality traits (Pomè et al, JAD, 2021). The fact that it works with very thin lines suggests that it is not the physical energy of the lines that normalizes, but the perceptual grouping effect. I would either drop it, or give it as an example of where the predictions are in the right direction, but clearly fall short quantitatively. No shame in saying that they cannot explain everything with low-level mechanisms. A future revised model could incorporate grouping phenomena.

      Thank you for the suggestion. We agree that trying to explain the connectedness illusion with center-surround filters is not ideal. As the reviewer says, the main driver of the connectedness illusion is likely to be groupings of dots. The current model captures groupings of dots, but it does so in a circularly symmetric way, which is not ideal for capturing the oblong groupings (barbells) that are likely to play a role in the connectedness illusion. It is probably because of this mismatch (between the shape of the groupings and shape of the filters) that the model produces a smaller magnitude connectedness illusion. If the model included a subsequent convolution layer in which the filters were oriented lines of different sizes, it would likely produce a larger connectedness illusion. Following the reviewer’s suggestion, we have placed the connectedness illusion in the supplementary materials and only refer to this in the future directions section of the discussion, writing:

      “Another line of possible future work concerns divisive normalization in higher cortical levels involving neurons with more complex receptive fields. While the current normalization model with center-surround filters successfully explained visual illusions caused by regularity, grouping, and heterogeneity, other numerosity phenomena such as topological invariants and statistical pairing (He et al., 2015; Zhao and Yu, 2016) may require the action of neurons with receptive fields that are more complex than center-surround filters. For example, another well-known visual illusion is the effect of connectedness, whereby an array with dots connected pairwise with thin lines is underestimated (by up to 20%) compared to the same array without the lines connected (Franconeri et al., 2009). This underestimation effect likely arises from barbell-shaped pairwise groupings of dots, rather than the circularly symmetric groupings of dots that are captured with center-surround filters. Nonetheless, a small magnitude (6%) connectedness illusion emerges with center-surround filters (Fig. S10). Augmenting the current model with a subsequent convolution layer containing oriented line filters and oriented normalization neighborhoods of different sizes might increase the predicted magnitude of the illusion.”

      In short, I like the model very much, but think the manuscript could be packaged better. Bring out the large effects more, especially those that have never been explained previously (like adaptation). And try to be more quantitative.

      Thank you. We now highlight the novel computational demonstrations of adaptation to a greater degree and—as also suggested by Reviewer 1—provide more quantitative reports of the illusory effects that the model naturally produces.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, the authors leverage novel computational tools to detect, classify and extract information underlying sharp-wave ripples, synchronous events related to memory. They validate the applicability of their method to several datasets and compare it with a filtering method. In summary, they found that their convolutional neural network detection captures more events than the commonly used filter method. This particular capability of capturing additional events which traditional methods don't detect is very powerful and could open important new avenues worth further investigation. The manuscript in general will be very useful for the community as it will increase the attention towards new tools that can be used to solve ongoing questions in hippocampal physiology.

      We thank the reviewer for the constructive comments and appreciation of the work.

      Additional minor points that could improve the interpretation of this work are listed below:

      • Spectral methods could also be used to capture the variability of events if used properly or run several times through a dataset. I think adjusting the statements where the authors compare CNN with traditional filter detections could be useful as it can be misleading to state otherwise.

      We thank the reviewer for this suggestion. We would like to emphasize that we do not advocate at all for abandoning filters. We feel that a combination of methods is required to improve our understanding of the complex electrophysiological processes underlying SWR. We have adjusted the text as suggested. In particular, a) we removed the misleading sentence from the abstract, and instead stated the need for new automatic detection strategies; b) we edited the introduction similarly, and clarified the need for improved online applications.

      • The authors show that their novel method is able to detect "physiological relevant processes" but no further analysis is provided to show that this is indeed the case. I suggest adjusting the statement to "the method is able to detect new processes (or events)".

      We have corrected the text as suggested. In particular, we now state that “The new method, in combination with community tagging efforts and optimized filter, could potentially facilitate discovery and interpretation of the complex neurophysiological processes underlying SWR.” (page 12).

      • In Fig.1 the authors show how they tune the parameters that work best for their CNN method and from there they compare it with a filter method. In order to offer a fairer comparison, analogous tuning of the filter parameters should be tested alongside, to show that filters can also be tuned to improve the detection of "ground truth" data.

      Thank you for this comment. As explained before, see below the results of the parameter study for the filter in the very same sessions used for training the CNN. The parameters chosen (100-300 Hz band, order 2) provided maximal performance in the test set. Therefore, both methods are similarly optimized during training. This is now included (page 4): “In order to compare CNN performance against spectral methods, we implemented a Butterworth filter, whose parameters were optimized using the same training set (Fig.1-figure supplement 1D).”

      • Showing a manual score of the performance of their CNN method detection with false positive and false negative flags (and plots) would be clarifying in order to get an idea of the type of events that the method is able to detect and fails to detect.

      We have added information on the categories of False Positives for both the CNN and the filter in the new Fig.4F. We have also prepared an executable figure to show examples and to help readers understand how the CNN works. See the new Fig.5 and the executable notebook https://colab.research.google.com/github/PridaLab/cnn-ripple-executable-figure/blob/main/cnn-ripple-false-positive-examples.ipynb

      • In fig 2E the authors show the differences between CNN with different precision and the filter method, while the performance is better the trends are extremely similar and the numbers are very close for all comparisons (except for the recall where the filter clearly performs worse than CNN).

      This refers to the external dataset (Grosmark and Buzsaki 2016), which is now in the new Fig.3E. To address this point and to improve the statistical report, we have added more data, resulting in 5 sessions from 2 rats. Data confirm better performance of the CNN model versus the filter. The purpose of this figure is to show the effect of the definition of the ground truth on the performance of the different methods, and also the proper performance of the CNN on external datasets without retraining. Please note that in Grosmark and Buzsaki, SWR detection was conditioned on the coincidence of both population synchrony and LFP definition, thus providing a “partial ground truth” (i.e. SWR without population firing were not annotated in the dataset).
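      The effect of a partial ground truth on precision can be made concrete with a minimal sketch of event-level scoring. It assumes, hypothetically, that a detection counts as a True Positive when it overlaps any annotated event in time; the matching criterion used in the manuscript may differ, and the event times below are invented for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds, inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed time intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def event_scores(detected: List[Interval], truth: List[Interval]) -> Dict[str, float]:
    """Event-level precision, recall and F1 (hypothetical overlap criterion)."""
    # A detection is a True Positive if it overlaps any annotated event.
    tp = sum(1 for d in detected if any(overlaps(d, t) for t in truth))
    # An annotated event is recalled if any detection overlaps it.
    hits = sum(1 for t in truth if any(overlaps(d, t) for d in detected))
    precision = tp / len(detected) if detected else 0.0
    recall = hits / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


detected = [(0.05, 0.12), (1.02, 1.08), (2.00, 2.10), (3.00, 3.05)]
truth = [(0.00, 0.10), (1.00, 1.10)]                # partial annotation
reannotated = truth + [(2.00, 2.10), (3.00, 3.05)]  # after manual validation
print(event_scores(detected, truth))
print(event_scores(detected, reannotated))
```

      With only the two annotated events in `truth`, precision is 0.5 and recall is 1.0; adding the two unannotated detections (as re-annotation would) raises precision to 1.0 without changing recall.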

      • The authors acknowledge that various forms of SWRs not consistent with their common definition could be captured by their method. But theoretically, it could also be the case that, due to the spectral continuum of the LFP signals, noisy features of the LFP could also be passed as "relevant events"? Discussing this point in the manuscript could help with the context of where the method might be applied in the future.

      As suggested, we have mentioned this point in the revised version. In particular: “While we cannot discard noisy detection from a continuum of LFP activity, our categorization suggests they may reflect processes underlying buildup of population events (de la Prida et al., 2006). In addition, the ability of CA3 inputs to bring about gamma oscillations and multi-unit firing associated with sharp-waves is already recognized (Sullivan et al., 2011), and variability of the ripple power can be related with different cortical subnetworks (Abadchi et al., 2020; Ramirez-Villegas et al., 2015). Since the power spectral level operationally defines the detection of SWR, part of this microcircuit intrinsic variability may be escaping analysis when using spectral filters” (page 16).

      • In fig. 5 the authors claim that there are striking differences in firing rate and timing of pyramidal cells when comparing events detected in different layers (compared to the SP layer). This is not very clear from the figure, as plots 5G and 5H show that the main differences are when compared with SO and SLM.

      We apologize for the confusion. We meant that the analysis was performed by comparing properties of SWR detected at SO, SR and SLM, using values z-scored by SWR detected at SP only. We clarified this point in the revised version: “We found larger sinks and sources for SWR that can be detected at SLM and SR versus those detected at SO (Fig.7G; z-scored by mean values of SWR detected at SP only).” (page 14).

      • Could the above differences be related to the fact that the performance of the CNN could have different percentages of false-positive when applied to different layers?

      The rate of FP is similar across layers: 0.52 ± 0.21 for SO, 0.50 ± 0.21 for SR and 0.46 ± 0.19 for SLM. This is now mentioned in the text: “No difference in the rate of False Positives between SO (0.52 ± 0.21), SR (0.50 ± 0.21) and SLM (0.46 ± 0.19) can account for this effect.” (page 12)

      Alternatively, could the variability be related to the occurrence (and detection) of similar events in neighboring spectral bands (i.e., gamma events)? Discussion of this point in the manuscript would be helpful for the readers.

      We have discussed this point: “While we cannot discard noisy detection from a continuum of LFP activity, our categorization suggests they may reflect processes underlying buildup of population events (de la Prida et al., 2006). In addition, the ability of CA3 inputs to bring about gamma oscillations and multi-unit firing associated with sharp-waves is already recognized (Sullivan et al., 2011), and variability of the ripple power can be related with different cortical subnetworks (Abadchi et al., 2020; Ramirez-Villegas et al., 2015).” (Page 16)

      Overall, I think the method is interesting and could be very useful to detect more nuance within hippocampal LFPs and offer new insights into the underlying mechanisms of hippocampal firing and how they organize in various forms of network events related to memory.

      We thank the reviewer for constructive comments and appreciation of the value of our work.

      Reviewer #2 (Public Review):

      Navas-Olive et al. provide a new computational approach that implements convolutional neural networks (CNNs) for detecting and characterizing hippocampal sharp-wave ripples (SWRs). SWRs have been identified as important neural signatures of memory consolidation and retrieval, and there is therefore interest in developing new computational approaches to identify and characterize them. The authors demonstrate that their network model is able to learn to identify SWRs by showing that, following the network training phase, performance on test data is good. Performance of the network varied by the human expert whose tagging was used to train it, but when experts' tags were combined, performance of the network improved, showing it benefits from multiple inputs. When the network trained on one dataset is applied to data from different experimental conditions, performance was substantially lower, though the authors suggest that this reflected erroneous annotation of the data, and once corrected performance improved. The authors go on to analyze the LFP patterns that nodes in the network develop preferences for and compare the network's performance on SWRs and non-SWRs, both providing insight and validation about the network's function. Finally, the authors apply the model to dense Neuropixels data and confirm that SWR detection was best in the CA1 cell layer but could also be detected at more distant locations.

      The key strengths of the manuscript lie in a convincing demonstration that a computational model that does not explicitly look for oscillations in specific frequency bands can nevertheless learn to detect them from tagged examples. This provides insight into the capabilities and applications of convolutional neural networks. The manuscript is generally clearly written and the analyses appear to have been carefully done.

      We thank the reviewer for the summary and for highlighting the strengths of our work.

      While the work is informative about the capabilities of CNNs, the potential of its application for neuroscience research is considerably less convincing. As the authors state in the introduction, there are two potential key benefits that their model could provide (for neuroscience research): 1. improved detection of SWRs and 2. providing additional insight into the nature of SWRs, relative to existing approaches. To this end, the authors compare the performance of the CNN to that of a Butterworth filter. However, there are a number of major issues that limit the support for the authors' claims:

      Please see below the answers to specific questions, which we hope clarify the validity of our approach.

      • Putting aside the question of whether the comparison between the CNN and the filter is fair (see below), it is unclear if even as is, the performance of the CNN is better than a simple filter. The authors argue for this based on the data in Fig. 1F-I. However, the main result appears to be that the CNN is less sensitive to changes in the threshold, not that it does better at reasonable thresholds.

      This comment now refers to the new Fig.2A (offline detection) and Fig.2C,D (online detection). Starting with offline detection: yes, the CNN is less sensitive to the threshold than the filter, and that has major consequences both offline and online. For the filter to reach its best performance, the threshold has to be tuned, which is a time-consuming process. Importantly, this is only doable when the ground truth is known. In practical terms, most labs run a semi-automatic detection approach where events are first detected and then manually validated. The fact that the filter is more sensitive to thresholds makes this process very tedious. In contrast, the CNN is more stable.

      In trying to be fair, we also tested the performance of the CNN and the filter at their best (i.e. using the threshold that provides the best match with the ground truth). This is shown in Fig.3A. There are no differences between methods, indicating that the CNN meets the gold standard provided the filter is optimized. Note again that this is only possible if the ground truth is known, because optimization is based on looking for the best threshold per session.

      Importantly, both methods reach their best performance at the expert’s limit (gray band in Fig.3A,B). They cannot be better than the individual ground truth. This is why we advocate for community tagging collaborations to consolidate sharp-wave ripple definitions.
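      Why per-session threshold optimization needs a known ground truth can be illustrated with a short sketch. It assumes a precomputed, normalized detection trace (`envelope`, a hypothetical name) in which an event is any run of samples above threshold, and it scores events by temporal overlap; it is not the detection pipeline used in the manuscript.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (first_sample, last_sample), inclusive


def detect(envelope: Sequence[float], threshold: float) -> List[Interval]:
    """Return runs of consecutive samples where the envelope exceeds threshold."""
    events: List[Interval] = []
    start = None
    for i, value in enumerate(envelope):
        if value > threshold and start is None:
            start = i                      # event onset
        elif value <= threshold and start is not None:
            events.append((start, i - 1))  # event offset
            start = None
    if start is not None:                  # event still open at the end
        events.append((start, len(envelope) - 1))
    return events


def f1_score(detected: List[Interval], truth: List[Interval]) -> float:
    """Event-level F1, counting any temporal overlap as a match."""
    if not detected or not truth:
        return 0.0

    def hit(a: Interval, b: Interval) -> bool:
        return a[0] <= b[1] and b[0] <= a[1]

    precision = sum(any(hit(d, t) for t in truth) for d in detected) / len(detected)
    recall = sum(any(hit(d, t) for d in detected) for t in truth) / len(truth)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def best_threshold(envelope: Sequence[float], truth: List[Interval],
                   thresholds: Sequence[float]) -> Tuple[float, float]:
    """Threshold maximizing F1; note that it requires the ground truth."""
    scores = [(th, f1_score(detect(envelope, th), truth)) for th in thresholds]
    return max(scores, key=lambda s: s[1])


env = [0, 0, 5, 6, 0, 0, 2, 2, 0, 0, 7, 0]
truth = [(2, 3), (10, 10)]
print(best_threshold(env, truth, [1, 3, 8]))
```

      Because `best_threshold` takes `truth` as an argument, it cannot be applied to unannotated sessions, which is the practical limitation of threshold tuning discussed above.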

      Moreover, the mean performance of the filter across thresholds appears dramatically dampened by its performance on particularly poor thresholds (Fig. 1F, I, weak traces). How realistic these poorly tested thresholds are is unclear. The single direct statistical test of difference in performance is presented in Fig. 1H but it is unclear if there is a real difference there as graphically it appears that animals and sessions from those animals were treated as independent samples (and comparing only animal averages or only sessions clearly do not show a significant difference).

      Please note this refers to online detection. We are not sure we understand the comment on whether the thresholds are realistic. To clarify, we detect SWR online using thresholds that we optimize similarly for the filter and the CNN over the course of the experiment. This is reported in Fig.2C both per session and per animal, reaching statistical significance (we added more experiments to increase statistical power). Since thresholds defined online may still not be the best, we then annotated these data and ran an additional post hoc offline optimization analysis, which is presented in Fig.2D. We hope this is now clearer in the revised version.

      Finally, the authors show in Fig. 2A that for the best threshold the CNN does not do better than the filter. Together, these results suggest that the CNN does not generally outperform the filter in detecting SWRs, but only that it is less sensitive to usage of extreme thresholds.

      We hope this is now clarified. See our response to your first bullet point.

      Indeed, I am not convinced that a non-spectral method could even theoretically do better than a spectral method to detect events that are defined by their spectrum, assuming all other aspects are optimized (such as combining data from different channels and threshold setting).

      As can be seen in the responses to the editor synthesis, we have optimized the filter parameters similarly (new Fig.1-supp-1D), and there is no improvement from using more channels (see below). In any case, we would like to emphasize that we do not advocate at all for abandoning filters. We feel that a combination of methods is required to improve our understanding of the complex electrophysiological processes underlying SWR.

      • The CNN network is trained on data from 8 channels but it appears that the compared filter is run on a single channel only. This is explicitly stated for the online SWR detection and presumably, that is the case for the offline as well. This unfair comparison raises the possibility that whatever improved performance the CNN may have may be due to considerably richer input and not due to the CNN model itself. The authors state that a filter on the data from a single channel is the standard, but many studies use various "consensus" heuristics, e.g. in which elevated ripple power is required to be detected on multiple channels simultaneously, which considerably improves detection reliability. Even if this weren't the case, because the CNN learns how to weight each channel, to argue that better performance is due to the nature of the CNN it must be compared to an algorithm that similarly learns to optimize these weights on filtered data across the same number of channels. It is very likely that if this were done, the filter approach would outperform the CNN as its performance with a single channel is comparable.

      We appreciate this comment. Using one channel to detect SWR is very common for offline detection followed by manual curation. In some cases, a second channel is used either to veto spurious detections (using a non-ripple channel) or to confirm detection (using a second ripple channel and/or a sharp-wave) (Fernandez-Ruiz et al., 2019). Many others use detection of population firing together with the filter to identify replay (such as in Grosmark and Buzsaki 2019, where ripples were conditioned on the coincidence of both population firing and LFP detected ripples). To address this comment, we compared performance using different combinations of channels, from the standard detection at the SP layer (pyr) up to 4 and 8 channels around SP, using the consensus heuristics. As can be seen, filter performance is consistent across configurations, and using 8 channels does not improve detection. We clarify this in the revised version: ”We found no effect of the number of channels used for the filter (1, 4 and 8 channels), and chose that with the higher ripple power” (see caption of Fig.1-supp-1D).
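      One form of the multi-channel consensus heuristic discussed in this exchange can be sketched as a sample-wise vote across channels. The sketch assumes each channel has already been converted to a ripple-band envelope; `threshold` and `min_channels` are hypothetical parameters, not the values used for Fig.1-supp-1D.

```python
from typing import List, Sequence


def consensus_mask(envelopes: Sequence[Sequence[float]], threshold: float,
                   min_channels: int) -> List[bool]:
    """True at each sample where at least `min_channels` channel envelopes
    exceed `threshold` simultaneously (hypothetical consensus rule)."""
    n_samples = len(envelopes[0])
    if any(len(env) != n_samples for env in envelopes):
        raise ValueError("all channels must have the same number of samples")
    return [sum(env[i] > threshold for env in envelopes) >= min_channels
            for i in range(n_samples)]


# Three channels; a sample counts only if at least two of them agree.
envelopes = [[0, 3, 3, 0],
             [0, 3, 0, 0],
             [3, 0, 3, 0]]
print(consensus_mask(envelopes, threshold=2, min_channels=2))
```

      Setting `min_channels=1` reduces the rule to an "any channel" detector, while raising it trades recall for precision.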

      • Related to the point above, for the proposed CNN model to be a useful tool in the neuroscience field it needs to be amenable to the kind of data and computational resources that are common in the field. As the network requires 8 channels situated in close proximity, the network would not be relevant for numerous studies that use fewer or spaced channels. Further, the filter approach does not require training and it is unclear how generalizable the current CNN model is without additional network training (see below). Together, these points raise the concern that even if the CNN performance is better than a filter approach, it would not be usable by a wide audience.

      Thank you for this comment. To handle different input channel configurations, we have developed an interpolation approach, which transforms any data into 8-channel inputs. We are currently applying the CNN without re-training to data from several labs using different electrode numbers and configurations, including tetrodes, linear silicon probes and wires. Results confirm the performance of the CNN. Since we cannot disclose these third-party data here, we have looked for a new dataset from our own lab to illustrate the case. See below the results from 16-channel silicon probes (100 µm inter-electrode separation), where the CNN performed better than the filter (F1: p=0.0169; Precision: p=0.0110; 7 sessions from 3 mice). We found that the performance of the CNN depends on the laminar LFP profile, as the Neuropixels data illustrate.
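      The interpolation approach itself is not detailed in this response. As an illustration only, one simple way to map an arbitrary linear-probe layout onto 8 inputs is piecewise-linear interpolation along depth at each time point. The function names, the evenly spaced virtual channels, and the clamping at the ends are all assumptions of this sketch, not the method used with the CNN.

```python
from bisect import bisect_right
from typing import List, Sequence


def interp1(x: Sequence[float], xp: Sequence[float], fp: Sequence[float]) -> List[float]:
    """Piecewise-linear interpolation of (xp, fp) at x; xp must be ascending.
    Values outside the recorded range are clamped to the end channels."""
    out = []
    for xi in x:
        if xi <= xp[0]:
            out.append(float(fp[0]))
        elif xi >= xp[-1]:
            out.append(float(fp[-1]))
        else:
            j = bisect_right(xp, xi) - 1          # xp[j] <= xi < xp[j + 1]
            t = (xi - xp[j]) / (xp[j + 1] - xp[j])
            out.append(fp[j] + t * (fp[j + 1] - fp[j]))
    return out


def to_n_channels(sample: Sequence[float], depths: Sequence[float], n: int = 8) -> List[float]:
    """Resample one time point, recorded at ascending `depths` (hypothetical,
    in µm), onto n evenly spaced virtual channels spanning the same range."""
    lo, hi = depths[0], depths[-1]
    targets = [lo + k * (hi - lo) / (n - 1) for k in range(n)]
    return interp1(targets, depths, sample)


print(to_n_channels([0.0, 7.0], [0.0, 700.0]))
```

      Here two channels 700 µm apart are mapped onto eight virtual channels 100 µm apart, whose values rise linearly from 0 to 7 (up to floating-point rounding). Applying the function to every time point of a recording would yield an 8-channel array.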

      • A key point is whether the CNN generalizes well across new datasets as the authors suggest. When the model trained on mouse data was applied to rat data from Grosmark and Buzsaki, 2016, precision was low. The authors state that "Hence, we evaluated all False Positive predictions and found that many of them were actually unannotated SWR (839 events), meaning that precision was actually higher". How were these events judged as SWRs? Was the test data reannotated?

      We apologize for not explaining this better in the original version. We chose Grosmark and Buzsaki 2016 because it provides an “incomplete ground truth”, since (citing their Methods) “Ripple events were conditioned on the coincidence of both population synchrony events, and LFP detected ripples”. This means there are LFP ripples not included in their GT. This dataset provides a very good example of how the experimental goal (examining replay and thus relying on population firing plus LFP definitions) may limit the ground truth.

      Please note we use the external dataset for validation purposes only. The CNN model was applied without retraining, so it also helps to exemplify generalization. Consistent with a partial ground truth, the CNN and the filter recalled most of the annotated events, but precision was low. By manually validating False Positive detections, we re-annotated the external dataset, and both the CNN and the filter increased precision.

      To make the case clearer, we now include more sessions to increase the data size and test for statistical effects (Fig.3E). We also changed the example to show more cases of re-annotated events (Fig.3D). We have clarified the text: “In that work, SWR detection was conditioned on the coincidence of both population synchrony and LFP definition, thus providing a “partial ground truth” (i.e. SWR without population firing were not annotated in the dataset).” (see page 7).

      • The argument that the network improves with data from multiple experts while the filter does not requires further support. While Fig. 1B shows that the CNN improves performance when the experts' data is combined and the filter doesn't, the final performance on the consolidated data does not appear better in the CNN. This suggests that performance of the CNN when trained on data from single experts was lower to start with.

      This comment refers to the new Fig.3B. We apologize for not having included a between-method comparison in the original version. To address this, we now include a one-way ANOVA for the effect of the type of ground truth on each method, and an independent one-way ANOVA for the effect of the method on the consolidated ground truth. To increase statistical power we have added more data. We also detected a mistake involving duplicated data in the original figure, which has been corrected. Importantly, the rationale behind the experts’ consolidated data is that there is about 70% consistency between experts, so many SWR remain unannotated in the individual ground truths. These are typically ambiguous events that may generate discussion between experts, such as sharp-waves with population firing and few ripple cycles. Since the CNN is better at detecting them, this explains why its performance improves when data from multiple experts are integrated.
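      For readers less familiar with the test, the one-way ANOVA statistic is the ratio of between-group to within-group mean squares, F = (SS_between / (k - 1)) / (SS_within / (N - k)) for k groups and N observations. A minimal sketch follows; in this context each group would hold, for example, per-session F1 values under one type of ground truth, but the numbers in the example are made up.

```python
from typing import Sequence, Tuple


def one_way_anova(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Spread of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

      `scipy.stats.f_oneway` computes the same F statistic together with its p-value.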

      Further, regardless of the point in the bullet point above, the data in Fig. 1E does not convincingly show that the CNN improves while the filter doesn't as there are only 3 data points per comparison and no effect on F1.

      Fig.1E shows an example, so we assume the reviewer refers to the new Fig.2C, which shows data on online operation, where we originally reported the analysis per session and per animal separately with only 3 mice. We have run more experiments to increase the data size and test for statistical effects (8 sessions, 5 mice; per session p=0.0047; per mouse p=0.033; t-test). This is now corrected in the text and in the Fig.1C caption. Please note that a post hoc offline evaluation of these online sessions confirmed better performance of the CNN versus the filter for all normalized thresholds (Fig.2D).

      • Apart from the points above regarding the ability of the network to detect SWRs, the insight into the nature of SWRs that the authors suggest can be achieved with CNNs is limited. For example, the data in Fig. 3 is a nice analysis of what the components of the CNN learn to identify, but the claim that "some predictions not consistent with the current definition of SWR may identify different forms of population firing and oscillatory activities associated to sharp-waves" is not thoroughly supported. The data in Fig. 4 is convincing in showing that the network better identifies SWRs than non-SWRs, but again the insight is about the network rather than about SWRs.

      In the revised version, we have now included validation of all false positives detected by the CNN and the filter (Fig.4F). To help the reader examine examples of True Positive and False Positive detections, we also include a new figure (Fig.5), which comes with executable code (see page 9). We also include comparisons of the features of TP events detected by both methods (Fig.2B), showing that SWR events detected by the CNN exhibited features more similar to those of the ground truth (GT) than those detected by the filter. We feel the entire manuscript provides support for these claims.

      Finally, the application of the model on Neuropixels data also nicely demonstrates the applicability of the model on this kind of data but does not provide new insight regarding SWRs.

      We respectfully disagree. Please note that the application to ultra-dense Neuropixels data not only applies the model to an entirely new dataset without retraining, but also shows that some SWR with larger sinks and sources can actually be detected at input layers (SO, SR and SLM). Importantly, those events result in different firing dynamics, providing mechanistic support for heterogeneous behavior underlying, for instance, replay.

      In summary, the authors have constructed an elegant new computational tool and convincingly shown its validity in detecting SWRs and applicability to different kinds of data. Unfortunately, I am not convinced that the model convincingly achieves either of its stated goals: exceeding the performance of SWR detection or providing new insights about SWRs as compared to considerably simpler and more accessible current methods.

      We thank you again for your constructive comments. We hope you are now convinced of the value of the new method in light of the newly added data.

    1. Author Response

      We thank the reviewers for their very thorough, detailed, and fair reviews that will help us improve the manuscript. We have two minor comments. First, we emphasize that the evidence is for pervasive positive selection being the main driver of the genetic diversity of Atlantic cod. Second, we wish to clarify the application of the Moran process to model the reproduction of highly fecund organisms. In the Moran process, a single individual is chosen at random to reproduce at any time, and another individual is chosen to die. However, the parent also persists in the population and can generate a large number of offspring in its lifetime. Hence, the Moran process does not imply an especially low level of fecundity. The multiple mergers seen in coalescent models of highly fecund organisms arise from a combination of high fecundity and reproductive skew; models of high fecundity without skewness are consistent with genealogies with binary mergers only. Hence, the Durrett-Schweinsberg model we employ can be thought of as a model for a highly fecund organism for which reproductive skewness manifests through selective sweeps.
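      The reproduction scheme of the Moran process described above can be sketched in a few lines. This is a neutral, single-birth, single-death Moran model with no selection and no skewed offspring distribution; it is not the Durrett-Schweinsberg model, and it is included only to show that a parent persists across steps and can accumulate several offspring over its lifetime.

```python
import random
from collections import Counter
from typing import List, Tuple


def moran_offspring(n: int, steps: int, seed: int = 0) -> Tuple[List[int], Counter]:
    """Neutral Moran process with unique individual ids.

    At each step one individual, chosen uniformly, reproduces, and one
    individual, chosen uniformly and independently (it may be the parent),
    dies and is replaced by the offspring, so the population size stays n.
    Returns the final population and each parent's lifetime offspring count.
    """
    rng = random.Random(seed)
    population = list(range(n))
    next_id = n
    offspring: Counter = Counter()
    for _ in range(steps):
        parent = rng.choice(population)   # birth
        victim = rng.randrange(n)         # death (slot to be overwritten)
        offspring[parent] += 1
        population[victim] = next_id      # offspring takes the dead one's place
        next_id += 1
    return population, offspring


population, offspring = moran_offspring(n=100, steps=10_000, seed=1)
print(len(population), max(offspring.values()))
```

      `max(offspring.values())` gives the largest lifetime offspring count in a run; the total across all parents always equals the number of steps, since exactly one offspring is born per step.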