  1. Aug 2021
    1. Author Response:

      Reviewer #1:

      Weaknesses:

      Although the BOLD data are highly spatially specific, there is just one electrophysiological timeseries per subject. This is no doubt a by-product of the extensive noise cancellation that is necessary to record within the scanner. The caveat therefore is that the covarying BOLD and electrophysiological changes may derive from different regions.

      We recognize this limitation, which is not easily solved by source analysis given the nature of the data (only 64 channels) and the usually larger imprecision of EEG source reconstruction. We circumvented this by choosing a task that is known from previous MEG studies to induce changes in multiple frequency bands originating from the early visual cortex (Hoogenboom et al., 2006; Hoogenboom et al., 2010; Koch et al., 2009; Muthukumaraswamy and Singh, 2013). Furthermore, the EEG responses are highly similar to invasive recordings from visual regions in animals in the context of tasks investigating selective attention (Fries et al., 2008). We now mention this limitation in the introduction (lines 102-111).

      The analysis methods are slightly non-standard, perhaps for good reason. The main thing that stands out is the use of correlation coefficients, rather than regression coefficients, at the first level of analysis. This could potentially conflate changes in signal with changes in noise or unexplained variance.

      We chose correlation here because, in our opinion, it yields a more interpretable measure of linear association than a regression slope. A slope-based analysis yields different outcomes for the regression of y on x than for x on y, doubling the number of analyses needed. These differing results are often interpreted as implying directionality, which is not warranted and not what we wish to imply with our analysis. The asymmetry is caused by the implicit assumption that x contains no noise in a regression of y on x. This is valid when x represents a paradigm condition vector, but not when it is a data vector. We therefore opted to use the difference in (Fisher-z-transformed) correlation as our estimate of linear association/connectivity between laminar fMRI signals.

      In both a correlation and a regression approach, differences can be attributed to differences in true underlying coupling and to differences in noise. In this respect, correlation-based coupling measures in fMRI are no different from coupling measures in electrophysiology such as coherence and the phase-locking factor. Coherence can be regarded as the frequency-domain version of (squared) correlation. The fact that our measure might indeed be related to differences in noise would therefore not be resolved by opting for a regression-based approach, and is not different from often-used measures of coupling in electrophysiology.
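
      The symmetry argument above can be illustrated with a minimal sketch. It is not the authors' analysis code: the simulated data, the noise levels and the helper names (`pearson_r`, `ols_slope`, `fisher_z_difference`) are all assumptions made for illustration. It shows that the correlation is identical in both directions, that the two regression slopes differ, and how a difference of Fisher-z-transformed correlations is formed.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation; symmetric in x and y."""
    xc = np.asarray(x, float) - np.mean(x)
    yc = np.asarray(y, float) - np.mean(y)
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def ols_slope(x, y):
    """Least-squares slope of y regressed on x (x is treated as noise-free)."""
    xc = np.asarray(x, float) - np.mean(x)
    yc = np.asarray(y, float) - np.mean(y)
    return float(xc @ yc / (xc @ xc))

def fisher_z_difference(r_a, r_b):
    """Difference between two correlations after the Fisher z-transform."""
    return float(np.arctanh(r_a) - np.arctanh(r_b))

# Hypothetical data: two noisy measurements of one shared latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
x = latent + 0.5 * rng.normal(size=200)
y = 2.0 * latent + 0.5 * rng.normal(size=200)

r = pearson_r(x, y)        # same value as pearson_r(y, x)
slope_yx = ols_slope(x, y)  # regression of y on x
slope_xy = ols_slope(y, x)  # regression of x on y; not the inverse of slope_yx
```

      The product of the two slopes equals the squared correlation, so the slopes are reciprocal only when the correlation is perfect; with noise in both variables they give two different answers where the correlation gives one.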

      Reviewer #2 (Public Review):

      Introduction. The introduction provides an overoptimistic view of the current possibilities with respect to the investigation of layer-specific activation or connectivity in the living human brain. Cortical layers cannot yet be segmented, the fMRI measures only provide an indirect signal that is heavily influenced by partial voluming between cortical depths, and EEG and MEG approaches often only measure two compartments due to low spatial resolution. The introduction, however, gives the impression that layer-specific neuronal connectivity can be measured precisely in the living human brain, which is not the case. The authors should take considerably more care with respect to how they introduce the methodology, with clear references to the limitations. Also, statements such as "laminar fMRI allows us to study connectivity.." should be removed. In the same vein, I would suggest replacing laminar fMRI and laminar connectivity with cortical depth-dependent fMRI and connectivity to account for the above-mentioned aspects.

      In laminar fMRI research it is commonly accepted that what we measure are not true layers, but depth dependent fMRI between the boundaries of white/gray matter and gray matter/CSF. For the general audience we will make this distinction clearer and discuss the limitations of the technique (lines 74-80).

      Concept. Whereas the authors provide a model in the introduction that specifies how different frequency bands could relate to cortical depth-dependent connectivity, they do not develop a working hypothesis based on their experimental design. One conceptual step is therefore missing in the introduction, which has to combine present knowledge on the relationship between different frequency bands and present knowledge on how attention influences frequency-specific activation in the visual system to then make statements about which analyses can be performed to test which aspect of the model.

      The primary focus of our study was to investigate how oscillations across several frequency bands in the EEG relate to laminar-specific activity. Recent publications on laminar fMRI have demonstrated the possibility of performing laminar-level fMRI connectivity analyses, which led us to revisit our previously recorded data in order to explore whether not only laminar-specific BOLD amplitude but also laminar fMRI connectivity relates to frequency-specific EEG power. Since laminar fMRI, and especially connectivity derived from those measures, is very novel, we started this analysis without a preconceived model or notion of what this relation would be. The results from this project should therefore be interpreted as an exploration of how these laminar fMRI-derived connectivity measures relate to neural oscillations, rather than as directly addressing a specific cognitive process like selective attention or prediction, or a model of how neural oscillations play a role in these processes. Our experimental paradigm was also not designed to address such processes. We chose a paradigm that is known from previous studies using MEG and EEG to induce changes in multiple frequency bands in the early visual cortex (Hoogenboom et al., 2006; Hoogenboom et al., 2010; Koch et al., 2009; Muthukumaraswamy and Singh, 2013). Furthermore, the EEG responses are highly similar to invasive recordings from visual regions in animals in the context of tasks investigating selective attention (Fries et al., 2008). The crude attention/task modulation added to the paradigm (attention On versus Off) was introduced primarily to induce meaningful variation over subjects in a task effect across the frequency bands modulated by visual stimulation. It was not intended to investigate specific individual processes such as prediction, attention or arousal. The observed effects can therefore not be ascribed to such specific processes, since they are co-modulated by the task. We now make this point explicitly in the introduction.

      • Concept & Methods: With respect to both the concept and the analyses, what is missing is taking into consideration the brain areas that were investigated. Whereas in the abstract the authors only mention "within brain region connectivity" and "between brain region connectivity", in the Methods section there is also no clear relation to the anatomical areas that were investigated, namely V1, V2 and V3. The authors rather classify the areas as "high level" and "low level", where V2 is sometimes classified as high-level and sometimes as low-level. The data are therefore not investigated with reference to the anatomy of the visual system. In my view, it would be beneficial if all analyses could be performed specifically with respect to V1-V2 connectivity and V2-V3 connectivity as well as V1-V3 connectivity, so that the specific anatomical interrelations are taken into account. Also, the authors should develop a conceptual framework of how layer-specific attention-driven connectivity changes should influence the visual cortex, and why.

      In the results for between-region connectivity we averaged over several connection pairs (V1-V2, V1-V3, V2-V3), and for within-region connectivity across regions (V1-V1, V2-V2, V3-V3), before effects in connectivity were correlated with EEG power. There are several reasons why we opted for this approach. First, we wanted to maximize the statistical power to observe patterns of association between laminar connectivity and EEG power. Since the analyses as carried out here have not previously been performed, we had no estimate of effect size. Second, by averaging over region combinations we drastically limit the multiple comparisons problem, since the number of comparisons scales with the square of the number of regions between which connectivity is computed. Third, by averaging over regions, we target more general effects of connectivity between and within regions that are more likely to correspond to patterns observed within other contexts and other modalities. The effect for individual region combinations would likely be more variable.
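
      The multiple-comparisons argument can be made concrete with a small sketch. The helper name `region_combinations` is hypothetical; the code only enumerates the between-region and within-region combinations named above, to show how quickly their number grows with the number of regions.

```python
from itertools import combinations

def region_combinations(regions):
    """Return (between-region pairs, within-region pairs) for a list of regions."""
    between = list(combinations(regions, 2))   # n * (n - 1) / 2 pairs
    within = [(r, r) for r in regions]         # n pairs
    return between, within

between, within = region_combinations(["V1", "V2", "V3"])
# between -> V1-V2, V1-V3, V2-V3; within -> V1-V1, V2-V2, V3-V3
```

      With three regions there are three between-region pairs, but the count grows quadratically (ten regions would already give 45), which is why averaging over combinations keeps the number of tests small.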

      For completeness, in the first submission we did include the results for every single region combination in the supplementary material (see Supporting Figures S2-S5). We have now included in the main document the results for region combinations V1-V2, V1-V3 and V2-V3 for between-region connectivity, and V1-V1, V2-V2 and V3-V3 for within-region connectivity, presented alongside the results for the grand average.

      The results for the individual region-pairs suggest that inter- and intra-region connectivity are generally consistent with the average over individual region combinations, but also have unique features.

      Similarities include: A strong negative correlation between beta power and deep-to-deep layer coupling was observed for average inter-regional connectivity. In line with this, a negative correlation is observed for deep-to-deep layer coupling in all three individual region pairs (V1-V2, V1-V3, V2-V3). Similar patterns can be observed for alpha and beta in intra-regional connectivity (averaged over all regions) and in connectivity within V1, V2 and V3 in isolation.

      Individual features include: The relation between beta and inter-regional coupling shows variation over the individual region pairs. In particular for V2-V3 connectivity, but also for V1-V2, the relation seems to differ from the pattern observed on average. For V2-V3, deep layer V3 seems to be coupled to both deep and superficial layers in V2, a pattern that might reflect anatomical feedback projections that go from deep layer V3 to both deep and superficial layers in V2. The stronger correlation between deep V1 and more middle-deep V2 is, however, harder to place directly, since direct anatomical connections are largely absent here. It might therefore reflect an indirect effect.

      Despite some degree of individual variation we think the overall picture is largely consistent. The strongest features present in the averaged results can clearly be observed in each of the individual region-combinations as opposed to the latter being a collection of vastly different random patterns that happen to add up to the average result (see for example the intraregional alpha results).

      With respect to our classification of regions into higher- and lower-level cortical regions, we based this on standard anatomical hierarchies such as that of Felleman and Van Essen (Felleman and Van Essen, 1991). Here, V1, V2 and V3 are ordered from low to higher in the visual cortical hierarchy.

      Methods. Given the missing conceptual overview of how attention-induced changes in EEG frequency bands should influence laminar connectivity in the visual system, the methods also lack a clear analysis strategy. The authors computed one correlation between power level of different frequency bands and connectivity between different brain areas without providing an explanation of which question this analysis addresses. The offered results therefore seem random to me, without a clear relationship to an investigated hypothesis.

      The primary focus of our study was to investigate how oscillations across several frequency bands in the EEG relate to laminar-specific activity. Recent publications on laminar fMRI have demonstrated the possibility of performing laminar-level fMRI connectivity analyses (Sharoh et al., 2019; Huber et al., 2017; Huber et al., 2020), which led us to revisit our previously recorded data in order to explore whether not only laminar-specific BOLD amplitude but also laminar fMRI connectivity relates to frequency-specific EEG power. Since laminar fMRI, and especially connectivity measures derived from it, are very novel, we started this analysis without a preconceived model or notion of what this relation would be. The results from this project should therefore be interpreted as an exploration of how these laminar fMRI-derived connectivity measures relate to neural oscillations, rather than as directly addressing a specific cognitive process like selective attention or prediction, or a model of how neural oscillations play a role in these processes. Our experimental paradigm was also not designed to address such processes or to test hypotheses derived from them. The primary focus of the work presented here is to provide a first insight into how neural oscillations measured by electrophysiological measures relate to cortical depth-resolved fMRI coupling, which is usually correlation based. We believe these results will be relevant for research focused on how neural oscillations relate to inter- and intra-regional interactions (e.g. Bastos et al., 2012; Fries, 2015), since depth-resolved fMRI allows us to study laminar interactions within and between brain regions non-invasively in humans. For this it is important to know if and how neural oscillations relate to laminar fMRI-based connectivity measures, into which our research provides a first insight. It also provides insight into which neural processes underlie observed changes in laminar fMRI-based coupling, and is therefore relevant for research using such methods in general.

      Methods. The authors mention that they only analyzed the strongest two connecting vertices within a layer, which was done to improve SNR. In my view, for a connectivity analysis, this is not valid, as it can bias the effect towards superficial connectivity where the SNR and thus correlation is always higher.

      We did not analyze vertex pairs within a layer. We computed vertex pairs that connect the boundary between gray matter and CSF with the boundary between gray matter and white matter, based on a high-resolution anatomical MRI scan. Between these vertices we sampled 21 points of fMRI data using nearest-neighbour interpolation. Since not all parts of V1, V2 and V3 will be involved in the task, we selected the most activated vertex pairs for further analysis. This serves as a localizer to select the parts within a region where task-related activation is observed. For the main analysis, the top 10% activated vertex pairs were chosen based on data collapsed across all depths and all attention conditions. This selection is therefore independent of depth, task condition, and the relation with any EEG feature. For this procedure we excluded the top five depth bins to avoid biasing the selection towards superficial depths, since signal to noise is known to be substantially better near the surface of the cortex, in part due to larger pial veins. To investigate whether the observed results depend on this arbitrary threshold of 10%, we repeated the analyses for the top 5% and 25% activated vertex pairs, the results of which are included in the supplementary information.
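
      The sampling and selection steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: vertex coordinates are assumed to be given in voxel space already, the 21 points are assumed to be equally spaced along the straight line between the two boundaries, "top five depth bins" is read as the five most superficial bins, and the function names are hypothetical.

```python
import numpy as np

def sample_depth_profiles(volume, wm_vox, pial_vox, n_depths=21):
    """Sample a 3D volume at n_depths points between paired boundary vertices.

    wm_vox, pial_vox: (n_pairs, 3) voxel coordinates on the white-matter and
    pial (gray matter/CSF) boundaries. Returns (n_pairs, n_depths), with
    depth 0 at the white-matter boundary and the last depth at the pial one.
    """
    wm_vox = np.asarray(wm_vox, float)
    pial_vox = np.asarray(pial_vox, float)
    frac = np.linspace(0.0, 1.0, n_depths)                 # assumed equal spacing
    points = wm_vox[:, None, :] + frac[None, :, None] * (pial_vox - wm_vox)[:, None, :]
    idx = np.rint(points).astype(int)                      # nearest-neighbour lookup
    idx = np.clip(idx, 0, np.array(volume.shape) - 1)      # stay inside the volume
    return volume[idx[..., 0], idx[..., 1], idx[..., 2]]

def select_top_pairs(activation, top_fraction=0.10, n_superficial_excluded=5):
    """Indices of the most activated vertex pairs, ignoring superficial bins.

    activation: (n_pairs, n_depths), already collapsed over conditions.
    """
    n_depths = activation.shape[1]
    score = activation[:, : n_depths - n_superficial_excluded].mean(axis=1)
    n_keep = max(1, int(round(top_fraction * score.size)))
    return np.sort(np.argsort(score)[::-1][:n_keep])
```

      Because the selection score is averaged over depth and excludes the most superficial bins, a vertex pair cannot be selected purely because of a strong signal near the pial surface.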

      Methods. The authors report 21 correlations in cortical depth, where their resolution only allows sampling of perhaps 2-3 data points. The correlation analyses are therefore oversampled, which influences the statistical results. I would suggest first running a component analysis across cortical depth, and then correlating independent components with one another to investigate independent data points.

      The correlations are not oversampled, since the correlations used for the connectivity analyses are computed over trials, not over space. These analyses are not influenced by the number of laminar BOLD data points we sample. Furthermore, spatial supersampling is very common practice in fMRI research. For instance, the default in SPM is to upsample a standard 3 mm isotropic voxel size (very common for initial acquisition) to a 2 mm isotropic voxel size. In laminar fMRI, laminar signals are often upsampled by several factors above the original resolution. This is done for a number of reasons, which are well outlined on the laminar fMRI community website, a resource maintained by L. Huber in collaboration with many layer fMRI labs (see: https://layerfmri.com/2019/02/22/how-many-layers-should-i-reconstruct/), where ~20 layers is thought to be optimal.
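
      The point that each correlation is computed over trials rather than over space can be sketched as below. The array shapes, the simulated data and the helper name `depth_by_depth_corr` are assumptions for illustration only. Each entry of the depth-by-depth matrix is a correlation over the same number of trials, so sampling more depth bins enlarges the matrix without changing the number of observations behind any single entry.

```python
import numpy as np

def depth_by_depth_corr(a, b):
    """Correlate every depth bin of region A with every depth bin of region B.

    a, b: (n_trials, n_depths) single-trial responses. Returns an
    (n_depths_a, n_depths_b) matrix of Pearson correlations over trials.
    """
    az = (a - a.mean(axis=0)) / a.std(axis=0)
    bz = (b - b.mean(axis=0)) / b.std(axis=0)
    return az.T @ bz / a.shape[0]

# Hypothetical single-trial data: 120 trials, 21 depth bins per region.
rng = np.random.default_rng(0)
region_a = rng.normal(size=(120, 21))
region_b = 0.5 * region_a + rng.normal(size=(120, 21))

conn = depth_by_depth_corr(region_a, region_b)             # shape (21, 21)

# Nearest-neighbour upsampling of depth duplicates columns; the entries of the
# larger matrix are the same correlations over the same 120 trials.
conn_up = depth_by_depth_corr(np.repeat(region_a, 2, axis=1),
                              np.repeat(region_b, 2, axis=1))
```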

      For our statistical test we explicitly chose a non-parametric cluster-based technique to correct for multiple comparisons that takes dependencies across space into account. Laminar fMRI data are not well suited to decomposition into components using techniques like PCA and ICA, since they violate assumptions of orthogonality/independence of the underlying responses in both the spatial and the temporal dimensions. To illustrate: in a recent laminar connectivity methods review, a hierarchical, iterative ICA approach resulted in data being split up into columnar maps rather than laminar ones (Huber et al., 2020).
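
      The idea behind a non-parametric cluster-based test can be sketched in one dimension. This is a deliberately simplified illustration and not the test used in the study: it clusters along a single depth axis, uses the summed correlation as cluster mass, and the threshold, permutation count and function names are all assumptions. Adjacent depths that exceed the threshold with the same sign form one cluster, and the largest observed cluster is compared with the largest cluster obtained after shuffling the subject-level EEG values.

```python
import numpy as np

def columnwise_corr(x, Y):
    """Correlation across subjects between x (n_subj,) and each column of Y."""
    xc = x - x.mean()
    Yc = Y - Y.mean(axis=0)
    return (xc @ Yc) / np.sqrt((xc @ xc) * (Yc ** 2).sum(axis=0))

def max_cluster_mass(r, threshold):
    """Largest summed |r| over contiguous same-sign supra-threshold depths."""
    best = 0.0
    for signed in (r, -r):                     # positive, then negative clusters
        current = 0.0
        for value in signed:
            current = current + value if value > threshold else 0.0
            best = max(best, current)
    return float(best)

def cluster_permutation_p(x, Y, threshold=0.5, n_perm=200, seed=0):
    """Observed max cluster mass and its permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = max_cluster_mass(columnwise_corr(x, Y), threshold)
    null = np.array([
        max_cluster_mass(columnwise_corr(rng.permutation(x), Y), threshold)
        for _ in range(n_perm)
    ])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, float(p_value)
```

      Because only the single largest cluster per permutation enters the null distribution, neighbouring depth bins that carry nearly the same signal are tested as one unit rather than as independent comparisons.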

      Methods. The authors refer to their previously published paper with respect to the methods, and do not give any specifications of the image sequence, image resolution, and image processing in this paper. In my view, all basic methodological steps that are critical to understand the paper should be described here.

      We are willing to include all relevant parts of the methodology described in our previous paper. This would involve copying large parts of the methods section, and might have to be coordinated with the publisher of the previous publication for copyright reasons. We would be pleased if the editor could advise us on this issue.

      Results. The figure captions are too short and do not explain the presented data in an appropriate way. In Figure 1, details on the calculated contrasts, number of participants investigated, sampling and analyses methods should be given that allows interpreting the data. Also, it would be beneficial to explain the attention paradigm in a bit more detail in the figure caption so that panel A can be interpreted. In Figure 3, more details should be given on what data are shown, particularly for panel C where the only information given is "attention effect on laminar connectivity" with no further axes labels.

      We extended the figure captions in the revised article.

      Results. I do not fully understand the results as shown in Figure 3. As those form the major part of the manuscript, this needs revision. As said before, I think that the figure and results section would benefit from region-specific data analyses and presentation, but clear axis labels are also needed to allow interpretation of the data. Also, if I interpret the data correctly, correlations are done for altogether 21 different cortical depths, which would not be valid because it artificially inflates the number of correlations, as pointed out above.

      We have extended our analyses and have now split the original Figure 3 into the current Figures 3 and 4, where we separately depict the results for intra- and inter-regional connectivity. For both intra- and inter-regional connectivity we now also show the region-specific results that underlie these results. We updated the figures and captions to make clearer what is depicted. We addressed the point raised about the 21 data points above. It is not relevant for the analysis presented here.

      Reviewer #3 (Public Review):

      However, a weakness of the technique as currently presented is that patterns of connectivity are only related to oscillations across subjects. It would be more powerful to examine whether the current network state (estimated by trial-by-trial power estimates) relates to laminar connectivity within subject. This would indeed speak to the nature of neuronal communication, which takes place on a moment-to-moment time scale, and which is not reflected in the current analysis. This may explain why laminar patterns of fMRI connectivity were not found to correlate with gamma-band oscillatory activity. In addition, the negative effects of attention on fMRI connectivity itself are somewhat puzzling. This may relate to the limitations of the task design, which does not perfectly separate attention from arousal/expectation, as the authors readily discuss.

      The reviewer suggests that a relationship between fMRI connectivity and EEG power within subjects over trials would be more indicative of a direct link between connectivity and neural communication. We agree that establishing such a link would further strengthen the link between neural oscillations and laminar connectivity. This would not be trivial, however, since connectivity in (laminar) fMRI is typically expressed as a measure of linear association (e.g. correlation or regression slope) over trials or time. Even at conventional spatial resolution, single trial/time point estimates of the network state are rarely used. These single data-point measures usually indicate to what extent a single data point contributes to the measure over all data points. We did not opt for such an analysis, since such analyses are uncommon in normal (e.g. resting state) fMRI studies and would introduce more complexity to a study that already includes considerable novel analytic approaches. Furthermore, research relating fMRI activation and connectivity across subjects to other variables (e.g. clinical test scores, DTI measures, personality traits) is a well-established procedure. Here we followed this more common approach.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Firstly, we would like to thank the reviewers for their helpful and insightful comments.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In the manuscript by Ramadan et al., the authors use the ex vivo organoid approach to compare gene expression in organoids derived from adult-type stem cells when these organoids are grown using different matrices. The presence of collagen type I induces the emergence of cells with a transcriptome similar to fetal progenitors. In contrast, laminin, the main component of Matrigel, induces a protruded organoid phenotype with a stem-cell-type transcriptome. Then, they correlate these data with expression of collagens and laminins from publicly available data. They show by qRT-PCR that laminins are more highly expressed in mesenchymal versus epithelial fractions postnatally. They hypothesize on this basis that the remodeling at the postnatal stage is likely dependent only on the mesenchymal compartment and that it involves interaction of laminins with integrin a6.

      It seems that some of the presented data have already been described and could not be considered as « novel ».

      For some of the statements, like this one « the basement-membrane produced by the epithelium is not sufficient to increase stem cell numbers and induce a morphological crypt formation », the conclusion is not sustained by provided experiments. To draw definitive conclusion on this particular point, authors could reproduce the experiment presented in Fig. 4d but using Cre recombinases specific for mesenchymal and epithelial compartments rather than the ubiquitous Cre line. It would be interesting to investigate if organoids grown from lamc1-/- mice can generate protruded organoids or not.

      In addition, how should one interpret the fact that "fetal organoids up" is associated with « laminin interactions » in Fig. 1c?

      The statement that the epithelium-produced basement membrane is not sufficient to increase stem cell numbers is based on our in vitro observations. Analysis of the RNAseq data shows that the expression of several laminins is increased on collagen (see heatmap of laminin interactions below, which will be added to the manuscript). This is also the reason why ‘laminin interactions’ is highly significant in the gene set enrichment analysis (Fig. 1C). Despite this upregulation, we never observed morphological changes (or expression changes) as when laminin is added to the collagen-hydrogel. In addition, we showed that the vast majority of ECM components is produced by the mesenchyme in vivo, in line with previous literature as cited in the manuscript. The mentioned Cre lines to address the question in vivo are unfortunately not available to our collaborators with the Lamc1 k.o. mice and it would therefore take too long to perform these experiments.

      However, to address this point in vitro we will grow organoids from Lamc1 fl/fl mice and induce loss of laminin in the pure epithelial cell culture. Organoids will then be analysed for morphological changes, as well as proliferation and gene expression changes.

      One major point to address regards statistics. In Material and Methods, the paragraph describing statistical analyses is missing. Moreover, in the figures presenting qPCR data (Figs. 1g, 3b, 3d, 3g, 4c and 4f), no statistical analysis is provided, and the number of samples for some conditions is extremely limited (n=2). In general, the term « independent experiment » should be clarified: does it correspond to one organoid line for which the experiment was repeated, or one single experiment using different organoid lines?

      In fig 4c , all collagen conditions are set to 1.

      The avoidance of statistical inference for most of the experiments was a deliberate choice. In line with several commentaries (e.g. Vaux, D. L. (2012) Know when your numbers are significant. Nature. 492, 180–181), we chose to show all individual data points (with the exception of Fig. 3D, n=5, to ease interpretation) without statistics. In addition, for most expression data, we have data from RNAseq, single-cell RNAseq and qPCRs repeated at different hydrogel concentrations to obtain reliable results. Further, the in vivo mesenchymal qPCR expression data were validated with RNA in situ hybridization showing the mainly mesenchymal expression.

      The term independent experiment was used mainly for repeated experiments with the same organoid lines (the exception being the RNAseq data, which used different organoids derived from individual mice). While conducting these experiments, we realised that the variability of these experiments comes from time in culture, density of cells and even Matrigel variation. The experiment in Fig. 4c (n=4, each time with all controls) was performed with longer intervals in between, and showed variation in the absolute levels of expression. However, relative to each control we believe the effect is clear.

      As we will perform additional experiments for the revision of this paper, we will then perform statistical tests in the key experiments (e.g. Itga6 experiment) to alleviate any concerns regarding significance.

      Regarding the experiment presented in Fig. 4c, the authors should include additional control conditions: anti-integrin a6 antibody in Matrigel and use of an isotype antibody.

      We will conduct additional experiments regarding Itga6. In addition to including the mentioned controls for the neutralizing antibody, we will genetically inactivate Itga6 via an inducible CRISPR/Cas9 system. This should enable us to delete Itga6 when the cells are grown on collagen, and hence reduce the possibility of compensation in Matrigel-derived organoids.

      Another point regards the RNAscope data presented in Fig. 4b: it is surprising to observe such a difference in expression between E19 and P0. Does this mean that birth dramatically upregulates Itga6 expression within a few hours? The authors should comment on this point if verified.

      We do believe that birth is a timepoint where a dramatic change in the ECM and its receptors can be observed. The epithelial RNAseq data already indicate that at E18.5 there is an increase in expression compared to E16. This upregulation of the receptor is in line with the dramatic remodelling of the ECM at birth, as is shown by the expression of the basement membrane components in Fig. 3d.

      Authors should avoid the word « signaling » for laminin-integrin interactions as they do not study this aspect at all in their experiments.

      The word signaling was used for the protein:receptor interaction and to distinguish it from changes to the physical characteristics of the hydrogel. But we agree with the reviewer, that we did not study laminin signaling per se and therefore will change the wording accordingly.

      Regarding Col1a1, the authors cannot claim that its expression only slightly changed (Fig. 3d), as it is clearly upregulated between E17 and P0.

      The reviewer is right, and we apologise for the misleading sentence. The contrast was meant to be with the basement membrane components, which are expressed at very low levels at E17 and then suddenly show a burst of expression at birth, whereas collagen seems to be continuously expressed with a peak at P7. We will rephrase the sentence.

      Reviewer #1 (Significance (Required)):

      Overall, the methodology used for the asked questions is accurate.

      One potential problem for publication comes from the fact that some of the findings are already reported and that the present data do not provide further advances.

      For example, collagen and the fetal-like expression profile, and Ly6a sorting and replating in culture (Yui et al., 2018; Jabaji et al., 2013).

      We do not agree with the reviewer on this point. We build upon the work of Jabaji et al. 2013 and Wang 2017 to characterise the specific effect of collagen on the intestinal epithelium compared to a pure Matrigel culture. The emergence of Ly6a cells was nicely shown by Yui et al.; however, it was unclear whether collagen changes the fate of all intestinal cells or only a few. We strongly feel that our data extend these findings, as they associate the changes we observe in vitro with the development of the crypt morphology and intestinal stem cells.

      The phenotype of Lamc1-/- mice and the observed reduced stem cell marker expression are also reported by Fields et al, 2019.

      Indeed, and we cited this paper. However, we predicted based on our in vitro model that deletion of laminin would result in this specific fetal-like gene expression, and hence were happy to include these findings in our manuscript.

      Finally, the authors do not interpret their ex vivo data in the context of fetal progenitors, which grow as spheres in Matrigel (containing laminin).

      Our ex-vivo (in vitro) data would suggest that adult epithelial cells express some genes that are characteristic for fetal organoids, however we do not think these cells completely revert back to a fetal stage. Regarding the comment of spheres, it is noteworthy that fetal cells from E14-16 stay as spheres in Matrigel, whereas fetal cultures from E19 initially grow as spheres and then develop into organoids within 30 days in vitro (M. Navis et al., “Mouse fetal intestinal organoids: new model to study epithelial maturation from suckling to weaning,” EMBO Rep., vol. 20, no. 2, pp. 1–12, 2019.).

      In figure 5, should we interpret that there is no laminin at all in the fetal mesenchyme?

      We now see how the image is a bit misleading for that stage. The levels of laminin are lower in the fetal stage as can be seen by the IF image in Fig.3f and the image will be updated. We also apologise for the lack of labeling in Figure 3f, which should be E19, P7 and adult.

      Also, the authors do not cite a paper reporting on the role of the epithelium (stem cells) in regulating its own extracellular matrix composition, a process that modulates stem cell number and fate (Fernandez-Vallone et al. 2020). As this contradicts the claim that only the mesenchyme impacts crypt morphogenesis, the authors could discuss this point.

      In the paper by Fernandez-Vallone, deletion of Lgr5 in E16.5 embryos resulted in decreased expression of several ECM genes. Further, the authors could show that the fetal epithelium does express, for example, Col1a1 at this point, which decreases with maturation. However, even for the example of Col1a1 it is evident in their paper that the mesenchyme expresses Col1a1 at much higher levels. Our proposed experiments with Lamc1 k.o. in organoids will show if the laminins produced by the epithelium are essential.

      This manuscript could be interesting for an audience in the stem cell and developmental fields (my field of expertise).

      **Referee Cross-commenting**

      Considering the pertinent and sometimes overlapping comments of the two other reviewers, the estimated time is revised to 3-6 months.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      In this manuscript, Ramadan and colleagues demonstrate that depending on the extracellular matrix (ECM) composition in which mouse intestinal organoids and/or 2D intestinal epithelial cells are grown in, cellular composition of the epithelium changes. Organoids plated on 2D collagen layers show a unique cell cluster characteristic of fetal-like genes, while organoids plated with increased amount of Matrigel in 2D or in 3D exhibit a shift towards higher stem cell abundance and the absence of the fetal-like gene cluster. Specifically, the ECM component Laminin supports acquisition of stem cells identities in small intestinal epithelial cells, correlating with a transient increase in expression levels of collagen and laminin genes in vivo spanning time points of crypt formation. The authors reported the functional contribution of laminin signaling (Lamc1 KO) via integrin alpha 6 (antibody-blocking) to intestinal stem cell acquisition in vitro and in vivo.

      There are a handful of comments/concerns that would need to be addressed before publication.

      Major points:

      1. The authors claimed: "the effect of ECM components on gene expression is not due to difference in morphology (2D collagen vs. 3D Matrigel)". The conclusion of the 2D vs. 3D experiment should be toned down to state that organoid morphology (2D vs. 3D) does not directly impact the expression of fetal-like genes. Otherwise, more analysis of RNAseq data with different groups of genes (e.g., in different mechanosensing pathways) should be provided with Fig. S1D. Also, would it be technically feasible to perform experiments of SI in collagen (3D) in all groups of experiments? Directly comparing 3D Matrigel with 3D collagen avoids the concern of the 2D vs. 3D effect.

      We apologise for the too strong claim of structure of growth vs. signalling of the ECM and its effect on the transcriptome. Indeed, the main message is that a changed morphology from 3D to 2D is not responsible for the expression of fetal-like genes. The paragraph will be rephrased.

      Also, would it be technically feasible to perform experiments of SI in collagen (3D) in all groups of experiments? Directly comparing 3D Matrigel with 3D collagen avoids the concern of the 2D vs. 3D effect.

      To address this point, we want to refer to the excellent idea of growing established organoids in collagen (3D) vs. Matrigel (3D) as suggested by this reviewer (and reviewer #3) in Minor points #5. This circumvents the need for Wnt3a addition, which affects stem cell and Paneth cell gene expression.

      1. For Fig. 1f, the authors should include overlapping stainings of Lyz (or Olfm4, CD44 etc.) and Aldolase B signal, or they could perform Aldolase B staining in Lgr-5-DTR-GFP and/or Lyz-RFP organoid line. From the current data provided one cannot draw clear conclusions on the crypt morphology as claimed by the authors. Additionally, when talking about crypt morphology and apical accumulation of Actin specifically in the Lyz+ cells, the authors should show a higher-magnification view of the picture and either add an orthogonal slice to see the apical and basal sides and the specific accumulation in one of the cell types, or also co-label with apical polarity markers.

      We will perform additional co-stainings to further highlight the differences in the spatial distribution of differentiated cells and undifferentiated-crypt-like cells. Further we will provide higher magnification images highlighting the apical accumulation of Actin in the crypt-like structures, which can also be seen in mature organoids.

      1. The authors referred to the transient organoid change as a fetal-like state. To examine the similarity of ECM-induced reprogramming with the regenerative type of reprogramming, it would be essential to compare the expression of the selected fetal-like genes (Anxa3, Ly6a/Sca1, Msln, Col4a2, etc.), as well as bulk and single-cell (if applicable) RNA-seq data.

      Here, we would like to refer to the excellent study by Yui et al. (S. Yui et al., “YAP/TAZ-Dependent Reprogramming of Colonic Epithelium Links ECM Remodeling to Tissue Regeneration,” Cell Stem Cell, vol. 22, no. 1, pp. 35-49.e7, 2018.). In this study the authors detected the same gene signature in the repairing epithelium. We can provide a GSEA for the Ly6a+ signature that was derived from this paper, if necessary.

      1. For the in vivo data, the authors were looking at the normal development of the intestine. Following the point that organoid culture recapitulates regeneration, it would be relevant to check the in vivo ECM change by staining during intestinal regeneration, or to discuss whether the fetal-like genes would be involved in regeneration.

      We will address this point in the discussion as it also involves the study by Yui et al.

      1. For Fig. 2d and e, it would be important to measure compactness vs. the emergence/probability of Ly6+ cells to see if there is a correlation.

      If we understand the reviewer correctly, this would address the important relationship between cell shape and cell fate/type. However, this is a topic that needs more attention than a simple correlation and would exceed the scope of this manuscript as we are not able to modulate cell shape to make any further points about its effect on the fetal gene expression program.

      1. In Fig.2d, Ly6a expression is very obscure, and it would be important to show control staining for cell boundaries (eg. Phalloidin, PM) to visualize which nuclei show Ki67 staining and are high or low in Ly6a (plus quantification).

      We will improve the image in Fig.2d and include the mentioned Actin staining. In addition we will perform an analysis via Flow Cytometry to quantify the level of Ly6a staining and EdU positivity.

      1. In Fig. 2f-g, FACS-ed Ly6a+ and Ly6- cells embedded in Matrigel can grow into organoids with crypts. Here the imaging of Paneth cell staining is not clear, and a quantification on number of Paneth cells per crypt would be very helpful to confirm the phenotype. Also, authors should either provide data on the initial size of seeded cell clusters and report organoid growth and cell type composition in more detail when plating from Ly6a+ and Ly6- cells or report the variation in the respective populations.

      This comment suggests that we may not have described the experimental settings properly. The sorted cells were embedded as single cells, not as clusters, in drops of Matrigel (10k cells/25 µl Matrigel). The emergence of Paneth cells together with a normal organoid architecture grown from Ly6a+ cells shows their stem cell capacity, as has been shown before by Yui et al. for the regenerating colon. In addition, organoids from both cell populations (Ly6a+ and Ly6a-) could be passaged, indicating the presence of intestinal stem cells.

      1. The authors could also test whether Ly6+ cells have any advantages over Ly6- cells when grown on collagen I instead of Matrigel.

      We will sort Ly6a+ and Ly6a- cells and plate them on collagen I. It will be interesting to see if the Ly6a+ cells can give rise to the other cell types when plated on collagen or if they stay Ly6a+ cells. This will also answer whether Ly6a+ cells need the presence of Ly6a- cells in the cultures. In addition, the experiment proposed in #6 will also highlight any proliferative advantage of Ly6a+ cells compared to Ly6a- cells on collagen.

      1. In Fig.3f, a control of membrane protein staining should be added for the experiment. The increased Laminin signal can be caused by the global increase of protein when there are more cells, or when tissues are more compact. When the authors conclude "Dramatic remodelling of ECM during crypt formation", the experiment should also count cell numbers vs. Laminin (intensity). The phenotype can come from an increased area of interface between epithelium and mesenchyme instead of active remodelling.

      We agree with the reviewer that by themselves the IF images are not enough to make such a claim. However, we would point to the qPCR and RNA in situ data, which can be more easily normalised and show the dramatic increase in expression of all laminins at birth. Showing that laminin protein is increasing is more difficult than we initially anticipated. However, in the study by De Arcangelis (A. De Arcangelis et al., “Hemidesmosome integrity protects the colon against colitis and colorectal cancer,” Gut, vol. 66, no. 10, pp. 1748–1760, 2017.) the authors use an EDTA assay to show that the epithelium detaches easily when Itga6 is deleted. Within the figure, it also seems that the epithelium detaches easily at P2, compared to P14. As EDTA disrupts laminin polymerisation, this would further indicate increased laminin protein deposition after birth.

      1. The authors claim that intestinal stem cells in vivo are controlled by Laminin signalling that goes via Integrin alpha 6. However, there is no evidence provided that supports the contribution of ITGA6 in the in vivo setting. So, the authors should either tone down that point or show a convincing in vivo experiment (e.g., inhibit ITGA6 in vivo by inhibitor injections, or extract the ECM of a wild-type mouse and seed intestinal epithelial cells without vs. with ITGA6 blocking antibody, which should recapitulate the phenotype in Fig. 4c).

      We apologise for this confusion. We are well aware of the limitations of our Itga6 blocking experiment in vitro and its relevance in vivo. We tried to get material of the inducible VilCreER Itga6 mouse as referenced in the discussion of the manuscript, without any luck so far. Therefore we will highlight further that any claims about the laminin:Itga6 interaction can only be made in vitro.

      1. Fig. 4: For the data of ITGA6 expression and all sorts of analysis on protein expression with staining, normalization with cell numbers should be performed.

      The RNAseq data that shows the upregulation of Itga6 in the epithelium at E18 is normalized within. Our RNAscope only further validates these expression changes and highlights the specific enriched expression at the bottom of the nascent crypts. We can add quantification of the RNAscope if required.

      1. Two questions on mechanisms:

      2. What is the mechanism from ITGA signaling to Ly6a+ cell fate?

      3. And would/how Laminin induce ITGA expression? Depending on how deep the authors would like to go with the project, this could be addressed further with functional studies, or touched on in the discussion.

      These are important questions; however, we do agree that this will go too deep for the scope of this manuscript. We will address these open questions in the discussion and leave the experimental part for a follow-up study.

      **Minor points:**

      1. Text in Fig.S1d regarding 'in' or 'on' collagen, could be clearer by changing the terms to 2D and 3D correspondingly.

      We agree and the text will be changed accordingly.

      1. Fig. S1a: it is great that the authors showed similar stiffness in Matrigel and collagen I. It would be important to check the concentration of collagen I vs. stiffness (also for increasing concentrations of Laminin in Fig. 3b), since this is also the type of ECM change that might lead to the change of cell status in cancer progression or collective cell migration.

      We will perform further stiffness measurements of the hydrogels and update the Fig. S1a.

      1. When plating intestinal epithelial cells on collagen I, is the Ly6+ phenotype altered upon Wnt addition? This is not so clear from the RNAseq data Fig S1d., so authors should provide antibody stainings (stem cells/Paneth cells). This could give insight whether Ly6+ cells are still able to convert into stem cells/ Paneth cells by changing morphogen concentration vs. ECM composition.

      We will reanalyse the RNAseq dataset further, specifically analysing the ratio of stem cell and Paneth cell gene expression. However, as mentioned before, Wnt3a specifically does reduce the expression of Paneth cell markers.

      Similar to this point, enteroendocrine cell fate is also absent in the collagen I condition (Fig. 2a,b); the authors could address this point by medium-induced EE cell fate.

      Due to the reduced number of secretory cells, the clustering in Fig. 2a,b does not separate all the different cell lineages. However, EE cells are present in the collagen cultures as characterised by expression of Chga, just reduced in their number (see Suppl. Fig. 2B).

      1. It would be more informative to indicate the thickness of the ECM layer in the 2D collagen I culture, as well as an image of the whole well, demonstrating the morphological variation in the middle and periphery of the ECM layer.

      The thickness of the collagen layer is about 1 mm in a 6-well plate and we do not observe any morphological differences in the cells between the periphery and center of the well.

      1. After the formation of PC/SC clusters, would ECM contribute to maintenance? Putting mature organoids from Matrigel to Collagen I 3D would help to clarify.

      This is an interesting experiment that we will conduct, and we thank the reviewer for this suggestion. Indeed, established organoids should be able to grow in collagen I without Wnt3a addition. The paper by Sachs et al. (N. Sachs, Y. Tsukamoto, P. Kujala, P. J. Peters, and H. Clevers, “Intestinal epithelial organoids fuse to form self-organizing tubes in floating collagen gels,” Development, vol. 144, no. 6, pp. 1107–1112, 2017.) used extensive washing with PBS to remove Matrigel from the organoids. We will go one step further and try to completely remove laminin specifically by EDTA incubation, as has been shown recently (J. Y. Co et al., “Controlling Epithelial Polarity: A Human Enteroid Model for Host-Pathogen Interactions,” Cell Rep., vol. 26, no. 9, pp. 2509-2520.e4, 2019.). This should then also answer whether disruption of laminin signalling is sufficient to induce fetal-gene expression without the addition of collagen I in a 3D setting.

      1. Check secretome and individual culture of Mesenchyme, see if the increase of Laminin is epithelium independent.

      We agree that the mesenchyme is key for laminin production, therefore these are important questions. Our prediction would be that epithelium from birth (P0) versus adult might result in different responses on the mesenchyme. However, we feel these experiments are better suited for a follow-up study.

      1. In general, the authors should look at cell polarity markers to check the ECM contribution to cell polarity in different cell types.

      We thank the reviewer for the suggestions and as mentioned above, we will perform additional stainings.

      Reviewer #2 (Significance (Required)):

      **Significance:**

      The work highlights the role of ECM on stem cell niche and is of great interest to the organoid and stem cell community.

      Our field of expertise is image- and seq-technology-based quantitative biology, regeneration and mechanics in organoids.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Ramadan et al present a highly informative paper detailing how the extracellular matrix influences the development of the intestine. Specifically, the authors provide a thorough analysis of how manipulating the components of the ECM can affect organoid growth, morphology, and gene expression of the organoids. Most importantly, the authors isolate laminin as a critical component of the ECM which impacts the development of fetal-like epithelium. While the in vitro work is generally compelling and of interest to the field, the in vivo data is somewhat lacking in depth and novelty. Particularly, these conclusions from the abstract could be much better supported: "This laminin:ITGA6 signalling is essential for the stem cell induction and crypt formation in vitro. Importantly, deletion of laminin in the adult mouse results in a fetal-like epithelium with a marked reduction of adult intestinal stem cells." The in vivo work was largely published previously and has caveats noted below, while the in vitro association of ITGA6 signaling with crypt formation is over-interpreted based upon an antibody blocking experiment and a lack of statistical rigor. Despite these concerns, this reviewer finds the work of considerable interest in an important area of the field (epithelial/stromal interactions of the intestine).

      **Major Concerns:**

      The use of the Ubc-Cre Lamc1-flox mouse model is an interesting way to test the impact of Lamc1 loss on intestinal development. However, with the Ubc-Cre, does the mouse model have other deleterious effects on the mouse beyond the intestine?

      Can the authors use a more localized Cre to observe specifically the impacts of Lamc1 loss in the intestine? What is the fate of these mice? Can the authors show swiss roll, low mag sections to let the reader know the extent of this phenotype? OLFM4 and Ki67 IHC should be conducted over a timecourse to show how the changes occur over time after loss of Lamc1. How long does Lamc1's protein product perdure after tamoxifen treatment? More details of this exciting, in vivo validation of the authors' in vitro studies are key to elevating the impact of this work. However, it appears that much of this mouse model was previously published, but the previous findings are not well summarized in the current manuscript.

      We will describe the model in more detail and refer readers to the excellent study of our collaborators which answers most of the raised questions. It is interesting to note that although a ubiquitous Cre was used to delete Lamc1 in adult mice, a phenotype was only observed in the intestine indicating a specific role for continuous laminin production here.

      Can the authors show that ITGA6 loss has functional consequences in vivo with an epithelial knockout or via an organoid knockdown? A more rigorous genetic test of this proposed function would be important for substantiating the claims made in the abstract.

      As referenced in the discussion of the manuscript, there is a VilCreER Itga6 mouse described in the literature (A. De Arcangelis et al., “Hemidesmosome integrity protects the colon against colitis and colorectal cancer,” Gut, vol. 66, no. 10, pp. 1748–1760, 2017.), which mainly focuses on the colon. However, the authors use an EDTA assay in the small intestine to show that the epithelium detaches easily when Itga6 is deleted (Fig. 1J). Within the figure, it also seems that the epithelium detaches easily at P2, compared to P14. As EDTA disrupts laminin polymerisation, this would further indicate increased laminin protein deposition after birth which is dependent on Itga6.

      We tried to get material of the inducible VilCreER Itga6 mouse, however, without any luck so far.

      We will conduct additional experiments regarding Itga6 in vitro. In addition to including additional controls for the neutralizing antibody, we will genetically inactivate Itga6 via an inducible CRISPR/Cas9. This should enable us to delete Itga6 when the cells are grown on collagen, and hence reduce the possibility of compensation in Matrigel-derived organoids.

      The authors state that the changes in gene expression are not due to differences in morphology, but rather are specific to the components of the environment. While the authors show that organoids treated with Wnt3a "in Matrigel" and "CollagenI" appear to have similar morphologies and yet still result in different gene expression profiles, it would be of great interest to see whether that difference persists without Wnt3a when organoids are "in Matrigel" and "in CollagenI". While the reviewer understands the difficulties of culturing organoids in 3D Collagen without Wnt3a, organoids can indeed be cultured in 3D using "floating collagenI rings" (Sachs et al 2017).

      This is an interesting experiment that we will conduct, and we thank the reviewer for this suggestion. Indeed, established organoids should be able to grow in collagen I without Wnt3a addition. The paper by Sachs et al. (N. Sachs, Y. Tsukamoto, P. Kujala, P. J. Peters, and H. Clevers, “Intestinal epithelial organoids fuse to form self-organizing tubes in floating collagen gels,” Development, vol. 144, no. 6, pp. 1107–1112, 2017.) used extensive washing with PBS to remove Matrigel from the organoids. We will go one step further and try to completely remove laminin specifically by EDTA incubation, as has been shown recently (J. Y. Co et al., “Controlling Epithelial Polarity: A Human Enteroid Model for Host-Pathogen Interactions,” Cell Rep., vol. 26, no. 9, pp. 2509-2520.e4, 2019.). This should then also answer whether disruption of laminin signalling is sufficient to induce fetal-gene expression without the addition of collagen I in a 3D setting.

      Similarly, while the authors indicate that increasing Matrigel concentrations altered the gene expression patterns in a dose-dependent manner, it is unknown whether this can fully be attributed to the Matrigel composition, or whether the layer of Matrigel is providing the capability to transition from 2D to 3D culture.

      We are not entirely sure we understood this point. The Matrigel and the collagen I are mixed before they solidify, which yields a homogeneous hydrogel. The different hydrogels are not layered (if that is what the reviewer is referring to).

      The authors cultured organoids in different concentrations of Laminin/CollagenIV when mixed with CollagenI. Can organoids be sustained only on a matrix of CollagenIV and/or Laminin? Would this show more direct differences between CollagenI vs. Laminin cultured organoids?

      Organoids cannot be grown in pure Collagen IV, but pure laminin should be feasible. We did initial experiments with 3-5 mg/ml laminin in PBS and that was sufficient to allow organoid growth. We will perform additional experiments with pure laminin and show the impact on organoid growth.

      With the reduction in stem cells and Paneth cells in the Lamc1-KO mice, it would also be of interest to determine what cell types are now prominent within the heavily elongated intestinal "crypt" structures seen in the Lamc1-KO mice and whether populations are more TA-cells or enterocytes to consider differentiation status of the cells. Additionally, it would also be of interest to see if the Itga6 expression is significantly altered in the absence of Lamc1.

      We will test expression changes for Itga6 in the epithelium of the Lamc1-KO mice via qPCR. Additionally, we can stain tissue from these mice for Sox9, Ki67 and differentiation markers, e.g. CD44, Aldolase B, Villin, etc., to determine whether the elongated, hyperproliferative crypts contain progenitor cells or secretory enterocytes.

      **Minor concerns:**

      Matrigel is a complex matrix derived from mouse tumors. In many instances in the manuscript, the authors portray it as a simpler mix of laminin/Collagen4 (fig 3a). It should be made clearer to the reader that Matrigel is not a mix of recombinant proteins; a clearer depiction of how Matrigel is derived will be critical for this study, given the focus on specific ECM components and how they affect intestinal epithelial growth.

      We agree and will change the oversimplified view of Matrigel.

      Some labels of specific conditions would be appreciated in the figures as opposed to only the figure legends (ie. Fig. 1b and 1d should be labeled with comparisons; Fig. 3f labels of fluorescence, Fig. 4b label of itga6 staining).

      We apologise for this and the Figure labels will be updated.

      With the light staining of Lamc1 in-situ, it is hard to appreciate the expression of laminin within the stroma of the intestine when compared to Col4. This reviewer is also curious of the biological relevance of the concentrations of Laminin/CollagenIV when culturing organoids in Fig. 4a.

      Indeed, Lamc1, due to its lower expression than Col4a1, is more difficult to see. Maybe the reviewer overlooked Suppl.Fig.4A, where the blue channel of these in situ images shows more contrast. If required, we can try to optimise the hybridisation times to increase the signal a bit further.

      When culturing the organoids with the mixture of collagen and laminin or collagen IV, the concentrations of the two single components were selected to be similar to their concentrations in Matrigel. How the absolute concentrations of laminin/collagen IV in vitro compare with their “concentration” in vivo is much harder to answer. In addition to the unknown concentrations in vivo, there are many more Laminin types present with specific localisation and even specific receptor interactions. For our in vitro studies we relied on the Laminin present in EHS tumours, which is Laminin alpha 1 beta 1 gamma 1. We are currently investigating decellularization protocols to purify the ECM from mouse intestinal tissue, but again this would be more suited for a follow-up study.

      It would be appreciated if gene expression analyses presented in figures would include p-values to provide context for differences in gene expression.

      The avoidance of statistical inference for most of the experiments was a deliberate choice. In line with several comments (e.g. Vaux, D. L. (2012) Know when your numbers are significant. Nature. 492, 180–181), we chose to show all individual data points (with the exception of Fig. 3D, n=5, to ease interpretation) without statistical testing. For most expression data, we have data from RNAseq, single-cell RNAseq and qPCRs repeated at different hydrogel concentrations to obtain reliable results. Further, the in vivo mesenchymal qPCR expression data were validated with RNA in situ hybridization showing the mainly mesenchymal expression.

      As we will perform additional experiments for the revision of this paper, we can perform statistical tests in the key experiments (e.g. Itga6 experiment) to alleviate any concerns regarding significance.

      In figure 3f, the authors report "immunofluorescence of laminin". How is this measured? Can more details be given about the antibody in the text and figure legend? Laminins are a family of genes, and it's not clear what's being demonstrated in this figure panel. Developmental stages of the samples are also not clear.

      We apologise for the lack of labeling in Fig.3f. The details of the antibody were hidden in the Materials and Methods of the manuscript (Slides were incubated with Laminin Polyclonal Antibody (1/200, Thermo Fisher #PA5-22901) overnight at 4 °C). This pan-laminin antibody reacts with most Laminin isoforms (alpha1, alpha2, beta1, gamma1). We will declare it as a pan-laminin antibody in the Figure legend to help future readers.

      Reviewer #3 (Significance (Required)):

      This work is in an exciting "hot" area of research to understand the role of non-epithelial cells in intestinal epithelial development and function. The audience would be those in the GI field and those studying tissue-tissue interactions.

      There's some concern that the in vivo portion of the manuscript (4th figure) uses a model that was previously characterized and published by this group, and that isn't clearly disclosed. The manuscript would benefit from more disclosure and detail about the in vivo phenotype. Such changes would substantially increase the impact and novelty of the study.

      We would like to point out that we cited the paper of the original study that uses the model throughout the manuscript. We will disclose in more detail that this group did the study and that the reduction in stem cell genes was already mentioned in the original publication.

    1. Author Response:

      Reviewer #2 (Public Review):

      1) The authors describe their algorithm as a tool that (i) was validated "across heterogeneous populations around the world"; (ii) has an "accuracy matching or exceeding human accuracy"; (iii) "is easy to use". I take issue with these three statements. First, the authors did not test the performance of their algorithm in clinical populations with sleep disorders, despite the fact that individuals with sleep disorders represent (logically) the vast majority of sleep recordings. Crucially, such a comparison was made in the best (to my knowledge) published automated sleep staging algorithm (Stephansen et al. Nature Communications 2018, doi: 10.1038/s41467-018-07229-3). The omission of this work is very surprising. Quantifying the impact of sleep disorders on a sleep scoring algorithm is critical for its deployment in sleep clinics.

      We apologize, as we were not clear in describing our training and testing data sets. Indeed, both the training set and testing set 1 included a significant number of individuals with sleep disorders. About 30% of the individuals had moderate to severe sleep apnea (AHI >= 15). The validation dataset (DOD, or testing set 2) also includes 55 nights from individuals with obstructive sleep apnea (average AHI = 18.5 ± 16.2). Furthermore, both the training set and testing set 1 included individuals with a medical diagnosis of insomnia, depression, diabetes and hypertension.

      The health status and demographics data of the training and testing sets have now been clarified throughout the manuscript to avoid any such confusion:

      1) Methods: We have added an extensive description of each dataset in the training and testing sets, including data on health and sleep disorders.

      2) Results: We have added a new table to report and compare demographics/health data of the training and testing set, as suggested in a later comment by the reviewer.

      3) Results: Performance results of the testing set 2 are now reported separately for healthy individuals and individuals with sleep disorders.
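
      As a rough illustration of the stratified reporting in point 3, the sketch below computes epoch-by-epoch accuracy separately for each health group. It is not the authors' analysis code: the function name `accuracy_by_group`, the group labels and the toy hypnograms are hypothetical, and plain accuracy stands in for whichever agreement metric the manuscript reports.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return epoch-level accuracy per group.

    `records` is an iterable of (group, true_stages, predicted_stages),
    one entry per night. All names here are hypothetical.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, true_stages, predicted_stages in records:
        if len(true_stages) != len(predicted_stages):
            raise ValueError("hypnograms must have the same number of epochs")
        for true_stage, predicted_stage in zip(true_stages, predicted_stages):
            correct[group] += int(true_stage == predicted_stage)
            total[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy data (made up for illustration): one night per group, four epochs each.
nights = [
    ("healthy", ["W", "N1", "N2", "R"], ["W", "N1", "N2", "R"]),
    ("sleep_disorder", ["W", "N2", "N2", "R"], ["W", "N1", "N2", "N2"]),
]
results = accuracy_by_group(nights)
print(results)
```

      Pooling epochs within a group, as done here, weights long nights more heavily; averaging per-night accuracies instead would be an equally reasonable choice.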

      Second, the authors wrote that their algorithm is "matching or exceeding" human accuracy but seem to present uncorrected one-to-one comparisons to support their claim. The fact that an algorithm is better than some humans does not mean it exceeds human performance.

      Thanks for noting that. We have now removed all instances of “exceeding human accuracy”.

      Third, although I agree that the tool seems easy to use even for individuals with limited programming skills, it still requires some. I don't think someone who is used to software with graphical interfaces and who has never used (or heard of!) python would describe the tool as easy to use. This poses an important implementation challenge.

      2) An important limitation of this algorithm is that it captures only one part of the visual examination of sleep data. Indeed, especially in clinical settings, the data is not only examined to establish the hypnogram but to also identify markers of common sleep disorders (e.g. sleep apnea, leg movements, etc). Although this algorithm could significantly speed up sleep scoring, it does not allow to detect these other important markers. Currently, and in link with the previous comment, the algorithm could not replace the visual inspection of the data for clinical diagnoses.

      We have now revised the manuscript to discuss this limitation in the “Limitations and future directions” subsection of the new Discussion:

      “The algorithm is not currently able to identify markers of common sleep disorders (such as sleep apnea, leg movements) and as such may not be suited for clinical purposes. It should be noted however that our software does include several other functions to quantify phasic events during sleep (slow-waves, spindles, REMs, artefacts) as well as sleep fragmentation of the hypnogram. Rather than replacing the crucial expertise of clinicians, YASA may thus provide a helpful starting point to accelerate clinical scoring of polysomnography recordings. Furthermore, future developments of the algorithm should prioritize automated scoring of clinical disorders, in particular apnea-hypopnea events. On the latter, YASA could implement some of the algorithms that have been developed over the last few years to detect apnea-hypopnea events from the ECG or respiratory channels (e.g. Varon et al. 2015; Koley and Dey 2013).”

      3) The data were curated with some recordings or portions of recordings being excluded (see p. 7). While I understand that this curation is important for the training set, I think it should not be applied to the test set. Indeed, it goes contrary to the logic of automating sleep staging. For example, cutting the beginning and end of the recording according to sleep start and end (p. 7) supposes that the start and end of sleep are already known (i.e. it has already been scored).

      This truncation step has now been removed from the pipeline and all the results have been updated accordingly. In addition, we have also removed all other exclusion criteria (e.g. PSG data quality, recording duration, etc) to improve the generalization power of the algorithm, thanks to the suggestions of the reviewer.

      4) Two types of EEG derivations were used (C4-M1 or C4-Fpz). Was the performance impacted by this variable? Is it fair to assume that the choice of features (spectral features or summary statistics of time series data) could explain the absence of differences but that introducing new features (i.e. phase-sensitive features) could increase the influence of the choice of the derivation?

      Thanks for raising this. First, our choice of the EEG reference was determined by the datasets: the CFS, CCSHS, MrOS, CHAT and HomePAP datasets were all referenced to Fpz, while the MESA, SHHS and DOD datasets were referenced to the contralateral mastoid. The montage of each dataset has now been added to the Methods section.

      Second, as rightly pointed out by the reviewer, the features implemented in the algorithm were chosen to be robust to various recording montages. This is now explicitly discussed in the “Features extraction” subsection of the Methods:

      “The features included in the current algorithm were chosen to be robust to different recording montages. As such, we did not include features that are dependent on the phase of the signal, and/or that require specific events detection (e.g. slow-waves, rapid eye movements). However, the time-domain features are dependent upon the amplitude of the signal, and the algorithm may fail if the input data is not expressed in standard units (uV) or has been z-scored prior to applying the automatic sleep staging.”

      5) Given that markers of sleep stages are very different in EOG, EMG and EEG time series, could the authors explain the logic behind applying the same pre-processing and extracting the same features on these three very different types of data? Could this explain why the majority of the features in the top-20 features were EEG features?

      We now provide a more detailed explanation on the inclusion of EOG and EMG features in the “Features extraction” subsection of the Methods:

      “These features were selected based on prior work in features-based classification algorithms for automatic sleep staging (Krakovská and Mezeiová 2011; Lajnef et al. 2015; Sun et al. 2017). For example, it was previously reported that the permutation entropy of the EOG/EMG as well as the EEG spectral powers in the traditional frequency bands are the most important features for accurate sleep staging (Lajnef et al. 2015), thus warranting their inclusion in the current algorithm. Several other features are derived from the authors’ previous works with entropy/fractal dimension metrics (https://github.com/raphaelvallat/antropy).”

      Furthermore, we have added a “Limitations and future directions” section in the Discussion in which we propose future improvements of the algorithm. One of these potential improvements is the development of EOG and EMG features that would provide a higher discrimination of the sleep stages:

      “This suggests that one way to improve performance on this population could be the inclusion of more EEG channels and/or bilateral EOGs. For instance, using the negative product of bilateral EOGs may increase sensitivity to rapid eye movements in REM sleep or slow eye movements in N1 sleep (Stephansen et al. 2018; Agarwal et al. 2005). Interestingly, the Perslev 2021 algorithm does not use an EMG channel, which is consistent with our observation of a negligible benefit on accuracy when adding EMG to the model. This may also indicate that while the current set of features implemented in the algorithm performs well for EEG and EOG channels, it does not fully capture the meaningful dynamic information nested within muscle activity during sleep.”
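
      The negative-product idea mentioned above can be illustrated with a minimal sketch. It assumes two EOG channels (here called `loc` and `roc`, hypothetical names) sampled at the same rate, and toy amplitude values that are not taken from any dataset. Conjugate eye movements deflect the two channels in opposite directions, so their negative product is positive, whereas activity shared by both channels gives a negative value.

```python
def negative_product(loc, roc):
    """Sample-wise negative product of two EOG channels of equal length."""
    if len(loc) != len(roc):
        raise ValueError("EOG channels must have the same length")
    return [-(left * right) for left, right in zip(loc, roc)]


# Toy data in uV (hypothetical values, for illustration only).
loc = [0.0, 40.0, 80.0, 40.0, 0.0]
roc = [0.0, -35.0, -75.0, -35.0, 0.0]   # opposite polarity: conjugate eye movement
common = [0.0, 35.0, 75.0, 35.0, 0.0]   # same polarity: activity shared by both channels

conjugate = negative_product(loc, roc)    # non-negative everywhere, peaks at the movement
in_phase = negative_product(loc, common)  # never positive
```

      This is only a sketch of the signal transformation; it is not a detector and does not reproduce any of the cited algorithms.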

      6) Sleep scoring guidelines incorporate not only what can be observed on a given epoch of data but also what is observed in the previous epoch(s). For example, an epoch can be scored as N2 even if there is no marker of N2 but there was (i) a marker of N2 in a previous epoch, (ii) no reason to change the score since. To reproduce this, the authors employed a symmetrical smoothing approach (a combination of a triangular-weighted rolling average and asymmetrical rolling average). Why did the authors choose to incorporate data from following epochs, which is not implemented in established guidelines? How was the duration of the smoothing window chosen? Indeed, 5 minutes appears rather long, which could explain the poor performance of the algorithm for fast changing portions of the data (i.e. N1 or transitions). Importantly, these transitions can be very relevant in clinical settings and to establish a diagnosis.

      This is a great question. We have addressed this in the revised manuscript.

      Temporal smoothing

      We have also conducted a new analysis of the influence of the temporal smoothing on the performance. The results are described in Supplementary File 3a. Briefly, using a cross-validation approach, we have tested a total of 49 combinations of time lengths for the past and centered smoothing windows. Results demonstrated that the best performance is obtained when using a 2 min past rolling average in combination with a 7.5 min centered, triangular-weighted rolling average. Removing the centered rolling average resulted in poorer performance, suggesting that there is an added benefit of incorporating data from both before and after the current epoch. Removing both the past and centered rolling averages resulted in the worst performance (3.6% decrease in F1-macro). Therefore, the new version of the manuscript and algorithm now uses a 2 min past and a 7.5 min centered rolling average. All the results in the manuscript have been updated accordingly. We have now edited the “Smoothing and normalization” subsection of the Methods section as follows:

      “In particular, the features were first duplicated and then smoothed using two different rolling windows: 1) a 7.5 minutes centered, and triangular-weighted rolling average (i.e. 15 epochs centered around the current epoch with the following weights: [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1., 0.875, 0.75, 0.625, 0.5, 0.375, 0.25, 0.125]), and 2) a rolling average of the last 2 minutes prior to the current epoch. The optimal time length of these two rolling windows was found using a parameter search with cross-validation (Supplementary File 3a). [...] The final model includes the 30-sec based features in original units (no smoothing or scaling), as well as the smoothed and normalized version of these raw features.”
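
      The two rolling windows described above can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation used in the software: the weights and window lengths come from the quoted Methods text, while the handling of the first and last epochs (truncating the window and renormalizing the weights) and the fallback for the very first epoch are assumptions of this sketch. The past window excludes the current epoch, following the wording "prior to the current epoch".

```python
# Triangular weights quoted in the Methods excerpt (15 epochs of 30 s = 7.5 min).
TRI_WEIGHTS = [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0,
               0.875, 0.75, 0.625, 0.5, 0.375, 0.25, 0.125]
PAST_EPOCHS = 4  # 2 minutes of 30-s epochs


def centered_triangular_average(values, weights=TRI_WEIGHTS):
    """Weighted average centered on each epoch.

    Assumption: near the edges the window is truncated and the
    remaining weights are renormalized.
    """
    half = len(weights) // 2
    smoothed = []
    for i in range(len(values)):
        num = 0.0
        den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def past_average(values, n_epochs=PAST_EPOCHS):
    """Unweighted average of up to n_epochs epochs before the current one.

    Assumption: the first epoch has no past, so it keeps its own value.
    """
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - n_epochs):i]
        smoothed.append(sum(window) / len(window) if window else values[i])
    return smoothed


# Toy feature: a single spike in an otherwise flat series.
feature = [0.0] * 21
feature[10] = 1.0
centered = centered_triangular_average(feature)
past = past_average([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
```

      With the full window available, the spike is spread over the 15 surrounding epochs and its own epoch keeps 1/8 of the original value, because the weights sum to 8. This shows how a long centered window attenuates brief events such as short N1 bouts.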

      Reviewer #3 (Public Review):

      This study presents a new sleep scoring tool that is based on a classification algorithm using machine-learning approaches in which a set of features is extracted from the EEG signal. The algorithm was trained and validated on a very large number of nocturnal sleep datasets including participants with various ethnicities, age and health status. Results show that the algorithm offers a high level of sensitivity, specificity and accuracy matching or sometimes even exceeding that of typical interscorer agreement. The conclusions are supported by the data. Importantly, a measure of the algorithm's confidence is provided for each scored epoch in order to guide users during their review of the output. The software is described as easy to use, computationally low-demanding, open source and free. This paper addresses an important need for the field of sleep research. There is indeed a lack of accurate, flexible and open source sleep scoring tools. I would like to commend the authors for their efforts in providing such a tool for the community and for their adherence to the open science framework as the data and codes related to the current manuscript are made available. I predict that this automated tool will be of use for a large number of researchers in the field. However, there are plenty of automated sleep scoring tools already available in the field (most of them are not open source and rather expensive, as noted by the authors). The current work does not provide a clear view on whether the new algorithm presented in this research performs better than algorithms already available in the field. No formal comparison between algorithms is provided and the matter is not discussed in the paper.

      Thanks so much for pointing this out. We have now added this relevant reference throughout the manuscript. To build on the reviewer’s point, the current algorithm and Stephansen’s algorithm did not use the same public data. The Stephansen 2018 algorithm was trained and validated on “10 different cohorts recorded at 12 sleep centers across 3 continents: SSC, WSC, IS-RC, JCTS, KHC1, AHC, IHC, DHC, FHC and CNC”, none of which are included in the training/testing sets of the current algorithm. Nevertheless, we certainly agree that the manuscript will benefit from a more extensive comparison against existing tools. To this end, we have made several major modifications to the manuscript. First, we have added a dedicated paragraph in the Introduction to review existing sleep staging algorithms:

      “Advances in machine-learning have led efforts to classify sleep with automated systems. Indeed, recent years have seen the emergence of several automatic sleep staging algorithms. While an exhaustive review of the existing sleep staging algorithms is out of the scope of this article, we review below — in chronological order — some of the most significant algorithms of the last five years. For a more in-depth review, we refer the reader to Fiorillo et al. 2019. The Sun et al. 2017 algorithm was trained on 2,000 PSG recordings from a single sleep clinic. The overall Cohen's kappa on the testing set was 0.68 (n=1,000 PSG nights). The “Z3Score” algorithm (Patanaik et al. 2018) was trained and evaluated on ~1,700 PSG recordings from four datasets, with an overall accuracy ranging from 89.8% in healthy adults/adolescents to 72.1% in patients with Parkinson’s disease. The freely available “Stanford-stage” algorithm (Stephansen et al. 2018) was trained and evaluated on 10 clinical cohorts (~3,000 recordings). The overall accuracy was 87% against the consensus scoring of several human experts in an independent testing set. The “SeqSleepNet” algorithm (Phan et al. 2019) was trained and tested using a 20-fold cross-validation on 200 nights (overall accuracy = 87.1%). Finally, the recent U-Sleep algorithm (Perslev et al. 2021) was trained and evaluated on PSG recordings from 15,660 participants of 16 clinical studies. While the overall accuracy was not reported, the mean F1-score against the consensus scoring of five human experts was 0.79 for healthy adults and 0.76 for patients with sleep apnea.”
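
      The agreement metrics quoted for these algorithms (overall accuracy, Cohen's kappa, F1-score) can be computed from two epoch-by-epoch hypnograms. The following is a minimal standard-library sketch using the textbook definitions; the function names and the toy hypnograms are illustrative and are not taken from any of the cited tools.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of epochs on which the two scorings agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when both scorings contain one
    identical stage only.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[s] * pred_counts[s] for s in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-stage F1 scores."""
    scores = []
    for stage in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == stage and p == stage for t, p in zip(y_true, y_pred))
        fp = sum(t != stage and p == stage for t, p in zip(y_true, y_pred))
        fn = sum(t == stage and p != stage for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy hypnograms (hypothetical, four 30-s epochs).
human = ["W", "W", "N2", "N2"]
algo = ["W", "N2", "N2", "N2"]

acc = accuracy(human, algo)       # 0.75
kappa = cohen_kappa(human, algo)  # 0.5
f1 = f1_macro(human, algo)
```

      Because F1-macro weights every stage equally, it penalizes poor performance on rare stages such as N1 more strongly than overall accuracy does.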

      Second, and importantly, we now perform an in-depth comparison of YASA’s performance against the Stephansen 2018 algorithm and the Perslev 2021 algorithm using the same data for all three algorithms. Specifically, we have applied the three algorithms to each night of the Dreem Open Datasets (DOD) and compared their performance in dedicated tables in the Results section (Table 2 and Table 3). This procedure is fully described in a new “Comparison against existing algorithms” subsection of the Methods. None of these algorithms included nights from the DOD in their training set, thus ensuring a fair comparison of the three algorithms. Related to point 4 of the Essential Revisions, the performance of the three algorithms is reported separately for healthy individuals (DOD-Healthy, n=25) and patients with sleep apnea (DOD-Obstructive, n=50). To facilitate future validation of our algorithm, we also provide the predicted hypnograms of each night in Supplementary File 1 (healthy) and Supplementary File 2 (patients).

      Overall, the comparison results show that YASA’s accuracy is not significantly different from the Stephansen 2018 algorithm for both healthy adults and patients with obstructive sleep apnea. The accuracy of the Perslev 2021 algorithm is not significantly different from YASA in healthy adults, but is higher in patients with sleep apnea. However, it should be noted that while the YASA algorithm only uses one central EEG, one EOG and one EMG, the Perslev 2021 algorithm uses all available EEGs as well as two EOGs. This suggests that adding more EEG channels and/or using the two EOGs may improve the performance of YASA in patients with sleep apnea. An important counterpoint, however, is that YASA requires far fewer channels to achieve very similar levels of accuracy. This reduces computational and processing demands, improves speed of analysis (i.e. a few seconds per recording versus ~10 min for the Stephansen 2018 algorithm), and makes the algorithm applicable to more recordings, since many may not have sufficient EEG channels. All these points are now discussed in detail in the new “Limitations and future directions” subsection of the Discussion (see point 3 of the Essential Revisions).

      There are some overstatements in the manuscript. For example, the algorithm was trained and validated on nocturnal sleep data. Sleep characteristics (eg duration and distribution of sleep stages etc.) are different, for example, during diurnal sleep (nap) and the algorithm might not perform as well on nap data. As such, the tool might not be as "universal" as stated in the title. Additionally, as human scores are used as the ground-truth for the validation step, it might be misleading to state that "this tool offers high sleep-staging accuracy matching or exceeding human accuracy". The algorithm exceeded the accuracy of some human scorers and matched the scores of the best scorer.

      We have now removed the word “universal” from the title and replaced “exceeded human accuracy” with “matched human accuracy”. Furthermore, we have now added the fact that the algorithm was trained and validated only on nocturnal data in the Limitations section of the discussion, and as such, noted that there is the possibility that the algorithm may not perform at the same accuracy levels for daytime nap data.

      No reflection on further improvement is offered in the paper. The algorithm performs worse on N1 stage, older individuals and patients presenting sleep disorders (sleep fragmentation) and it is unclear how this could be improved in future research. In the same vein, the current work does not present performance accuracy separately for healthy individuals and patients when it is expected that accuracy would be poorer in the patient group.

      The revised manuscript now includes a dedicated section in the Discussion to propose ideas for improvements.

      First, we have now added a “Limitations and Future Directions” subsection in the Discussion to present ideas for improving the algorithm, with a particular focus on fragmented nights and/or nights from patients with sleep disorders:

      “Despite its numerous advantages, there are limitations to the algorithm that must be considered. These are discussed below, together with ideas for future improvements of the algorithm. First, while the accuracy of YASA against consensus scoring was not significantly different from the Stephansen 2018 and Perslev 2021 algorithms on healthy adults, it was significantly lower than the latter algorithm on patients with obstructive sleep apnea. The Perslev 2021 algorithm used all available EEGs and two (bilateral) EOGs, whereas YASA’s scoring was based on one central EEG, one EOG and one EMG. This suggests that one way to improve performance in this population could be the inclusion of more EEG channels and/or bilateral EOGs. For instance, using the negative product of bilateral EOGs may increase sensitivity to rapid eye movements in REM sleep or slow eye movements in N1 sleep (Stephansen et al. 2018; Agarwal et al. 2005). Interestingly, the Perslev 2021 algorithm does not use an EMG channel, which is consistent with our observation of a negligible benefit on accuracy when adding EMG to the model. This may also indicate that while the current set of features implemented in the algorithm performs well for EEG and EOG channels, it does not fully capture the meaningful dynamic information nested within muscle activity during sleep.”

      Second, we have now conducted a random forest analysis to identify the main contributors to accuracy variability. The analysis is described in detail in the “Moderator Analyses” subsection of the Results as well as in Supplementary File 3b. The revision now states:

      “To better understand how these moderators influence variability in accuracy, we quantified the relative contribution of the moderators using a random forest analysis. Specifically, we included all aforementioned demographics variables in the model, together with medical diagnosis of depression, diabetes, hypertension and insomnia, and features extracted from the ground-truth sleep scoring such as the percentage of each sleep stage, the duration of the recording and the percentage of stage transitions in the hypnograms. The outcome variable of the model was the accuracy score of YASA against ground-truth sleep staging, calculated separately for each night. All the nights in the testing set 1 were included, leading to a sample size of 585 unique nights. Results are presented in Supplementary File 3b. The percentage of N1 sleep and percentage of stage transitions, both markers of sleep fragmentation, were the two top predictors of accuracy, accounting for 40% of the total relative importance. By contrast, the combined contribution of age, sex, race and medical diagnosis of insomnia, hypertension, diabetes and depression accounted for roughly 10% of the total importance.”
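
      The two fragmentation markers named above can be derived directly from a hypnogram. The sketch below is illustrative only: the exact definition used in the manuscript is not given here, so the percentage of stage transitions is assumed to be the number of epoch-to-epoch stage changes divided by the number of consecutive epoch pairs. The function names and the toy hypnogram are hypothetical.

```python
from collections import Counter


def stage_percentages(hypno):
    """Percentage of epochs spent in each stage."""
    counts = Counter(hypno)
    return {stage: 100.0 * n / len(hypno) for stage, n in counts.items()}


def transition_percentage(hypno):
    """Percentage of consecutive epoch pairs whose stage differs.

    Assumption: the denominator is the number of consecutive pairs
    (len(hypno) - 1); other normalizations are possible.
    """
    if len(hypno) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(hypno, hypno[1:]))
    return 100.0 * changes / (len(hypno) - 1)


# Toy hypnogram (hypothetical, ten 30-s epochs).
hypno = ["W", "W", "N1", "N2", "N2", "N2", "N1", "N2", "REM", "REM"]

pct = stage_percentages(hypno)
n1_pct = pct["N1"]                           # 20.0
transitions = transition_percentage(hypno)   # 5 changes over 9 pairs
```

      Per-night values like these would then serve as predictors of per-night accuracy in a model such as the random forest described in the quoted text.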

      In addition, the performance of the algorithm in the DOD testing dataset is now reported separately for healthy individuals and patients with sleep disorders.

      As requested by the reviewer, we now analyze and report the performance of YASA on the DOD testing set separately for healthy individuals (DOD-healthy) and patients with obstructive sleep apnea (DOD-Obstructive), which can be found in section “Testing set 2”.

      There is a series of methodological choices that are not justified. For example, nights were cropped to 15 minutes before and after sleep to remove irrelevant extra periods of wakefulness or artefacts on both ends of the recording. This represents an issue for the computation of important sleep measures such as sleep efficiency and latency as the onset/offset of sleep might be missed. It is also unclear how the features were selected and a description of said features is currently missing. The custom sleep stage weights procedure is unclear. The length of the time window for the smoothing procedure seems arbitrary. Last, it is currently unclear when / how the EEG and EMG data were analyzed.

      As recommended by the reviewers, the 15-min truncation step has now been removed from the pipeline. Furthermore, the Methods section has been improved to provide more details on the features. Finally, the best class-weights and smoothing windows are now found using a cross-validation analysis on the training set. For more details, we refer the reviewer to the “Justification for some methodological choices” section below.

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript provides very high quality single-cell physiology combined with population physiology to reveal distinct roles for two anatomically different LN populations in the cockroach antennal lobe. The conclusion that non-spiking LNs with graded responses show glomerular-restricted responses to odorants and spiking LNs show similar responses across glomeruli is generally supported with strong and clean data, although the possibility of selective interglomerular inhibition has not been ruled out. On balance, the single-cell biophysics and physiology provides foundational information useful for well-grounded mechanistic understanding of how information is processed in insect antennal lobes, and how each LN class contributes to odor perception and behavior.

      Thank you for this positive feedback.

      Reviewer #2 (Public Review):

      The manuscript "Task-specific roles of local interneurons for inter- and intraglomerular signaling in the insect antennal lobe" evaluates the spatial distribution of calcium signals evoked by odors in two major classes of olfactory local neurons (LNs) in the cockroach P. americana, which are defined by their physiological and morphological properties. Spiking type I LNs have a patchy innervation pattern of a subset of glomeruli, whereas non-spiking type II LNs innervate almost all glomeruli. The authors' overall conclusion is that odors evoke calcium signals globally and relatively uniformly across glomeruli in type I spiking LNs, and LN neurites in each glomerulus are broadly tuned to odor. In contrast, the authors conclude that they observe odor-specific patterns of calcium signals in type II nonspiking LNs, and LN neurites in different glomeruli display distinct local odor tuning. Blockade of action potentials in type I LNs eliminates global calcium signaling and decorrelates glomerular tuning curves, converting their response profile to be more similar to that of type II LNs. From these conclusions, the authors infer a primary role of type I LNs in interglomerular signaling and type II LNs in intraglomerular signaling.

      The question investigated by this study - to understand the computational significance of different types of LNs in olfactory circuits - is an important and significant problem. The design of the study is straightforward, but methodological and conceptual gaps raise some concerns about the authors' interpretation of their results. These can be broadly grouped into three main areas.

      1) The comparison of the spatial (glomerular) pattern of odor-evoked calcium signals in type I versus type II LNs may not necessarily be a true apples-to-apples comparison. Odor-evoked calcium signals are an order of magnitude larger in type I versus type II cells, which will lead to a higher apparent correlation in type I cells. In type IIb cells, and type I cells with sodium channel blockade, odor-evoked calcium signals are much smaller, and the method of quantification of odor tuning (normalized area under the curve) is noisy. Compare, for instance, ROI 4 & 15 (Figure 4) or ROI 16 & 23 (Figure 5) which are pairs of ROIs that their quantification concludes have dramatically different odor tuning, but which visual inspection shows to be less convincing. The fact that glomerular tuning looks more correlated in type IIa cells, which have larger, more reliable responses compared to type IIb cells, also supports this concern.

      We agree with the reviewer that "the comparison of the spatial (glomerular) pattern of odor-evoked calcium signals is not necessarily a true apples-to-apples comparison". Type I and type II LNs are different neuron types. Given their different physiology and morphology, this is not even close to a "true apples-to-apples comparison" - and a key point of the manuscript is to show just that.

      As we have emphasized in response to Essential Revision 1, the differences in Ca2+ signals are not an experimental shortcoming but a physiologically relevant finding per se. These data, especially when combined with the electrophysiological data, contribute to a better understanding of these neurons’ physiological and computational properties.

      It is physiologically determined that the Ca2+ signals during odorant stimulation in the type II LNs are smaller than in type I LNs. And yes, the signals are small because small postsynaptic Ca2+ currents predominantly cause the signals. Regardless of the imaging method, this naturally reduces the signal-to-noise ratio, making it more challenging to detect signals. To address this issue, we used a well-defined and reproducible method for analyzing these signals. In this context, we do not agree with the very general criticism of the method. The reviewer questions whether the signals are odorant-induced or just noise (see also minor point 12). If we had recorded only noise, we would expect all tuning curves (for each odorant and glomerulus) to be the same. In this context, we disagree with the reviewer's statement that the tuning curves do not represent the Ca2+ signals in Figure 4 (ROI 4 and 15) and Figure 5 (ROI 16 and 23). This debate reflects precisely the kind of 'visual inspection bias' that our clearly defined analysis aims to avoid. On close inspection, the differences in Ca2+ signals can indeed be seen. Figure II (of this letter) shows the signals from the glomeruli in question at higher magnification. The sections of the recordings that were used for the tuning curves are marked in red.

      Figure II: Ca2+ signals of selected glomeruli that were questioned by the reviewer.

      2) An additional methodological issue that compounds the first concern is that calcium signals are imaged with wide-field imaging, and signals from each ROI likely reflect out of plane signals. Out of plane artifacts will be larger for larger calcium signals, which may also make it impossible to resolve any glomerular-specific signals in the type I LNs.

      Thank you for allowing us to clarify this point. The reviewer comment implies that the different amplitudes of the Ca2+ signals indicate some technical-methodological deficiency (e.g., poorly chosen odor concentration). But in fact, this is a key finding of this study that is physiologically relevant and crucial for understanding the function of the neurons studied. These very differences in the Ca2+ signals are evidence of the different roles these neurons play in the AL. The different signal amplitudes directly show the distinct physiology and Ca2+ sources that dominate the Ca2+ signals in type I and type II LNs. Accordingly, it is impractical to equalize the magnitude of Ca2+ signals under physiological conditions by adjusting the concentration of odor stimuli.

      In the following, we address these issues in more detail: 1) Imaging Method 2) Odorant stimulation 3) Cell type-specific Ca2+ signals

      1) Imaging Method:

      Of course, we agree with the reviewer comment that out-of-focus and out-of-glomerulus fluorescence can potentially affect measurements, especially in widefield optical imaging in thick tissue. This issue was carefully addressed in initial experiments. In type I LNs, which innervate a subset of glomeruli, we detected fluorescence signals, which matched the spike pattern of the electrophysiological recordings 1:1, only in the innervated glomeruli. In the non-innervated ROIs (glomeruli), we detected no or comparatively very little fluorescence, even in glomeruli directly adjacent to innervated glomeruli.

      To illustrate this, Figure I (of this response letter) shows measurements from an AL in which a uniglomerular projection neuron was investigated in a set of experiments that were not directly related to the current study. In this experiment, a train of action potentials was induced by depolarizing current. The traces show the action potential-induced fluorescence signals from the innervated glomerulus (glomerulus #1) and the directly adjacent glomeruli.

      These results do not entirely exclude that the large Ca2+ signals from the innervated LN glomeruli may include out-of-focus and out-of-glomerulus fluorescence, but they do show that the bulk of the signal is generated from the recorded neuron in the respective glomeruli.

      Figure I: Simultaneous electrophysiological and optophysiological recordings of a uniglomerular projection neuron using the ratiometric Ca2+ indicator fura-2. The projection neuron has its arborization in glomerulus 1. The train of action potentials was induced with a depolarizing current pulse (grey bar).

      2) Odorant Stimulation: It is important to note that the odorant concentration cannot be varied freely. For these experiments, the odorant concentrations have to be within a 'physiologically meaningful' range, which means: On the one hand, they have to be high enough to induce a clear response in the projection neurons (the antennal lobe output). On the other hand, however, the concentration was not allowed to be so high that the ORNs were stimulated nonspecifically. These criteria were met with the used concentrations since they induced clear and odorant-specific activity in projection neurons.

      3) Cell type-specific Ca2+ signals:

      The differences in Ca2+ signals are described and discussed in some detail throughout the text (e.g., page 6, lines 119-136; page 9, lines 193-198; page 10-11, lines 226-235; page 14-15, line 309-333). Briefly: In spiking type I LNs, the observed large Ca2+ signals are mediated mainly by voltage-dependent Ca2+ channels activated by the Na+-driven action potential's strong depolarization. These large Ca2+ signals mask smaller signals that originate, for example, from excitatory synaptic input (i.e., evoked by ligand-activated Ca2+ conductances). Preventing the firing of action potentials can unmask the ligand-activated signals, as shown in Figure 4 (see also minor comments 8. and 10.). In nonspiking type II LNs, the action potential-generated Ca2+ signals are absent; accordingly, the Ca2+ signals are much smaller. In our model, the comparatively small Ca2+ signals in type II LNs are mediated mainly by (synaptic) ligand-gated Ca2+ conductances, possibly with contributions from voltage-gated Ca2+ channels activated by the comparatively small depolarization (compared with type I LNs).

      Accordingly, our main conclusion, that spiking LNs play a primary role in interglomerular signaling, while nonspiking LNs play an essential role in intraglomerular signaling, can be DIRECTLY inferred from the differences in odorant-induced Ca2+ signals alone.

      a) Type I LN: The large, simultaneous, and uniform Ca2+ signals in the innervated glomeruli of an individual type I LN clearly show that they are triggered in each glomerulus by the propagated action potentials, which conclusively shows lateral interglomerular signal propagation.

      b) Type II LNs: In the type II LNs, we observed relatively small Ca2+ signals in single glomeruli or a small fraction of glomeruli of a given neuron. Importantly, the time course and amplitude of the Ca2+ signals varied between different glomeruli and different odors. Considering that type II LNs can, in principle, generate large voltage-activated Ca2+ currents (larger than type I LNs; page 4, lines 82-86, Husch et al. 2009a,b; Fusca and Kloppenburg 2021), these data suggest that in type II LNs electrical or Ca2+ signals spread only within the same glomerulus, and laterally only to glomeruli that are electrotonically close to the odorant-stimulated glomerulus.

      Taken together, this means that our conclusions regarding inter- and intraglomerular signaling can be derived from the simultaneously recorded amplitudes and the dynamics of the membrane potential and Ca2+ signals alone. This also means that although the correlation analyses support this conclusion nicely, the actual conclusion does not ultimately depend on the correlation analysis. We had tried to express this with the wording, “Quantitatively, this is reflected in the glomerulus-specific odorant responses and the diverse correlation coefficients across…” (page 10, lines 216-217) and “…This is also reflected in the highly correlated tuning curves in type I LNs and low correlations between tuning curves in type II LNs” (page 13, lines 293-295).

      3) Apart from the above methodological concerns, the authors' interpretation of these data as supporting inter- versus intra-glomerular signaling are not well supported. The odors used in the study are general odors that presumably excite feedforward input to many glomeruli. Since the glomerular source of excitation is not determined, it's not possible to assign the signals in type II LNs as arising locally - selective interglomerular signal propagation is entirely possible. Likewise, the study design does not allow the authors to rule out the possibility that significant intraglomerular inhibition may be mediated by type I LNs.

      The reviewer addresses an important point. However, from the comment, we get the impression that he/she has not taken into account the entire data set and the DISCUSSION. In fact, this topic has already been discussed in some detail in the original version (page 12, lines 268-271; page 15-16; lines 358-374). This section even has a respective heading: "Inter- and intraglomerular signaling via nonspiking type II LNs" (page 15, line 338). We apologize if our explanations regarding this point were unclear, but we also feel that the reviewer is arguing against statements that we did not make in this way.

      a) In 11 out of 18 type II LNs we found 'relatively uncorrelated' (r=0.43±0.16, N=11) glomerular tuning curves. These experiments argue strongly for a 'local excitation' with restricted signal propagation and do not provide support for interglomerular signal propagation. Thus, these results support our interpretation of intraglomerular signaling in this set of neurons.

      b) In 7 out of 18 experiments, we observed 'higher correlated' glomerular tuning curves (r=0.78±0.07, N=7). We agree with the reviewer that this could be caused by various mechanisms, including simultaneous input to several glomeruli or by interglomerular signaling. Both possibilities were mentioned and discussed in the original version of the manuscript (page 12, lines 268-271; page 15-16; lines 358-374). In the Discussion, we considered the latter possibility in particular (but not exclusively) for the type IIa1 neurons that generate spikelets. Their comparatively stronger active membrane properties may be particularly suitable for selective signal transduction between glomeruli.

      c) We have not ruled out that local signaling exists in type I LNs – in addition to interglomerular signaling. The highly localized Ca2+ signals in type I LNs, which we observed when Na+-driven action potential generation was prevented, may support this interpretation. However, we would like to reiterate that the simultaneous electrophysiological and optophysiological recordings, which show highly correlated glomerular Ca2+ dynamics that match 1:1 with the simultaneously recorded action potential pattern, clearly suggest interglomerular signaling. We also want to emphasize that this interpretation is in agreement with previous models derived from electrophysiological studies (Assisi et al., 2011; Fujiwara et al., 2014; Hong and Wilson, 2015; Nagel and Wilson, 2016; Olsen and Wilson, 2008; Sachse and Galizia, 2002; Wilson, 2013).
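
      As a rough illustration of what "correlated glomerular tuning curves" means in points a) to c), the sketch below computes the mean pairwise Pearson correlation between the tuning curves of the glomeruli innervated by one neuron. It is a minimal sketch, not the analysis used in the study: the function names, the data layout (one list of response amplitudes per glomerulus, one entry per odor) and the example numbers are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a flat tuning curve has no defined correlation")
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(tuning_curves):
    """Average Pearson r over every pair of glomerular tuning curves.

    tuning_curves: one sequence per glomerulus, each holding the response
    amplitude to every odor in the same order (hypothetical layout).
    """
    pairs = list(combinations(tuning_curves, 2))
    if not pairs:
        raise ValueError("need at least two glomeruli")
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


# Made-up numbers for illustration only (not data from the study).
# 'uniform': every glomerulus follows the same odor profile, scaled.
uniform = [[1.0, 3.0, 2.0, 5.0],
           [2.0, 6.0, 4.0, 10.0],
           [0.5, 1.5, 1.0, 2.5]]
# 'diverse': each glomerulus prefers different odors.
diverse = [[1.0, 3.0, 2.0, 5.0],
           [5.0, 2.0, 3.0, 1.0],
           [2.0, 2.5, 5.0, 1.0]]

print(mean_pairwise_correlation(uniform))
print(mean_pairwise_correlation(diverse))
```

      In this toy case the uniform set gives a mean correlation of essentially 1, while the diverse set gives a much lower value. A high value on its own does not distinguish shared input from interglomerular propagation, which is the ambiguity discussed in point b).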

      In light of the reviewer's comment(s), we have modified the text to clarify these points (page 14, lines 317-319).

      Reviewer #3 (Public Review):

      To elucidate the role of the two types of LNs, the authors combined whole-cell patch clamp recordings with calcium imaging via single cell dye injection. This method makes it possible to monitor calcium dynamics of the different axons and branches of single LNs in identified glomeruli of the antennal lobe, while the membrane potential can be recorded at the same time. The authors recorded in total from 23 spiking (type I LN) and 18 non-spiking (type II LN) neurons to a set of 9 odors and analyzed the firing pattern as well as calcium signals during odor stimulation for individual glomeruli. The recordings reveal on one side that odor-evoked calcium responses of type I LNs are odor-specific, but homogeneous across glomeruli and therefore highly correlated regarding the tuning curves. In contrast, odor-evoked responses of type II LNs show less correlated tuning patterns and rather specific odor-evoked calcium signals for each glomerulus. Moreover, the authors demonstrate that both LN types exhibit distinct glomerular branching patterns, with type I innervating many, but not all glomeruli, while type II LNs branch in all glomeruli.

      From these results and further experiments using pharmacological manipulation, the authors conclude that type I LNs rather play a role regarding interglomerular inhibition in form of lateral inhibition between different glomeruli, while type II LNs are involved in intraglomerular signaling by developing microcircuits in individual glomeruli.

      In my opinion the methodological approach is quite challenging and all subsequent analyses have been carried out thoroughly. The obtained data are highly relevant, but provide rather an indirect proof regarding the distinct roles of the two LN types investigated. Nevertheless, the conclusions are convincing and the study generally represents a valuable and important contribution to our understanding of the neuronal mechanisms underlying odor processing in the insect antennal lobe. I think the authors should emphasize their take-home messages and resulting conclusions even stronger. They do a good job in explaining their results in their discussion, but need to improve and highlight the outcome and meaning of their individual experiments in their results section.

      Thank you for this positive feedback.

      References:

      Assisi, C., Stopfer, M., Bazhenov, M., 2011. Using the structure of inhibitory networks to unravel mechanisms of spatiotemporal patterning. Neuron 69, 373–386. https://doi.org/10.1016/j.neuron.2010.12.019

      Das, S., Trona, F., Khallaf, M.A., Schuh, E., Knaden, M., Hansson, B.S., Sachse, S., 2017. Electrical synapses mediate synergism between pheromone and food odors in Drosophila melanogaster . Proc Natl Acad Sci U S A 114, E9962–E9971. https://doi.org/10.1073/pnas.1712706114

      Fujiwara, T., Kazawa, T., Haupt, S.S., Kanzaki, R., 2014. Postsynaptic odorant concentration dependent inhibition controls temporal properties of spike responses of projection neurons in the moth antennal lobe. PLOS ONE 9, e89132. https://doi.org/10.1371/journal.pone.0089132

      Fusca, D., Husch, A., Baumann, A., Kloppenburg, P., 2013. Choline acetyltransferase-like immunoreactivity in a physiologically distinct subtype of olfactory nonspiking local interneurons in the cockroach (Periplaneta americana). J Comp Neurol 521, 3556–3569. https://doi.org/10.1002/cne.23371

      Fuscà, D., Kloppenburg, P., 2021. Odor processing in the cockroach antennal lobe-the network components. Cell Tissue Res.

      Hong, E.J., Wilson, R.I., 2015. Simultaneous encoding of odors by channels with diverse sensitivity to inhibition. Neuron 85, 573–589. https://doi.org/10.1016/j.neuron.2014.12.040

      Husch, A., Paehler, M., Fusca, D., Paeger, L., Kloppenburg, P., 2009a. Calcium current diversity in physiologically different local interneuron types of the antennal lobe. J Neurosci 29, 716–726. https://doi.org/10.1523/JNEUROSCI.3677-08.2009

      Husch, A., Paehler, M., Fusca, D., Paeger, L., Kloppenburg, P., 2009b. Distinct electrophysiological properties in subtypes of nonspiking olfactory local interneurons correlate with their cell type-specific Ca2+ current profiles. J Neurophysiol 102, 2834–2845. https://doi.org/10.1152/jn.00627.2009

      Nagel, K.I., Wilson, R.I., 2016. Mechanisms Underlying Population Response Dynamics in Inhibitory Interneurons of the Drosophila Antennal Lobe. J Neurosci 36, 4325–4338. https://doi.org/10.1523/JNEUROSCI.3887-15.2016

      Neupert, S., Fusca, D., Kloppenburg, P., Predel, R., 2018. Analysis of single neurons by perforated patch clamp recordings and MALDI-TOF mass spectrometry. ACS Chem Neurosci 9, 2089–2096.

      Olsen, S.R., Bhandawat, V., Wilson, R.I., 2007. Excitatory interactions between olfactory processing channels in the Drosophila antennal lobe. Neuron 54, 89–103. https://doi.org/10.1016/j.neuron.2007.03.010

      Olsen, S.R., Wilson, R.I., 2008. Lateral presynaptic inhibition mediates gain control in an olfactory circuit. Nature 452, 956–960. https://doi.org/10.1038/nature06864

      Sachse, S., Galizia, C., 2002. Role of inhibition for temporal and spatial odor representation in olfactory output neurons: a calcium imaging study. J Neurophysiol. 87, 1106–17.

      Shang, Y., Claridge-Chang, A., Sjulson, L., Pypaert, M., Miesenbock, G., 2007. Excitatory Local Circuits and Their Implications for Olfactory Processing in the Fly Antennal Lobe. Cell 128, 601–612.

      Wilson, R.I., 2013. Early olfactory processing in Drosophila: mechanisms and principles. Annu Rev Neurosci 36, 217–241. https://doi.org/10.1146/annurev-neuro-062111-150533

      Yaksi, E., Wilson, R.I., 2010. Electrical coupling between olfactory glomeruli. Neuron 67, 1034–1047. https://doi.org/10.1016/j.neuron.2010.08.041

    1. Author Response:

      Reviewer #1 (Public Review):

      The paper is a tour-de-force across multiple techniques and model systems from classical forward screening in C. elegans over ChIP to targeted CRISPR mutagenesis. The data is of a very high quality and supports most of the authors' claims strongly and convincingly. Finally, the manuscript is well written and, in spite of complex experiments and genetics, interesting and easy to comprehend.

      • CAMTA, as the name CaM-binding transcription activator implies, have been studied previously and across many different organisms including plants, mice and humans. It was thus presumed and in part shown that CAMTAs regulate transcription depending on CaM levels.
      • The authors confirm that the gene cmd-1 (encoding CaM) is directly regulated by Camt-1 by using a combination of cell-specific RNAseq and ChIP. This allows them to identify three binding sites upstream of the cmd-1 gene that bind to Camt-1.
      • Moreover, the authors show that overexpression of CaM in the nervous system fully rescues the observed behavioral phenotypes.
      • Importantly, the authors make another discovery. They show that CaM can directly repress its own transcription by binding to specific residues of Camt-1. Thereby, the authors argue, Camt-1 is used to precisely and bidirectionally regulate CaM levels dependent on the cell, animal's state etc.

      The reported data are interesting and, in particular, the aspect that CAMTAs likely act as activators AND repressors is a novel aspect previously not appreciated. In spite of all these strengths, a potential weakness is that it remains open whether this mechanism is primarily a house-keeping mechanism or is indeed, as the authors speculate, regulated by internal and external factors that might, through CAMTA, make cells more or less responsive to Ca2+-CaM signaling.

      We are grateful for and encouraged by our reviewer’s comments. We think that our discovery that CAMTAs regulate CaM expression is thought provoking.

      Reviewer #2 (Public Review):

      Vuong-Brender, Flynn, and de Bono report a detailed analysis of the function of a highly conserved calcium-calmodulin-dependent transcriptional regulator in the function of the C. elegans sensory nervous system. The C. elegans homolog of this factor - CAMT-1 - emerged from a genetic screen for mutants defective in a sensory-driven aggregation behavior. The authors find that multiple chemosensory modalities are disrupted by loss of CAMT-1, and this factor has distributed functions in the nervous system, including in interneurons that receive inputs from sensory neurons. A major finding of this study is that many of the effects of CAMT-1 mutation can be linked to a critical role for CAMT-1 in regulating expression of calmodulin itself. This finding is supported by multiple lines of experimentation, including a demonstration that the effects of losing CAMT-1 can be compensated by restoring expression of calmodulin. The authors further show that what is true for CAMT-1 and calmodulin in C. elegans also applies to Drosophila, indicating that CAMT-1 is a regulator of calmodulin expression whose function has been conserved throughout evolution. This manuscript has many strengths. Key hypotheses are tested using quantitative and technically independent experimental methods. The case that CAMT-1 is a regulator of calmodulin expression is built carefully and, for the most part, the logic of the argument is made clearly and supported by compelling data. Another strength of the manuscript is its candid exposition of data that do not fit neatly into the most simple and accessible model. It is refreshing to see authors who freely admit that they haven't neatly wrapped up every question in a field. The loose ends in this study do not impact the authors' main conclusions. 
However, some observations seem to consume more bandwidth than warranted, and the authors should consider reorganizing the manuscript so that the loose ends do not distract from the main thread of the narrative. The paper does have a few minor weaknesses that could be addressed. These are listed below.

      We thank our reviewer for their thoughtful review.

      Specific comments:

      1. The initial description of the isolation of camt-1 mutants seemed a bit disorganized. A description of the gene and gene product preceded descriptions of the mutants. Also, some mutants were mentioned in the text but not presented in the corresponding figure. The authors should consider minor changes to better communicate how the mutations were cloned.

      We have sought to do this.

      1. In Fig. 2 npr-1 baselines vary a great deal between panels A, B, and C. It is not clear why npr-1 behavior is this variable, and the authors do not mention this obvious feature of their data. Data presented in Fig. 2 indicate that heat-shock-induced expression of camt-1 restores a defect in basal locomotion, but it is unclear whether it restores O2-sensitivity - the effect of oxygen on speed of transgenics seems the same +/- heatshock (compare black traces in panels 2B and 2C).

      We understand the concern of the reviewer. Since the design of these experiments was different from the rest (with only one shift in O2 concentration), we repeated them with 3 O2 changes, bringing them in line with the rest of the manuscript. The results are presented in the new Figure 2. We observed a more consistent baseline speed between different conditions; however, some differences still exist (for example between panel 2A and 2B). One explanation is that for heatshock experiments we keep npr-1 animals at a lower temperature (20 °C, panels 2B and 2C) to minimize basal activity of the heatshock promoter, whereas in the rescue experiment in Figure 2A, and in the rest of the manuscript, animals were kept at 22 °C. Figure 2B-C of our original submission used worms raised at 15 °C for the heatshock experiment, which may explain the greater discrepancy in npr-1 speed values. Heatshock also slightly modifies the response of the npr-1 control animals to O2.

      Regarding whether heat-shock-induced expression of camt-1 restores O2 responses, we found that the npr-1; camt-1; dbExhsp-16p::camt-1 heat-shocked strains aggregated much more than npr-1; camt-1 heat-shocked animals. However, the rescue is not complete. Thus, heatshock-induced expression of camt-1 restores some O2 sensitivity, which correlates well with the partial rescue of the baseline in Figure 2C. We have noted this in the results.

      1. Unlike other datasets, the responses of wild-type AFDs to CO2 do not look particularly convincing (panel 3C). There is clearly an effect of camt-1 mutation on AFD calcium, but the AFD responses seem qualitatively different from the responses of BAGs to CO2 or URXs to O2. The authors might consider moving these data to a supplementary figure and tempering their description of wild-type AFDs as CO2-sensors.

      The data on AFD has been moved to Figure 3 – figure supplement 1. We should add that we agree that in the absence of an identified CO2 sensor expressed in AFD, we cannot be sure that AFD neurons are primary CO2 sensors. Although the AFD CO2-evoked responses are retained in mutants defective in synaptic transmission, they may very well still be indirectly evoked by other neurons.

      1. The authors candidly present data that do not conform to a simple model for how camt-1 affects behavior. Loss of camt-1 increases calcium in sensory neurons that activate the speed-controlling interneuron RMG. However, RMG calcium is reduced in camt-1 mutants. This inversion in the effect of camt-1 mutation might be caused by a homeostatic mechanism, as the authors propose. It might be possible to test this hypothesis by testing whether reducing excitatory input into RMGs elevates resting calcium in camt-1 mutants, for example via mutations that affect sensory transduction.

      In the interest of simplifying the manuscript, and given other comments, we have now removed the RMG Ca2+ imaging data. However, this is an interesting way of testing what is going on.

      1. In Fig. 4H RMG data are presented as fractional ratio change - all other imaging data are presented as absolute ratios of YFP and CFP fluorescence. It is not clear why these data are treated differently. It is also not clear that these data are consistent with data shown in Fig. 3F. Which dataset represents the effect of camt-1 mutation on RMG calcium? More measurements might be warranted.

      As highlighted above, we have removed the RMG imaging data from the paper.
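
      For readers unfamiliar with the two presentations contrasted in the comment above, the sketch below shows how an absolute YFP/CFP ratio and a fractional ratio change are commonly computed from the same traces. It is only an illustration under stated assumptions: the function names, the choice of the first frames as baseline, and the example values are hypothetical and are not taken from the manuscript.

```python
def fret_ratio(yfp, cfp):
    """Absolute ratio R = YFP / CFP for each time point."""
    return [y / c for y, c in zip(yfp, cfp)]


def fractional_ratio_change(ratios, n_baseline):
    """Fractional change (R - R0) / R0, with R0 the mean of the first
    n_baseline ratio values (assumed baseline window)."""
    r0 = sum(ratios[:n_baseline]) / n_baseline
    return [(r - r0) / r0 for r in ratios]


# Made-up fluorescence values for illustration only.
yfp = [2.0, 2.0, 3.0]
cfp = [1.0, 1.0, 1.0]

ratios = fret_ratio(yfp, cfp)
changes = fractional_ratio_change(ratios, n_baseline=2)
print(ratios)
print(changes)
```

      The fractional form removes differences in resting ratio between animals or genotypes, so it cannot show a baseline shift that the absolute ratio would reveal. That is one reason the two presentations are not directly comparable.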

      1. Nice experiments show that regulation of calmodulin in Drosophila requires a CAMT-1 homolog. The bar graphs showing unity for values normalized to themselves are a bit odd - perhaps there's a more compact way to plot these data.

      We have sought to address this question in two ways. First, we have further buttressed our results by performing in situ immunofluorescence staining of dissected fly retinas with a calmodulin antibody. We see a significant decrease in calmodulin expression in fly CAMTA mutants compared to controls.

      Prompted by this comment, we also realized we omitted an explanation of how we normalized the data for the qPCR graphs in the figure legend. This was done using rRNA as a control. The Yamamoto lab had previously used the same control to normalize CAMTA expression in wild type and mutant flies. We add a note saying this.
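
      To make the normalization issue concrete, the sketch below uses the widely used 2^(-ΔΔCt) convention for relative qPCR quantification against a reference transcript. This is an assumption for illustration: the response states only that rRNA was used as the normalization control and does not specify the calculation. The function name and Ct values are hypothetical, and the formula assumes roughly 100% amplification efficiency.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_control, ct_reference_control):
    """Relative expression by the 2^(-ddCt) convention (assumed here).

    ct_target / ct_reference: Ct values of the gene of interest and the
    reference transcript (e.g. rRNA) in the sample of interest.
    ct_target_control / ct_reference_control: the same in the control sample.
    """
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)


# Made-up Ct values for illustration only.
control_vs_itself = relative_expression(24.0, 10.0, 24.0, 10.0)
mutant_vs_control = relative_expression(25.0, 10.0, 24.0, 10.0)
print(control_vs_itself)
print(mutant_vs_control)
```

      Under this convention the control compared with itself is 1 by construction, which is why a bar for the normalized control carries no information beyond marking the reference level. One extra cycle for the target in the mutant corresponds to half the relative expression.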

      1. ChIPseq analysis of CAMT-1 is also quite nice. Is there a sequence motif for CAMT-1 binding that emerges from this study? If so, how does this motif compare to motifs from studies of CAMT-1 homologs in other species?

      We used the MEME algorithm from the MEME Suite of motif-based sequence analysis tools (https://meme-suite.org/) to seek enriched sequence motifs in our ChIPSeq data. This identified a series of enriched motifs, although none coincided with the peaks at the CMD-1 promoter. However, we did observe sequences resembling the mouse CAMTA1 binding site at the centre of each of the three CAMT-1 binding peaks upstream of cmd-1. We now say this in the Discussion.

      1. Figure 7 shows that CMD-1 inhibits cmd-1 expression via interaction with CAMT-1. These data are interesting, but it is not clear how this effect can be related to prior data showing that forced expression of CMD-1 can compensate for loss of CAMT-1. The authors' behavioral and physiological studies suggest that in vivo CAMT-1 promotes CMD-1 expression. In Figure 7, they suggest that CAMT-1 inhibits expression of CMD-1, but there is no clear link to behavior or physiology for this repressor function of CAMT-1. The manuscript might be clearer without these data, and the absence of these data would not affect the overall impact of the study.

      We agree that the feedback control of cmd-1 gene expression by CMD-1 interacting with CAMT-1 is a part of the story that has not been fully developed. Given the feedback from our reviewers and Editors to give these findings less prominence, but not remove them entirely, we moved the data into supplementary information. We have also altered the main text and the legend of Figure 7 to explicitly say that further experiments are needed to establish if this feedback is relevant under physiological conditions.

      Reviewer #3 (Public Review):

      Vuong-Brender et al. present a thorough study investigating how CaM-binding transcription activators (CAMTAs) in C. elegans and Drosophila are required for numerous behaviors and proper neuronal function. The study is strong in how it uses a variety of approaches to study a major underlying mechanism for CAMTA. First, they use reporters, mutant analysis, and heat-shock rescue to show how camt-1 is expressed widely in neurons and functions in adults in several behaviors. They used transcriptional profiling to show that camt-1 is required to upregulate CaM in subsets of neurons in worm. They next use ChIP-seq to zero in on where worm CAMT-1 binds regulatory regions upstream of the CaM gene cmd-1 to promote its expression. They find that overexpression of CaM compensates for behavioral and neuronal response deficits in camt-1 mutants. Lastly, they propose that when CaM is highly expressed, it may downregulate its own expression by binding CAMT-1.

      We thank our reviewer for their critique of our work.

      1. Overall, I feel that the study is excellent and most conclusions are justified by evidence. However, I do not think the title is supported by the data. It currently is listed as: CAMTA TUNES NEURAL EXCITABILITY AND BEHAVIOR BY MODULATING CALMODULIN EXPRESSION. The authors show evidence that camt-1 is required for the normal function of neurons and behavior by promoting expression of CaM. Their only evidence that camt-1 downregulates CaM is a more artificial situation where CaM is overexpressed. I don't think they provide any evidence that camt-1 is used to "tune" behavior or neuron activity up and down in a wild-type strain. Tuning implies that the molecule modulates a physiological system bidirectionally in a natural situation. I suggest using a more accurate title that better fits the experimental evidence.

      We have changed the title to ‘Neuronal Calmodulin levels are controlled by CAMTA transcription factors’. We hope this more neutral title is appropriate to describe our findings.

      1. They show ample evidence that camt-1 appears to promote the expression of cmd-1 in most cases. This includes showing that overexpression of cmd-1 suppresses the behavioral and imaging phenotypes of camt-1. But they didn't perform the more straightforward epistasis test with the camt-1;cmd-1 double mutant in worm or fly, presumably because there is no viable loss-of-function allele in the coding area of the cmd-1 gene. It would help the readers understand why this simpler experiment was not performed if they explain this in the paper. A good place would be near line 220, where they generate hypomorphic promoter alleles using CRISPR. If they have tried to make their own loss-of-function alleles by mutating the coding area of cmd-1, but it resulted in presumed lethality, this might be mentioned here too.

      This is a good point, and one that we had overlooked. cmd-1 loss of function mutations do indeed confer lethality. We have added a sentence to say:

      ‘Straightforward comparison of camt-1 and cmd-1 loss of function phenotypes was not possible, since disrupting cmd-1 confers lethality (7, 8).’

      1. I am most worried about the potential caveats with the calcium imaging experiments. As the authors note, it is challenging to infer absolute levels of calcium using the ratiometric sensor cameleon across different individuals and genotypes. However, the authors do not note that the YFP/CFP FRET signal from cameleon might be perturbed because it uses calmodulin to bind calcium. At the end of their study (line 244), they provide evidence that calmodulin may bind to CAMT-1 to suppress its own expression when calmodulin is highly expressed. This is worrisome because cameleon is probably expressed highly in some or most of these strains. The authors may want to re-examine neuronal activity for a subset of experiments with a method that is independent of a calmodulin-based sensor (if possible).

      We agree that this is a potential concern. As suggested by our referee, we therefore repeated some of our Ca2+ imaging experiments using a genetically-encoded Ca2+ indicator that does not contain CaM. We opted to use TN-XL, an indicator that uses troponin C as the Ca2+ binding moiety, and which has previously been used successfully in C. elegans. We imaged CO2-evoked Ca2+ responses in BAG sensory neurons, in wild type and in camt-1 mutant animals. The data obtained using TN-XL recapitulated what we observed using YC3.60 (BAG).

      1. The title of "Fig 3 - Figure supplement 1" is confusing because it suggests that they measured the levels of YC2.60 cameleon, when in fact they measured a separate GFP reporter, albeit using the same promoter. So they could clarify the figure title.

      The reviewer is right – our heading was confusing. We have changed it, and now say: ‘Expression from the gcy-37 promoter is reduced when CAMT-1 is overexpressed.’

      1. D. Bazopoulou, A. R. Chaudhury, A. Pantazis, N. Chronis, An automated compound screening for anti-aging effects on the function of C. elegans sensory neurons. Sci Rep 7, 9403 (2017).
      2. M. S. Choi et al., Isolation of a calmodulin-binding transcription factor from rice (Oryza sativa L.). J Biol Chem 280, 40820-40831 (2005).
      3. J. Han et al., The fly CAMTA transcription factor potentiates deactivation of rhodopsin, a G protein-coupled light receptor. Cell 127, 847-858 (2006).
      4. N. Bouche, A. Scharlat, W. Snedden, D. Bouchez, H. Fromm, A novel family of calmodulin-binding transcription activators in multicellular organisms. J Biol Chem 277, 21851-21861 (2002).
      5. T. Yang, B. W. Poovaiah, A calmodulin-binding/CGCG box DNA-binding protein family involved in multiple signaling pathways in plants. J Biol Chem 277, 45049-45058 (2002).
      6. E. Kodama-Namba et al., Cross-modulation of homeostatic responses to temperature, oxygen and carbon dioxide in C. elegans. PLoS Genet 9, e1004011 (2013).
      7. V. Au et al., CRISPR/Cas9 Methodology for the Generation of Knockout Deletions in Caenorhabditis elegans. G3 (Bethesda) 9, 135-144 (2019).
      8. A. Karabinos et al., Functional analysis of the single calmodulin gene in the nematode Caenorhabditis elegans by RNA interference and 4-D microscopy. Eur J Cell Biol 82, 557-563 (2003).
    1. As Schilt and Westbrook (2009) argue, the binary gender system is a heterosexist one, which privileges masculinity and straightness over femininity and queerness.

      I think this is really interesting. As a cis woman I can recognise where acting straight and feminine in society can have its benefits, rather than deviating from the "norm". In many respects, fitting into the traditional norm can be beneficial, potentially for acquiring government funds or for being more desirable for a job position because you act in a certain way. Although this is not fair, it is seen often.

      However, I would argue everyone is in some way queer. When alone, or with those they are comfortable with, people may present traits that are not considered desirable by society, for instance in gender roles within the household. For example, my partner and father of my children does the nightly routine with our children, plays games such as dress-ups and tea parties, reads books with the children and gives them affection and more on the emotional level, all whilst I work in the office. While we believe this is equal, would society suggest these activities are "queer" for a "Male" to be doing? Or are we at a place where these activities are more accepted? (I believe they are more accepted, but based on traditional nuclear families, are these activities queer?)

      This is just one example, but it could relate to many activities that go on behind closed doors, out of the public eye, where people would generally act outside typical "straight", feminine or masculine roles. I would argue people are more "queer" in those situations. I hope this makes sense.

    1. The Pretty Reckless - And So It Went [feat. Tom Morello] (Official Lyric Video)

      802 Comments

      Adam Dobrin (1 second ago, 1 like): VERBATIM: "stop hurting people, stop hurting people. if you do not believe people should be hurt" ... ((ishing)) ~either disable the attackers or leave the area~

      Adam Dobrin (1 second ago, 0 likes): sometimes words have two meanings. sometimes "kill" means "excorcized the demons" @LOUDON @SALEM

      Raziel Moreno (7 months ago, 426 likes, 3 replies): I'm so damn happy she chose to rock instead of staying as an actress.

      sonya fuks (7 months ago, 341 likes, 12 replies): How is it possible they just keep on getting better and better with each album?!?!

      Graziano D'Ovidio (7 months ago, 652 likes, 18 replies): I love her voice. So much.

      Natsumi Ikari (7 months ago, 631 likes, 28 replies): Most of people know Tom Morello from RATM of Prophets of Rage, but don't forget he was also the guitarist of Audioslave, whose singer was Chris Cornell. Must be emotional for Taylor to record this song him. I'm sure Chris is proud of them both.

      Matstrr Salazar (7 months ago, 322 likes, 1 reply): This is why TPR is one of my all time favorite bands. They release bad-ass singles and albums

      Mike Whicker (7 months ago, 180 likes, 5 replies): The entire death by rock n roll album is first on my list of reasons why 2021 will not stink as much as last year did!

      Mackenzie Dinkins (5 months ago, 25 likes, 2 replies): A good theme for this year's Elimination Chamber event.

      FDCAnselmo (7 months ago, 159 likes, 8 replies): The Anthem of Youth... Fuck me what a tune.

      norangutan (7 months ago, 165 likes, 7 replies): I love how obviously this song draws inspiration from Audioslave’s sound, and in a way it’s paying homage to chris cornell especially with the “you have so much of everything still you wanted more” line - which is almost the same as a line from Audioslave’s ‘What you are’

      Shiteyanyo (7 months ago, 55 likes, 1 reply): Tom Morello?! Each single they release for this album just keeps getting better and better

      Lau pin (7 months ago, 90 likes, 4 replies): I fell in love with every song they released

      🕸️Haley Munster🕷️ (7 months ago, 201 likes, 22 replies): I really hope this album has more Going to Hell vibes 🥺🥺🥺🖤

      M K (7 months ago, 15 likes): Great tune and a good example for why I think TPR is among the best of the current rock bands. They always have catchy riffs and melodies however, especially in the bridge, you never know which turn the song is gonna take. Yet it always fits the song and never sounds out of place.

      Mike B (7 months ago, 154 likes, 8 replies): Underrated band

      🕸️Haley Munster🕷️ (7 months ago, 45 likes, 1 reply): THIS is the Pretty Reckless I love !! 💙

      pedrogoularth (7 months ago, 31 likes): This is SO fucking good! 💖 🇧🇷

      Nick Brick (7 months ago, 19 likes): Kicking the year off with a banger. Looking forward to the album!

      Jo Mill Hyde (6 months ago, 49 likes, 1 reply): The fucking power in her voice and this song

      HT82 Smash (6 months ago, 17 likes, 2 replies): WWE main roster stepping up their game with these theme songs. Elimination Chamber 2021 everyone

      Hide (7 months ago, 80 likes, 5 replies): The addition of the children's choir killed and buried me. 😔🤘🏾

      Maryalee Scarlet (7 months ago, 9 likes): Will 2021 be the year that the world realizes this band is a MAJOR TALENT? We await.

      AzizaX Blue (7 months ago, 153 likes, 5 replies): FUCK YEAH!! I love this new era of TPR! 🤘🏽🤘🏽🤘🏽🤘🏽

      karantinada sıkılmış biri (6 months ago, 13 likes): i cannot wait till 12 february. im sure the album gonna be awesome

      Diego Santos (7 months ago, 34 likes): Looove it. Taylor, you and your band are incredible...

      Michelle Szymanski (7 months ago, 346 likes, 13 replies): SHE BEOUGHT THE KIDS BACK I CANT -

      Tim Ferderer (7 months ago, 20 likes, 5 replies): Thank YOU for providing us with relevant music. Growing up in the 60's with all the turmoil then there was always music to help express how the people/youth felt. It's nice to see someone continue that.

      Shaun X (7 months ago, 22 likes, 2 replies): Don’t really know how to describe the solo other than FUCK YEAH!

      Adriano Diniz (7 months ago, 30 likes): I feel so powerfull hearing TPR

      deathbyrockandroll (7 months ago, edited, 22 likes, 1 reply): YES I LOVE THIS SONG SO MUCH 🖤🎸 Also I love the 'Heaven Knows' vibes with the children's choir

      quinnsi (7 months ago, 25 likes): THIS.IS.MUSIC. So f'cking powerful, still love every second. And this timing..... Rock on, dudes! All the strength and love to you. ✌️

      Lio Murdest (7 months ago, 7 likes, 1 reply): The rusty parts of her voice are so amazing, I can't get enough.

      TARDIStraveller96 (7 months ago, 10 likes): This is such a damn bop

      Creepy Clown (7 months ago, 8 likes, 1 reply): I swear. Everytime they release an album, I always think it's their best... everytime ♥️♥️♥️

      Choongie Studio HOME (6 months ago, 4 likes, 1 reply): Oh shoot! It's Elimination Chamber time!!!

      B_d_v N (6 months ago, 4 likes): This deserves a music video. Sounds like road rage. Taylor and the band in spandex and bullet bikes/muscle cars.

      Joe Martin (6 months ago, 4 likes): "the world does not belong to you, you are not the king, I am not the fool." I think this song has been in everyone's subconscious and finally, it has lyrics and a musical form now. i normally don't listen to radio, I was driving to krogers and forgot my iPod, this song was playing. her voice sounded familiar, but they never said who it was, so i had to try to figure it out.

      Ripley (7 months ago, 4 likes): every album gets better. can’t wait to hear the other new songs soon 🖤

      Charyxard (6 months ago, 1 like): And so it went. Perfect loving every bit of this song can't wait for the album so soon.

      TPR' Lawyer (7 months ago, 13 likes, 2 replies): There's something so powerful about The Pretty Reckless🤘🏽⚡

      Palma (7 months ago, edited, 13 likes, 1 reply): Long live The Pretty Reckless! And greetings from Mexico. This song is fucking amazing.

      The Pretty Reckless - And So It Went [feat. Tom Morello] (Official Lyric Video)

      638,635 views

      Jan 11, 2021

      20K likes, 277 dislikes

      The Pretty Reckless

      1.49M subscribers

      The Pretty Reckless - And So It Went [feat. Tom Morello] (Official Lyric Video) From the upcoming album 'Death By Rock And Roll' | Available February 12, 2021: http://found.ee/dbrr Subscribe to The Pretty Reckless on YouTube: https://found.ee/tpr_subscribeyt Photo by: Rob Fenn Pre-order/Pre-Save the album 'Death By Rock And Roll': iTunes: http://found.ee/dbrr_it Apple Music: http://found.ee/dbrr_am Spotify: http://found.ee/dbrr_sp Amazon Music: http://found.ee/dbrr_amzm Amazon: http://found.ee/dbrr_amz YouTube Music: http://found.ee/dbrr_ytm Stay connected with The Pretty Reckless Website: https://deathbyrockandroll.com/ Facebook: https://found.ee/tpr_facebook Twitter: https://found.ee/tpr_twitter Instagram: https://found.ee/tpr_instagram LYRICS And so it went the children lost their minds Begging for forgiveness was such a waste of time And the bullets start to fly And the bough's about to break When you hear them cry It's too much for me to take The world does not belong to you You are not the king I am not the fool They said the world does not belong to you It don't belong to you It belongs to me And, so it went the children lost their minds Crawling over bodies of those who gave their lives And the fists begin to throw And the fire starts to blaze Don't you think they know They're the fucking human race The world does not belong to you You are not the king, I am not the fool They said the world does not belong to you It don't belong to you It belongs to Everyone is crying out, I can hear them scream With all these eyes upon us but no one seems to see That you and me are just the same as god meant it to be But you're much too close to me. 
You're much too close to me So it went The children lost their minds Nowhere to run, nowhere to hide And the wind begins to howl And the wolf is at your door You have so much of everything But still you wanted more They said the world does not belong to you You are not the king, I am not the fool They said the world does not belong to you It don't belong to you It belongs to me #ThePrettyReckless #DeathByRockAndRoll

      The Pretty Reckless - On Tour

      Oct 3: Upcoming show - Nashville, TN

      Sun 7:00 PM - The Cowan (tickets via Ticketmaster)

      Buy The Pretty Reckless merchandise

      [The Pretty Reckless - Death By Rock And Roll Coke Bottle Clear Vinyl, $28.00, Merchbar](https://www.merchbar.com/hard-rock-metal/the-pretty-reckless/the-pretty-reckless-death-by-rock-and-roll-coke-bottle-clear-vinyl)

      [The Pretty Reckless - Skull-Cycle Hoodie, $60.00, Merchbar](https://www.merchbar.com/hard-rock-metal/the-pretty-reckless/the-pretty-reckless-skull-cycle-hoodie?v=1273424)

      [The Pretty Reckless - Harley Long Sleeve, $45.00, Merchbar](https://www.merchbar.com/hard-rock-metal/the-pretty-reckless/the-pretty-reckless-harley-long-sleeve?v=1273416)

      [The Pretty Reckless - Death By Rock And Roll Black Vinyl, $28.00, Merchbar](https://www.merchbar.com/hard-rock-metal/the-pretty-reckless/the-pretty-reckless-death-by-rock-and-roll-black-vinyl)

      [The Pretty Reckless LIGHT ME UP CD, $14.84, Merchbar](https://www.merchbar.com/hard-rock-metal/the-pretty-reckless/pretty-reckless-light-me-up-cd)


      [

      E V A N E S C E N C E Greatest Hits Full Album - Best Songs Of E V A N E S C E N C E Playlist 2021

      Parker Michael

      624K views5 months ago

      ](https://www.youtube.com/watch?v=bf4tLUMxrDc)

      [

      57:12NOW PLAYING

      ](https://www.youtube.com/watch?v=TgJnsOy_EeM&t=452s)

      [

      The Pretty Reckless best lives compilation

      Marianne Audouin

      260K views8 months ago

      ](https://www.youtube.com/watch?v=TgJnsOy_EeM&t=452s)

      [

      1:06:00NOW PLAYING

      ](https://www.youtube.com/watch?v=nWjN8SA5KEU)

      [

      Chevelle - Stray Arrows: A Collection of Favorites (Full Album)

      Ita Depeeza

      345K views1 year ago

      ](https://www.youtube.com/watch?v=nWjN8SA5KEU)

      [

      12:02NOW PLAYING

      ](https://www.youtube.com/watch?v=76lIGyw7zL8)

      [

      The Pretty Reckless FULL acoustic session in Paris

      Marianne Audouin

      705K views3 years ago

      ](https://www.youtube.com/watch?v=76lIGyw7zL8)

      [

      53:38NOW PLAYING

      ](https://www.youtube.com/watch?v=OYMLjqoYhdw)

      [

      Guardians of the Galaxy Awesome Mix Vol 1 Vol 2

      Alternitive Videos

      1.8M views1 year ago

      ](https://www.youtube.com/watch?v=OYMLjqoYhdw)

      [

      2000's Rock Songs Mix 🎸 Best Rock Hits of the 2000's Playlist

      Redlist - Rock Mixes

      913K views1 month ago

      ](https://www.youtube.com/watch?v=_3A4JW9RQWM)

      [

      Motley Crue greatest hits full songs \m/

      soulsickk\m/

      2M views3 years ago

      ](https://www.youtube.com/watch?v=3-FQCgVZG5o)

    1. I am beginning to think that the significant difference is that with songlines, learning is always done in the physical ‘memory palace’ which is constantly revisited. It can be recalled from memory, but is encoded in place. For me, that is way more effective, but I have aphantasia and very poor visualisation, so it may not be as big a factor for others. So recalling your childhood home can be a memory palace, but not a songline.

      Lynne Kelly is correct that we need better delineations of the words we're using here.

      To some of us, we're taking historical methods and expanding them into larger super sets based on our personal experiences. I've read enough of Kelly's work and her personal experiences on her website (and that of many others) that I better understand the shorthand she uses when she describes pieces.

      Even in the literature throughout the Middle Ages and the Renaissance we see this same sort of picking and choosing of methods in descriptions of various texts. Some will choose to focus on one or two keys, which seemed to work for them, but they'd leave out the others, which means that subsequent generations would miss out on the lost bits and pieces.

      Having a larger superset of methods to choose from as well as encouraging further explorations is certainly desired.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      This manuscript by Gulyurtlu and co-workers addresses the role of CUG expanded repeat RNA associated with DM1 in regulating the formation of higher order RNP assemblies such as stress granules and P-bodies in the cell. The authors used lens epithelial cells (hLECs) derived from a DM1 patient

      We used cell lines from several patients and age-matched controls to avoid effects of individual cell-line variation. We will make sure that this is clear in the text.

      or a HeLa cell inducible model of DM1 to investigate whether expression of the CUG repeat-associated protein MBNL1 and CUGBP1 affected the formation and dispersal of stress granules and P-bodies. The authors show that MBNL1 and CUGBP1 are components of SGs and PBs in hLECs and HeLa cells. In cells expressing the CUG repeat, there are minor alterations in the dispersal of stress granules as well as in the formation of P-bodies.

      The alterations in the formation and dispersal of stress granules are not minor. For example, in the HeLa cell model, stress granules take more than twice as long to form in cells expressing the CUGexp repeats associated with DM1 and disperse in half the time. These data are already in the results section, but we will highlight them in a revision and have added a further representation of the data to the figure, using graphs of ‘proportion of cells with stress granules’ against time. The changes we see are as large as, or larger than, results published elsewhere (see appendix below).

      MBNL1 could affect the formation and dispersal of SGs independent of the CUG repeat.

      In fact, we present data in HeLa cells with MBNL1 almost completely removed by shRNA revealing that this has a much smaller effect on stress granules than does the expression of CUGexp RNA. This is an important point, as it is widely assumed that most of the cellular defects in DM1 are caused by the ‘sequestration’ of MBNL1 in the CUGexp foci. Since only .

      Finally, in HeLa cells, overexpression of MBNL1 can reduce the dispersal of P-bodies upon 1,6-hexanediol treatment.

      This is not what our data show. In the hexanediol experiments, both cell lines over-express MBNL1 in similar amounts. The difference between them is that one cell line expresses a DMPK1 mini-gene with a CUG expansion and the other expresses a mini-gene without the expansion. Again, our results show that the alteration to P-body responses to 1,6-hexanediol can be attributed to the presence of the CUGexp RNA, rather than altered levels of MBNL1. We will revise the results and discussion to further emphasise this point.

      Major comments:

      One limitation of the work is that the perturbations seen with stress granules or P-bodies are all relatively small, and no evidence for a functional consequence on gene expression is demonstrated. Specifically, the authors observe only minor alterations in the formation or disaggregation of PBs and SGs in these DM1 models. Further, some of the effects observed are independent of the CUG repeat expression, suggesting that MBNL1 and CUGBP1 might have independent roles in modulating some properties of SG and PB formation or dispersal.

      As above, the changes we see in SG formation and dispersal are not small. There are already numerous studies of the effects of DM1 on gene expression and mRNA splicing. This is not what we set out to study: we are interested in perturbations to the organisation of cellular structures associated with the expression of the CUGexp repeat RNA characteristic of DM1. We do show some data relating specifically to the proteins MBNL1 and CUGBP1 in the paper. shRNA resulting in almost complete loss of these proteins has much smaller effects than the expression of CUGexp RNA, suggesting that the major part of the effects caused by expression of the CUGexp RNA is not mediated through changes in MBNL1 or CUGBP1 levels. MBNL1 and CUGBP1 levels may well contribute to alterations in SG dynamics, but our data suggest that they are minor contributors. This is an advance in our current knowledge.

      1. The authors could investigate whether the CUG repeat RNA itself is localized to SGs or PBs in their models, and whether the presence of the repeat RNA is absolutely necessary for regulating the dynamics of SG or PB formation.

      We have now done this. The CUG repeat RNA is not localised in stress granules or PBs to a detectable extent. This suggests that the effect we see on these structures by expression of the expanded RNA occurs despite the absence of the RNA from the structures. This is similar to the effects of the ALS-associated paraspeckle protein FUS, which can affect the integrity of nuclear LLPS structures (gems) despite not co-localising with them (https://doi.org/10.1016/j.celrep.2012.08.025). We have added these data to the manuscript, as part of draft figure 8, and will add text emphasising this as an additional example of disease-causing macromolecules affecting the structure of LLPS domains in which they are not found.

      1. The authors use 1,6-hexanediol to suggest that PBs and SGs in HeLa cells show behavior analogous to LLPS. However, the use of 1,6-hexanediol to establish an assembly as a LLPS is a relatively limited analysis (despite its widespread use in the field), since this compound can affect the formation of multiple cellular substructures that are not always LLPS (for example, see Wheeler et al, 2016, eLife).

      We are aware of and have cited this publication. Our comments about LLPS structures are measured, as there is still controversy about how to definitively identify them in cells. SGs and PBs have, however, previously been widely published to be formed by LLPS. The rapid exchange of SG and PB components during FRAP and the ability of SGs to both fuse and bud (seen in our supplementary movies) are also supportive of these structures behaving as LLPS structures in our models. Wheeler et al. showed that, in yeast, the nuclear pore complex and some cytoskeletal structures were affected by 1,6-hexanediol but membrane-bound structures such as the ER and mitochondria were not. The disruption of the nuclear pore complex is not unexpected, since phase separation is involved in cargo shuttling through the NPC (reviewed in https://doi.org/10.1016/j.devcel.2020.06.033). We will revise our discussion to make it clearer that we are not relying only on the use of 1,6-hexanediol to define SGs and PBs as LLPS structures but also on other aspects of their dynamic behaviour and on extensive prior literature.

      Significance

      This study would be of interest to the field if the impact of the DM1 repeat RNAs on PB and SG were more substantial...

      As above, the effects we see on SG formation and loss are substantial. Tissue types affected in DM1 are prone to stress, particularly the lens of the eye, so alterations to cellular response to stress associated with the presence of CUG repeats are of key importance to understanding the cellular pathology of DM1.

      ...and if some functional consequences were demonstrated.

      In terms of function, we show altered responses to stress caused by the expression of CUGexp RNA and probably mediated through alterations in the propensity of LLPS cytoplasmic structures (SGs and PBs) to form and be resolved. Additionally, we can now show that SGs in HeLa cells expressing CUGexp RNA contain less total polyA RNA than is seen in controls, and that ‘docking’ events between SGs and PBs are compromised in cells with CUGexp RNA. These docking events are proposed to mediate transfer of RNA from SGs to PBs (reviewed in https://doi.org/10.1007/978-1-4614-5107-5_12). These new data demonstrate functional impairment of SGs and PBs associated with DM1. We have included this as an additional draft figure 8.

      The lack of a strong effect on SG or PB formation in the DM1 models, along with the CUG repeat-independent effect of MBNL1 on the formation and dispersal of these complexes, argues that MBNL1/CUGBP1 may not significantly affect the formation or dispersal of SGs and PBs.

      We are actually not arguing that MBNL1 and CUGBP1 are the main effectors in the changes we see to SGs and PBs, but that the CUGexp RNA is the key player, so are a little confused by this comment.


      Reviewer #2:

      In the current study, the authors compared the dynamics of P-bodies (PBs) and stress granules (SGs) between control and several DM1 cell lines. They found that MBNL1 and CUGBP1, two CUG repeat RNA-binding proteins that are primarily nuclear, could also co-localize with PBs in the cytoplasm and re-localize to SGs under stress. Small differences were observed in SG assembly and disassembly dynamics between control and DM1 HLECs, between HeLa cells expressing either CTG12 or CTG960, and between HeLa cells with and without shRNAs targeting CUGBP1 or MBNL1.

      As detailed above, the alterations in SG assembly and disassembly in cells expressing CUGexp RNA are not small, in contrast to those in cells with lowered expression of MBNL1 and CUGBP1, which are much smaller, suggesting that the changes caused by CUGexp RNA largely do not result from loss of MBNL1 (or CUGBP1). We have inserted additional graphs of ‘proportion of cells with stress granules’ against time and will modify the text to emphasise both of these points.

      Overall, the experiments were clearly described and the results properly presented. However, critical controls, as detailed below, are missing in multiple analyses. The mechanisms underlying these apparent differences are also unknown.

      We do not consider that any ‘critical controls’ are missing, but can supply all of the additional analysis of our data that the reviewer requests below. We can also now provide additional mechanistic insight and will add an additional figure showing lowered amount of polyA RNA in stress granules in cells expressing CUGexp RNA and compromised docking events between stress granules and P-bodies, suggesting impaired communication between them.

      Major concerns:

      1. Throughout the study, the authors compared MBNL1 and CUGBP1 association with PBs and SGs without considering the potential differences in their cytoplasmic abundance between control and DM1 cell lines, which seems to be the case for MBNL1 abundance in CTG960-expressing HeLa cells (Fig. 3). Provided that PBs and SGs exchange components with the cytosol at an equilibrium, if the cytoplasmic abundance of, for example, MBNL1 is decreased in DM1, one would expect the equilibrium to be shifted, resulting in less MBNL1 associated with PB/SG. Therefore, before measuring the association or the assembly/disassembly kinetics of PB and SG, the authors should first test whether MBNL1 and CUGBP1 abundance may be different between control and DM cell lines.

      There is, in fact, no difference in the relative cytoplasmic abundance of GFP-MBNL1 between CTG12- and CTG960-expressing HeLa cells. Each has approximately a 50/50 split between nucleus and cytoplasm, with <3% of nuclear GFP-MBNL1 found in nuclear CUGexp foci when they are present. We have added a graph demonstrating this to the supplementary data. The abundance of total endogenous MBNL1 is also not altered in DM1 patient-derived lens cell lines compared to controls, as shown by semi-quantitative western blot analysis, which we have also added to the supplementary data. However, if the expression of CUGexp RNA did cause a major loss of cytoplasmic MBNL1, this change would be reflective of the situation seen in DM1 and would not invalidate our results or conclusions.

      The same caveat applies to MBNL1/CUGBP1 knockdown experiments, where knocking down one may change the abundance of the other.

      To carry out FRAP experiments or live cell analysis of SG formation and loss, it is necessary to over-express a tagged version of the protein being studied. For the knockdown experiments shown in figure 6, therefore, when we knocked down MBNL1, CUGBP1 was present in excess as a GFP-tagged protein and when we knocked down CUGBP1, MBNL1 was present in excess as a GFP-tagged protein. Thus, any effects of the knockdowns on expression of the endogenous proteins being analysed would be highly unlikely to influence the results.

      1. Similarly, the authors did not consider the possibility that changes in SG/PB dynamics may be due to changes in the abundance/availability of essential SG/PB components such as GE1 and G3BP1.

      From our immunofluorescence experiments, there was certainly no obvious reduction in GE1 or TIA1 abundance (we did not assess G3BP1). We have quantitative proteomic analysis (unpublished) from a similar pair of cell lines, expressing CUGexp RNA alongside GFP rather than GFP-MBNL1. This shows no change in GE1 or G3BP1, so we would not expect to see any here either. We can easily carry out a quantitative western blot analysis to confirm and will add this to the supplementary data.

      1. Most of the observed differences between control and DM cell lines were modest, leaving one to wonder whether they could be simply due to cell line-to-cell line variability. Whenever possible, the authors should present results for each individual line. For example, in Fig. 2, 3 DM1 lines and 2 control lines were used. Was the difference in SG disassembly (Fig. 2B) observed in each of the 3 lines?

      Some of the alterations were modest and there is cell line-to-cell line variability in the lens cell lines. This is why we pooled the data: on average, DM1 cell lines disperse their SGs more quickly than control cell lines do. This is not an unusual way to present data from patient cell lines of diverse genetic background. We have added data for stress granule loss in the individual cell lines to the supplementary data. These data show a consistent trend towards quicker dispersal of stress granules in patient cell lines. The variability between the patient lens cell lines was also the primary reason for us to develop the inducible system in HeLa cells, on a fixed genetic background, as explained in the manuscript.

      Minor points:

      1. Western blot in Fig. 3 shows two protein products from both endogenous and overexpressed MBNL1. Please explain.

      Many of the commercially available anti-MBNL1 antibodies show this double-band in some cell lines as evidenced in numerous publications and on manufacturers’ websites (for example https://abclonal.com/catalog-antibodies/MBNL1RabbitmAb/A5149, https://www.ptglab.com/products/MBNL1-Antibody-66837-1-Ig.htm). We haven’t analysed the two bands in detail, but assume this to be a result of a post translational modification of some sort. Since GFP-MBNL1 and endogenous MBNL1 show the same thing, we do not consider it to be a major concern. We do mention the double-band as ‘characteristic’ in the figure legend for figure 3 so are not seeking to conceal anything here.

      1. No data were shown to substantiate the statement that "MBNL1 localises to CUGexp foci and CUGBP1 does not" (page 6).

      This has been published many times and is shown in figure 1A. However, we will add in a citation for this and have added an additional supplementary figure showing the lack of co-localisation in the foci from figure 1A more clearly together with separate data confirming that MBNL1 and CUGexp RNA do not co-localise with CUGBP1 in the nuclei of line HeLa_CTG960_GFPMBNL1.

      1. The y-axis of Fig. 4D should not go beyond 1.

      We will trim the axis. There are no data points above 1.0, just the indicator of statistical significance.

      Significance:

      The nature of the current study is highly descriptive with little mechanistic insights.

      Our work is not descriptive, as we observed a change in stress granules in patient cells, which we could then replicate (and enhance) in a novel inducible model of DM1 designed to abrogate the unavoidable variation in patient-derived cell lines. We also now have additional mechanistic insights (see above) and have added an additional figure (draft figure 8) detailing these.

      For the subtle differences observed between control and DM1 cell lines, it remains unclear whether it may be due to cell line-to-cell line variation (see above).

      We cannot completely rule out an influence of cell line-to-cell line variation in the patient-derived lens cell lines (see above), though we think this unlikely as we saw the same effect repeated and amplified in the inducible HeLa-derived cell model, which was designed to minimise this concern. Furthermore, for stress granule loss, we see a larger effect in the HeLa cell model after 72hrs of induction than after 24hrs (figure 5C). This argues strongly that the effects seen are due to the expression of CUGexp RNA and we will emphasise this point more strongly.

      Some differences appear to be specific to one model but not the others (e.g., SG formation is slower in HeLa-CTG960 cells but not in DM1 HLECs).

      Even for the observations that seem consistent between models, the current results yielded little novel biological insights into whether and how these subtle differences in PB/SG dynamics may relate to DM1 pathogenesis. Collectively, these weaknesses render the current study incremental at best.

      The key biological insight the results provide is that the presence of the CUGexp repeat RNA results in defects in LLPS structures that are largely separable from any sequestration of MBNL1 in nuclear foci. With many researchers attributing the cellular defects in DM1 simply to the loss of MBNL1 by sequestration into nuclear foci, both this separation of altered stress response from MBNL1 levels and the involvement of altered LLPS formation (evidenced by the changes in PB behaviour on 1,6-hexanediol treatment) are novel biological insights into the cellular pathology of DM1. Additionally, our results shift the emphasis from nuclear effects to those seen in the cytoplasm.

      In terms of specific DM1 pathogenesis, the eye lens is subject to constant repeated stress and is subject to continued growth throughout the life span, relying on lens epithelial cells as a stem cell pool. Epithelial cells are also vital to the homeostatic regulation of ions, growth factor and nutrient flow from the aqueous humor to the underlying fibre cells. Any alterations in the response of lens epithelial cells, in particular, to stress is highly relevant to the pathology of cataract seen in DM1. We will revise our discussion to emphasise these key points more strongly.


      Reviewer #3

      The manuscript entitled "Phase-separated stress granules and processing bodies are compromised in Myotonic Dystrophy Type 1" by Gulyurtlu et al. characterizes the composition and dynamics of stress granules and P-bodies in two Myotonic Dystrophy Type 1 (DM1) cell models, human lens epithelial cells from DM1 patients and age-matched controls and HeLa_CTG12_GFPMBNL1 and HeLa_CTG960GFPMBNL1 cell lines. The manuscript is somewhat descriptive with lack of functional data and some discrepancies. For example, in the discussion section, the authors conclude that "MBNL1 appears to be absent from P-bodies in cells with CUGexp foci in their nuclei. This observation suggests that the role of MBNL1 in P-bodies may be disrupted by the presence of CUGexp RNA." Figure 4A shows that "P-bodies in the DM1 model line, HeLa_CTG960GFPMBNL1 do not contain detectable amounts of GFPMBNL1". However, Figure 4E shows similar levels of total cellular MBNL1 per PB between the control CTG12 and mutated CTG960 lines.

      There is no discrepancy here. The reviewer has misinterpreted our data. PBs in the HeLa CTG960 cell line do not contain detectable amounts of GFP-MBNL1 under normal growth conditions, as shown in figure 4A. The data shown in figure 4E concern arsenite-treated cells, where some PBs in the CTG960 line do contain detectable levels of GFP-MBNL1, but significantly less than in the control CTG12 cells. We will reword these sections to make sure this is clear.

      Most importantly, in Figure S3 the authors show that CUGexp foci are present in 1-2 % of the cells. The claim appears to be too strong for the data presented in the manuscript.

      This is not what is shown in figure S3. The reviewer has misinterpreted our data. Figure S3 shows that in cells from line CTG960, only 1-2% of the total nuclear GFP-MBNL1 signal is found in the CUGexp foci, despite the intensity of the signal within them. Virtually all of the cells from the CTG960 cell line contain CUGexp foci (>95%). We will add a statement to this effect into the results section. We would not have continued working with a cell line in which only 1-2% of cells showed the DM1 phenotype of nuclear CUGexp RNA foci.

      Although the findings are interesting and of potential impact for a better understanding of the implications of RNA-protein condensate dynamics in the pathogenesis of DM1, the work presented here is still descriptive and preliminary in my opinion. In summary, the conclusions are not so convincing and additional experiments are essential to support the authors' claims.

      This reviewer seems to have misinterpreted several of our data sets, including the specific points above, leading to the assertion that our conclusions are not convincing.

      Several months of work will be required to consolidate data and reorganize and improve the manuscript, including the way data are presented and quantified.

      We already have data with which to address the majority of the queries posed, so should be able to make the adjustments relatively quickly.

      Specific comments:

      "On removal of stress, clearance of stress granules is mediated largely by a form of autophagy." This statement is not correct since the majority of stress granules disassemble and are not targeted to autophagy; in healthy cells only 5% (or less) of the total SGs tend to persist in the presence of autophagy or lysosome inhibitors, while the vast majority disassemble. Please rephrase carefully.

      The degree and manner of dispersal of stress granules in healthy cells on removal of stress are not well understood, but are known to differ depending on the type and duration of the stress (DOI: 10.1126/science.abj2400). We do not yet know how this may be altered in DM1; however, compromised autophagy is implicated in cataract formation, which is of relevance to our study. We will re-phrase this section of the discussion carefully to reflect the complex situation.

      Figure 1: RNA-protein complexes have heterogeneous composition. In HLECs, do all PBs colocalize with MBNL1 and CUGBP1 or only a fraction of them?

      We do not routinely see PBs without MBNL1 or CUGBP1 in the HLECs, in contrast to the situation in the HeLa CTG960 line. We have data available in order to quantify this and will add the numbers to the text of the results.

      Figure 2: Stress granules and P-bodies are known to touch each other, a process referred to as a "kissing event". The authors have studied the mobility of GFP-MBNL1 inside these two types of assemblies. It would be important also to quantify the "kissing" events. Is this altered in DM1 cells?

      We couldn’t find reference in the literature to ‘kissing events’ between SGs and PBs, but found several references to ‘docking’ events. We have noticed such interactions between PBs and SGs in our models. We are currently quantifying this and our first experiments (one in the HeLa cell model and one comparing one of the patient-derived lens cell lines to a control) suggest that there is a change in the frequency and/or size of such interactions in the HeLa CTG960 cell line compared to the CTG12 control and in the DM1 lens cell line compared to its control. If this holds true in our repeat experiments (currently in progress), this would also provide the mechanistic insight requested by reviewer 2. We have included this, together with our data showing a decrease in total polyA RNA in stress granules in HeLa cells expressing CUGexp RNA, as an additional draft figure 8.

      Figure 3: In HeLa cells overexpressing CTG960_GFPMBNL1, besides the accumulation of one bright CUGexp punctum, several intranuclear GFPMBNL1 protein foci are visible. This subcellular distribution is different from the one observed in the control HeLaCTG12 GFPMBNL1. Can the authors describe what these intranuclear GFPMBNL1 protein foci are?

      In most cells expressing CUGexp RNA, several nuclear foci form (usually one large and several smaller) and all of them contain MBNL1 (or GFP-MBNL1 in the HeLa_CTG960_GFPMBNL1 cell line). Figure S3 shows object identification using MBNL1 in this cell line, with two clear foci detected as the reviewer points out. We have added an additional panel to supplementary figure 1 to confirm that the additional foci are also CUGexp RNA foci and will clarify in the text of the results that there is not a single focus of CUGexp RNA in each nucleus.

      Is GFPMBNL1 accumulating at the level of splicing speckles? Or paraspeckles? Or other types of intranuclear condensates such as e.g. PML nuclear bodies? The different intranuclear distribution of GFPMBNL1 should be better characterized.

      The sub-nuclear distribution of MBNL1 is, indeed, very complex. MBNL1 also sometimes co-localises to splicing speckles/interchromatin granule clusters, as we have previously reported in lens epithelial cell lines (DOI: 10.1042/BJ20130870). The details of differences in the nuclear distribution of MBNL1 in DM1 cells compared to controls, beyond its accumulation in CUGexp RNA foci, are the subject of another manuscript we have in preparation and are beyond the scope of the current study.

      Moreover, the % of cells expressing CTG960_GFPMBNL1 and forming intranuclear CUGexp foci is only mentioned in the discussion (Figure S3); for clarity it should be reported in the main text when describing Figure 3.

      The number of cells forming nuclear CUGexp foci on expression of CTG960_GFP-MBNL1 is >95% and we will add this to the text of the results section.

      "Figure S2: Quantitation of GFPMBNL1 in P-bodies in HeLa cell model of DM1." The authors report in the legend "Some, but not all, of these P-bodies contain detectable amounts of GFPMBNL1". However, the figure only shows a representative image of cells without quantification. Quantitation should be provided.

      We have data available to provide this simple quantitation. Approximately 38% of PBs in arsenite-treated cells from line HeLa_CTG960_GFPMBNL1 contain detectable levels of GFPMBNL1 using a manually-assigned cut-off intensity. We will add this to the relevant figure legend (now figure S5). However, this method of analysis requires an intensity to be manually set above which GFP-MBNL1 signal is considered ‘detectable’. This is hugely subjective, and in our opinion, the automatically generated quantitative comparison of “% total cellular MBNL1 per P-body” as shown in figure 4E is a more experimentally robust way to demonstrate a small loss of MBNL1 from P-bodies in cells from line HeLa_CTG960_GFPMBNL1 treated with arsenite compared to the relevant control.

      The authors report "a subtle change in stress granule architecture associated with the presence of CUGexp RNA". This statement is not supported by experimental data and should be omitted.

      We will qualify this statement to make it clear that we are referring to a subtle alteration in the co-localisation between CUGBP1 and MBNL1 specifically in the SGs. Our experimental data in figure 4D clearly support this, showing a statistically significant increase in the Pearson’s co-efficient of co-localisation between MBNL1 and CUGBP1 in cells containing CUGexp RNA compared to the relevant control (0.90+/-0.05 for CTG960; 0.87+/-0.07 for CTG12).
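
      As a minimal sketch of what a Pearson's co-efficient of co-localisation measures, the example below correlates paired pixel intensities from two channels within one region. The function name, the pixel values and the idea of a single stress-granule mask are illustrative assumptions; the imaging software and settings actually used for figure 4D are not described here.

```python
from math import sqrt


def pearson_colocalisation(channel_a, channel_b):
    """Pearson's r between paired pixel intensities from two channels."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("need two equally sized pixel lists with at least 2 pixels")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Covariance and variances are left unnormalised; the 1/n factors cancel in r.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / sqrt(var_a * var_b)


# Made-up intensities for the pixels inside one hypothetical stress-granule mask.
mbnl1_pixels = [10, 20, 30, 40, 50]
cugbp1_pixels = [12, 24, 33, 41, 55]
r = pearson_colocalisation(mbnl1_pixels, cugbp1_pixels)
print(f"Pearson's r = {r:.3f}")
```

      Values close to 1 indicate that the two signals rise and fall together across the region. The measure becomes unreliable when a structure spans only a handful of pixels, because r is then estimated from very few data points.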

      Figure 4. MBNL1 and CUGBP1 co-localise in P-bodies. What is the % of colocalization?

      We’re not sure exactly what is being requested here or what biological question the reviewer is asking us to address. MBNL1 and CUGBP1 co-localise in virtually all PBs (except in the HeLa CTG960 line where MBNL1 is undetectable in PBs under normal growth conditions). Figure 4E shows that, in cells with PBs upregulated by sodium arsenite, the mean amount of total cellular MBNL1 per PB is 0.1%, so it will be similar in cells grown under normal conditions as the PB sizes are similar and they appear to be of similar brightness by immunofluorescence. Again, this would be straightforward to quantify with our existing data if this is, indeed, what the reviewer is requesting, but we question the biological significance. We would be reluctant to derive a Pearson’s co-efficient for the degree of co-localisation between CUGBP1 and MBNL1 in P-bodies as the structures are too small for this to be meaningful within the limits of imaging capabilities. We could, however, provide this if this is a specific request.

      Figure 5: "Treatment with sodium arsenite was then carried out under time-lapse microscopy, with Z-stacks of images taken every 4 minutes until stress granule formation was clearly seen (Fig.5A). This revealed a pronounced delay in formation of stress granules in cells containing CUGexp foci (HeLa CTG960 GFPMBNL1, 36 min +/- 12) compared to those without (HeLa CTG12 GFPMBNL1, 15 min +/- 2) (Fig.5B)." Data representation in Figure 5 is unclear and the pronounced delay in stress granule formation is not appreciated. Since the authors performed live imaging, taking pictures every 4 minutes, it would be more informative to plot the data and show the assembly and disassembly kinetics over time for both control and CTG960_GFPMBNL1 cell lines (similar to what is shown in e.g. Gwon et al., Science 2021, Ubiquitination of G3BP1 mediates stress granule disassembly in a context-specific manner, Figure 2G).

      The bar graph in figure 5B shows that cells from the CTG960 line take more than twice as long to form SGs compared to controls and are lost in half the time, with the precise numbers given in the text. A simple bar graph seemed the clearest way to present this. However, we have plotted our existing data in a manner similar to that in the cited reference and added this to figure 5. These graphs clearly show that the differences we see are at least as great as in other published literature, including the reference given by the reviewer (see below).

      Concerning Figure 1, the authors report no difference in the kinetics of stress granule formation in HLECs. However, they only report data after 45 and 60 min of arsenite treatment; at these time-points the assembly step is maximal. Thus, for consistency, the authors should include earlier time-points in the analysis of stress granule assembly in HLECs too, similar to what was done in HeLa cells in Figure 5.

      The assembly step is not ‘maximal’ in these cell lines after 45 minutes. Figure 2A clearly shows that only ~30% of cells have SGs after 45 minutes of treatment, compared with 100% of cells after 90 minutes shown in figure 2B. We have additional data at 10, 20 and 30 minutes all showing no significant differences. We had omitted them to keep the graph simple, but have now included them as a graph of ‘% of cells with stress granules against time’ in figure 2.

      "Having established that MBNL1 and CUGBP1 co-localise closely in stress granules": the authors investigated the colocalization of each of these two proteins with stress granule markers but they did not verify whether MBNL1 and CUGBP1 co-localise.

      In figure 1B we show that endogenous CUGBP1 and endogenous MBNL1 both co-localise with the stress granule marker TIA1 in stress granules in lens epithelial cells. It would, therefore, be highly unlikely that CUGBP1 and MBNL1 would not co-localise with each other in stress granules. We have also previously verified that GFPMBNL1 behaves identically to its endogenous counterpart (Coleman et al., 2014). Furthermore, in figure 4C and 4D, we show close co-localisation between endogenous CUGBP1 and GFPMBNL1 in stress granules in our HeLa cell model, using high-resolution AiryScan microscopy, for which we provide detailed quantitation.

      This aspect should be addressed experimentally since the authors also conclude that "a complex relationship between MBNL1 and CUGBP1 in stress granules" exists. Thus, the authors need to assess the colocalization of GFPMBNL1 with endogenous CUGBP1 in stress granules and the one of GFPCUGBP1 with endogenous MBNL1.

      The complex relationship we propose is based on the effects of CUGBP1 or MBNL1 knockdown on the dynamic behaviours of each other by FRAP assay and not solely on their co-localisation, although we have already analysed their co-localisation in detail as above.

      Figure 6: Please add antibody labeling to microscopy panels A and B.

      Certainly. This was an accidental omission and has been added.

      Moreover, specify if the numbers refer to minutes in panel F. The data representation is also unclear - see comment above, Figure 5.

      As stated in the figure legend and on the graph axes, these numbers have been normalised to the mean time taken for SG formation/loss in the control CTG12 cell line (set at 100%). The precise numbers in minutes for mean and SD are given in the text. We have added additional graphs of ‘% of cells with stress granules against time’ to this figure, with the values in minutes given to clarify the exact time-scale.
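
      The normalisation described above can be sketched as a one-line calculation. The function name is an assumption, and the input values are the mean SG formation times quoted for Figure 5 (36 min for CTG960, 15 min for CTG12), used here purely to illustrate the arithmetic rather than to reproduce the figure.

```python
def percent_of_control(value_min, control_mean_min):
    """Express a time in minutes as a percentage of the control mean (control = 100%)."""
    if control_mean_min <= 0:
        raise ValueError("control mean must be positive")
    return 100.0 * value_min / control_mean_min


# Mean times to SG formation quoted in the text for Figure 5, in minutes.
ctg12_mean_min = 15.0   # control line, defined as 100%
ctg960_mean_min = 36.0  # line containing CUGexp foci

control_pct = percent_of_control(ctg12_mean_min, ctg12_mean_min)
ctg960_pct = percent_of_control(ctg960_mean_min, ctg12_mean_min)
print(f"CTG12: {control_pct:.0f}%  CTG960: {ctg960_pct:.0f}%")
```

      On these inputs the CTG960 line sits at 240% of control, which matches the statement that these cells take more than twice as long to form SGs. Reporting the values in minutes alongside the percentages keeps the absolute time-scale visible.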

      Figure 7: was 1,6-hexanediol added in presence of arsenite? Or was arsenite removed?

      Arsenite was not removed (neither was Doxycycline) as we wanted to examine the effect of 1,6-hexanediol on SGs and PBs without the added complication of the effects of stress removal. We will clarify this point in the methods/results.

      "Aberrant persistent stress granules have been implicated in age-related (Mateju et al., 2017) and neurodegenerative diseases (Protter and Parker, 2016), such as ALS and FTD (Jain et al., 2016; Markmiller et al., 2018; Zhang et al., 2018). These are proposed to result from increased liquid-to-solid phase transitions within the stress granules (Mateju et al., 2017)." The authors should better define what aberrant stress granules are (e.g. see Ganassi et al., 2016; Turakhiya et al., 2018, PMID: 29804830).

      We will expand on this subject in the discussion.

      "Persistent stress granules have long been associated with degenerative conditions, notably ALS (Li et al., 2013)". I suggest updating the reference adding a more recent one.

      We selected this 2013 review to emphasise that there is a long history of association of persistent stress granules with degenerative conditions. We will add in an additional, more recent review.

      Significance

      The work is descriptive; thus, in this form I do not consider that it is strongly advancing the field.

      Having noted alterations to stress granule disassembly in lens epithelial cells from DM1 patients, we went on to develop a novel inducible model in which we replicated and enhanced these effects by expressing the large CUGexp RNA that causes DM1 as part of a DMPK mini-gene mimicking the genetic mutation seen in DM1 patients. This is not purely descriptive. Furthermore, we are now in a position to add an additional figure showing two pieces of evidence for functional defects in stress granules associated with CUGexp RNA expression 1) reduced accumulation of total PolyA RNA in stress granules indicating compromised function and 2) compromised ‘docking’ events between stress granules and P-bodies, a process proposed to be integral to the function of both structures.

    1. Author Response:

      Evaluation Summary:

      This manuscript is of primary interest to readers in the field of infectious diseases especially the ones involved in COVID-19 research. The identification of immunological signatures caused by SARS-CoV-2 in HIV-infected individuals is important not only to better predict disease outcomes but also to predict vaccine efficacy and to potentially identify sources of viral variants. In here, the authors leverage a combination of clinical parameters, limited virologic information and extensive flow cytometry data to reach descriptive conclusions.

      We have extensively reworked the paper.

      Reviewer #1 (Public Review):

      The methods appear sound. The introduction of vaccines for COVID-19 and the emergence of variants in South Africa and how they may impact PLWH is well discussed making the findings presented a good reference backdrop for future assessment. Good literature review is also presented. Specific suggestions for improving the manuscript have been identified and conveyed to the authors.

      We thank the Reviewer for the support.

      Reviewer #2 (Public Review):

      Karima, Gazy, Cele, Zungu, Krause et al. described the impact of HIV status on the immune cell dynamics in response to SARS-CoV-2 infection. To do so, during the peak of the KwaZulu-Natal pandemic, in July 2020, they enrolled a robust observational longitudinal cohort of 124 participants, all positive for SARS-CoV-2. Of the participants, a group of 55 people (44%) were HIV-infected individuals. No difference in COVID-19 high risk comorbidities or clinical manifestations was observed in people living with HIV (PLWH) versus HIV-uninfected individuals, with the exception of joint ache, which was more common in HIV-uninfected individuals. In this study, the authors leverage and combine extensive clinical information, virologic data and immune cell quantification by flow cytometry to show changes in T cells, such as post-SARS-CoV-2 infection expansion of CD8 T cells and reduced expression of CXCR3 on T cells at specific post-SARS-CoV-2 infection time points. The authors also conclude that HIV status attenuates the expansion of antibody secreting cells. The correlative analyses in this study show that low CXCR3 expression on CD8 and CD4 T cells correlates with COVID-19 disease severity, especially in PLWH. The authors did not observe differences in the SARS-CoV-2 shedding time frame between the two groups, which argues against HIV serostatus playing a role in the emergence of SARS-CoV-2 variants. However, the authors clarify that their PLWH group consisted of mostly ART suppressed participants whose CD4 counts were reasonably high. The study presents the following strengths and limitations.

      We thank the Reviewer for the comments. The cohort now includes participants with low CD4.

      Strengths:

      A. A robust longitudinal observational cohort of 124 study participants, 55 of whom were people living with HIV. This cohort was enrolled in KwaZulu-Natal, South Africa during the peak of the pandemic. The participants were followed for up to 5 follow-up visits and around 50% of the participants have completed the study.

      We thank the Reviewer for the support. The cohort has now been expanded to 236 participants.

      B. A broad characterization of blood circulating cell subsets by flow cytometry able to identify and characterize T cells, B cells and innate cells.

      We thank the Reviewer for the support.

      Weaknesses:

      The study design does not include

      A. a robust group of HIV-infected individuals with low CD4 counts, as also stated by the authors

      This has changed in the resubmission because we included participants from the second, beta variant-dominated infection wave. For this infection wave we obtained what we think is an important result, presented in a new Figure 2:

      This figure shows that in infection wave 2 (beta variant), CD4 counts for PLWH dropped to below the CD4=200 level, yet recovered after SARS-CoV-2 clearance. Therefore, the participants we added had low CD4 counts, but this was SARS-CoV-2 dependent.

      B. a group of HIV-uninfected individuals and PLWH with severe COVID-19. As stated in the manuscript the majority of our participants did not progress beyond outcome 4 of the WHO ordinal scale. This is also reflected in the age average of the participants. Limiting the number of participants characterized by severe COVID-19 limits the study to an observational correlative study

      Death has now been added to Table 1 under the “Disease severity” subheading. The number of participants who have died, at 13, is relatively small. We did not limit the study to non-critical cases. Our main measure of severity is supplemental oxygen.

      This is stated in the Results, lines 106-108:

      “Our cohort design did not specifically enroll critical SARS-CoV-2 cases. The requirement for supplemental oxygen, as opposed to death, was therefore our primary measure for disease severity.”

      This is justified in the Discussion, lines 219-225:

      “Our cohort may not be a typical 'hospitalized cohort' as the majority of participants did not require supplemental oxygen. We therefore cannot discern effects of HIV on critical SARS-CoV-2 cases since these numbers are too small in the cohort. However, focusing on lower disease severity enabled us to capture a broader range of outcomes which predominantly ranged from asymptomatic to supplemental oxygen, the latter being our main measure of more severe disease. Understanding this part of the disease spectrum is likely important, since it may indicate underlying changes in the immune response which could potentially affect long-term quality of life and response to vaccines.”

      C. a control group enrolled at the same time of the study of HIV-uninfected and infected individuals.

      This was not possible given constraints imposed on bringing non-SARS-CoV-2 infected participants into a hospital during a pandemic for research purposes. However, given that the study was longitudinal, we did track participants after convalescence. This gave us an approximation of participant baseline in the absence of SARS-CoV-2, for the same participants. Results are presented in Figure 2 above.

      D. results that elucidate the mechanisms and functions of immune cell subsets in the context of COVID-19.

      We do not have functional assays.

      Reviewer #3 (Public Review):

      Karim et al have assembled a large cohort of PLWH with acute COVID-19 and well-matched controls. The main finding is that, despite similar clinical and viral (e.g., shedding) outcomes, the immune response to COVID-19 in PLWH differs from the immune response to COVID-19 in HIV uninfected individuals. More specifically, they find that viral loads are comparable between the groups at the time of diagnosis, and that the time to viral clearance (by PCR) is also similar between the two groups. They find that PLWH have higher proportions and also higher absolute number of CD8 cells in the 2-3 weeks after initial infection.

      The authors do a wonderful job of clinically characterizing the research participants. I was most impressed by the attention to detail with respect to timing of viral diagnosis as it related to symptom onset and specimen collection. I was also impressed by the number of longitudinal samples included in this study.

      We thank the Reviewer for the support.

    1. Author Response:

      Reviewer #1:

      The authors demonstrate deficits in perceptual tests related to fine-time perception in non-speech and speech sounds in a group of patients with stroke aphasia compared to a control group without a lesion. A subgroup of patients with deficits in spectrotemporal processing at a fine timescale have lesions mapped to the posterior STS, MTG and adjacent white matter. The area associated with deficits in spectrotemporal analysis with a fine timescale is then used as a seed for probabilistic fibre tractography based on diffusion MR. These results show connectivity of the functionally defined seed region with a number of areas including the cerebellum.

      The work is carefully done and I think interesting in demonstrating the cerebellar connections of the functionally defined region associated with deficits in fine temporal analysis that might be a basis for event representation at this temporal level.

      We appreciate the referee's evaluation and constructive feedback.

      Reviewer #2:

      Based on consideration of supportive evidence in the literature, the authors propose that a cerebellar-temporal lobe functional network plays a key role in auditory temporal processing. The precise parsing of temporal information is critical to understanding dynamic auditory processing and thus is an interesting area of study. Better understanding of how the cerebellum and temporal lobe may interact to achieve such parsing of the dynamic signal in a generative/predictive internal model is of clear interest to a broad readership. This idea is put to the test by first having individuals with lesions in the posterior portion of left temporal lobe perform speech perception and timing tasks and comparing performance with 12 healthy controls to establish the role of this region in tasks reliant on intact fine temporal processing. Typically, a lesion model will be helpful when a dissociation between structure and function can be demonstrated, and preferably this would be a double dissociation. Here, while lesions to auditory regions of the left temporal lobe are associated with impoverished performance on speech and temporal order tasks relative to a healthy control group, performance on comparably difficult auditory tasks that do not require good temporal discrimination is not tested to determine if there is such a dissociation. Given the extensive discussion of hypothesized different time sensitivities of right and left auditory cortices in the Introduction, patients with right homologous lesions might also have served as an interesting control and could have supported a double dissociation. 
      In a second step to their study, a seed region was generated based on comparison of the lesion loci for the half of the patients who performed most poorly on the behavioral tasks to the other half, and this was used to explore anatomical tract connectivity of the seed region to the rest of the brain in the neuroimaging data from the healthy controls, with a focus on connections with the cerebellum. This approach to establishing that "temporo-cerebellar connectivity underlies timing constraints in audition" is unfortunately just not that convincing. The data are interesting, but taken alone they simply do not support such a conclusion. In the data, there is no clear functional link established or even hinted at between the temporal lobe and the cerebellum.

      We appreciate the referee's evaluation and constructive feedback. We address the raised concerns point by point below. We appreciate the concerns regarding our methodological choice and our interpretation of a functional link between the temporal lobe and the cerebellum. It certainly is more reasonable to derive a functional interpretation based on disconnection measured directly in patients’ DTI. However, if unavailable, indirect measures of disconnection can also be used to establish a functional link between a lesioned region and the networks associated with it. The rationale behind this is that it reflects an indirect estimation of the effect of a lesion on structural brain networks. To make this approach clearer, we have revised the manuscript accordingly. See revised manuscript pages 6 and 12:

      [...] Assessing connectivity in healthy participants based on lesion information is a relatively new method that measures structural disconnection in networks associated with given anatomical regions (Foulon et al., 2018). This allows for the indirect estimation of the lesion effect on structural brain networks. In this regard, it was shown that behavioral deficits can be explained similarly by local brain damage and indirectly measured disconnection (Salvalaggio et al., 2020). [...]

      [...] We next used the respective areas as seed regions for probabilistic fiber tractography in a healthy age-matched sample to visualize the underlying common connectivity pattern (see Methods). Thus, we indirectly explored the association between posterior superior temporal disconnection and processing of sound at short timescales. [...]

      We also changed the abstract and conclusion accordingly. See pages 2 and 15 of the revised manuscript.

      [...] Here we tested whether temporo-cerebellar disconnection is associated with the processing of sound at short timescales. [...]

      [...] The evidence we describe (i) shows that lesion-related deficits in spectrotemporal analysis occur in posterior temporal regions connected to the cerebellum [...].

      Reviewer #3:

      Stockert et al. investigate the cortico-cerebellar network underpinning rapid temporal auditory analysis. This study uses a well-defined group of stroke participants with mostly circumscribed lesions to the left posterior superior temporal lobe to motivate probabilistic tractography from cortical regions associated with verbal and non-verbal rapid auditory temporal analysis. Lesion-symptom mapping identifies a specific region of the posterior superior temporal sulcus and underlying white matter as statistically associated with impairment in rapid auditory temporal analysis. Tractography results demonstrate that these regions have high structural connectivity to wider regions of the left hemisphere cortical language network and ipsilateral and contralateral connectivity to postero-lateral cerebellum and dentate nucleus. It is interpreted that this cortico-cerebellar network is crucial to developing representations of fine auditory temporal structure.

      The conclusions of the paper are an interpretation which is based on integrating previous neuropsychology with the current tractography results and based on well-defined models in the motor domain. Such conclusions are not unreasonable but there is no direct (associative) evidence linking this network to the cognitive function of interest.

      Strengths:

      The paper integrates neuropsychology and neuroimaging methodologies to build a coherent picture which is more than the sum of its parts. The stroke group has well-defined and selected lesions which enable testing of the hypotheses put forward by the authors. The behavioural measures are sensitive and suitable to identify impairments in the behaviours of interest. There has been a detailed analysis of the behavioural speech perception data in the stroke group which largely, although perhaps not entirely, conforms to the asymmetric temporal sampling hypothesis. The lesion-symptom mapping approach is suitable for the nature of the population (small group with similar lesion distributions) and has allowed neuropsychologically guided tractography in the neurotypical population. This has clearly illustrated the complexity of the structural connectivity of the posterior superior temporal sulcus and underlying regions.

      Weaknesses:

      The selective nature of the stroke population - relatively small, chronic lesions - has resulted in only mild impairments for a small number of participants (6/12 participants). At the group level there is no difference between the stroke and neurotypical population on speech perception measures - group statistics do not reach one-tailed significance. This reduces the certainty with which the regions identified are associated with the behaviour of interest. However, the results do conform to previous neuropsychology and lesion studies and it is likely that this lack of effect is due to low statistical power.

      Please refer to our response to the next point.

      All the stroke participants have a similar lesion distribution, and this makes lesion-symptom mapping challenging. For example, lesion data do not give an indication of the functional integrity of perilesional regions which can be reduced, even at the chronic stage, therefore the superior temporal sulcus may not be functioning effectively, even in the proportion of the group without lesions to this area. Lesion symptom mapping is more robust with a wider distribution of lesions and the inclusion of participants with lesions remote from the area of interest. Having said that, the behavioural measures appear sensitive enough to identify mild impairments and the authors, for good reason, wished to reduce the extension of lesion into primary auditory regions. As above, given the limited sample and homogeneous lesion, the lesion symptom mapping approach is reasonable.

      We agree that the small number of patients is a possible limitation to the study and add this point to the limitations section. See revised manuscript page 21.

      [...] First, the study population is relatively small and lesion symptom mapping is typically applied to larger populations with wider lesion distribution. Although careful selection of circumscribed lesions has the advantage of highlighting behavioral differences without confounding other deficits (e.g., primary auditory processing), it is possible that additional regions are involved in processing of sound at short timescales. However, tractography based on healthy participants makes it possible to indirectly obtain information (i.e., structural disconnection) about brain regions contributing to the investigated function. In addition, it is likely that the small number of patients might hamper the ability to detect statistically significant differences between the behavior of controls and patients. Nevertheless, we are confident that the current results align with the fact that the posterior superior temporal cortex contributes to the processing of sound at short timescales, as indicated by previous neuropsychological evidence and lesion studies (Boemio et al., 2005; Chedru, Bastard, and Efron, 1978; Efron, 1963; Robson, Grube, Lambon Ralph, Griffiths, & Sage, 2013; Swisher & Hirsh, 1972). Further studies should however test larger populations to replicate and extend this finding. [...]

      The authors suggest that the behavioural results conform to the asymmetric temporal sampling hypothesis in that only place of articulation discrimination impairments in the stroke group can be (just about) detected, whereas there were no significant stroke-neurotypical differences in other phonetic contrasts. It is not clear that the VOT differences associated with plosive voicing changes and the cues associated with place changes happen over fundamentally different time-scales and, therefore, it is important to further justify the interpretation of the data. In the future it will be helpful to have this level of analysis applied to individuals with lesions to the wider speech perception network to draw conclusions about the specificity of the impairment to these regions - for example, impairments in phoneme discrimination have been associated with frontal lobe lesions.

      It appears that voicing contrasts in which shorter and longer voice onset times result in the perception of a voiced or voiceless plosive (for example [t] and [d]) are encoded in both the temporal envelope and fine structure (Rosen 1992) of the speech signal that occur in time windows of 20-500 ms and <2 ms, respectively. In words an additional cue is the closure time, which can be further used to discriminate between voiced and voiceless plosives. However, place of articulation contrasts are exclusively encoded in the temporal fine structure (i.e., very quick transitions of the frequency spectrum, formant transitions). Even though for all contrasts shorter timescale information plays a role, somewhat redundant encoding is present for voice contrasts. Ultimately, place of articulation contrasts seem to be the most difficult to discriminate. In Figure 2D it is apparent that despite highest error rates for the place of articulation contrasts, several patients also showed impaired discrimination for voicing contrast when compared to healthy controls. We do agree with the referee that it would be interesting to also extend this level of analysis to individuals with lesions in the wider speech perception network in future work.

      The tractography results reveal a complex pattern of structural connectivity, including other regions associated with speech perception. The authors have a theoretical motivation to focus on the importance of the temporo-cerebellar pathway but there is no correlation evidence to link auditory temporal analysis to the integrity of this pathway in the neurotypical population. The non-verbal measures appear to be sufficiently sensitive for this type of analysis. This lack of association with behaviour makes it hard to draw conclusions about the functional role of this network.

      We appreciate the referee’s concerns about our interpretation of the functional link between the temporal lobe and the cerebellum regarding auditory temporal analysis. It certainly is more reasonable to derive a functional interpretation based on disconnection measured directly in patients’ DTI. However, if unavailable, indirect measures of disconnection can also be used to establish a functional link between a lesioned region and the networks associated with it. The rationale behind this is that it reflects an indirect estimation of the effect of a lesion on structural brain networks. To make this approach clearer, we have modified the manuscript accordingly. See revised manuscript pages 6 and 12:

      [...] Assessing connectivity in healthy participants based on lesion information is a relatively new method that measures structural disconnection in networks associated with given anatomical regions (Foulon et al., 2018). This allows for the indirect estimation of the effect of a lesion on structural brain networks. In this regard, it has been shown that behavioral deficits are explained to a similar extent by both the local damage and indirectly measured disconnection (Salvalaggio et al., 2020). [...]

      [...] We next used the respective areas as seed regions for probabilistic fiber tractography in a healthy age-matched sample to visualize the underlying common connectivity pattern (see Methods). Thus, we indirectly explored the association between posterior superior temporal disconnection and processing of sound at short timescales. [...]

      We also changed the abstract and conclusion accordingly. See revised manuscript pages 2 and 15.

      [...] Here we tested whether temporo-cerebellar disconnection is associated with processing of sound at short timescale. [...]

      [...] The evidence we describe (i) shows that lesion-related deficits in spectrotemporal analysis occur in posterior temporal regions connected to the cerebellum [...].

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors interrogated an underexplored feature of CRISPR arrays to enhance multiplexed genome engineering with the CRISPR nuclease Cas12a. Multiplexing represents one of the many desirable features of CRISPR technologies, and use of highly compact CRISPR arrays from CRISPR-Cas systems allows targeting of many sites at one time. Recent work has shown though that the composition of the array can have a major impact on the performance of individual guide RNAs encoded within the array, providing ample opportunities for further improvements. In this manuscript, the authors found that the region within the repeat lost through processing, what they term the separator, can have a major impact on targeting performance. The effect was specifically tied to upstream guide sequences with high GC content. Introducing synthetic separator sequences shorter than their natural counterparts but exhibiting similarly low GC content boosted targeted activation of a reporter in human cells. Applying one synthetic separator to a seven-guide array targeting chromosomal genes led to consistent though more modest targeted activation. These findings introduce a distinct design consideration for CRISPR arrays that can further enhance the efficacy of multiplexed applications. The findings also suggest a selective pressure potentially influencing the repeat sequence in natural CRISPR arrays.

      Strengths:

      The portion of the repeat discarded through processing normally has been included or discarded when generating a CRISPR-Cas12a array. The authors clearly show that something in between, namely using a short version with a similarly low GC content, can enhance targeting over the truncated version. A coinciding surprising result was that the natural separator completely eliminated any measurable activation, necessitating the synthetic separator.

      The manuscript provides a clear progression from identifying a feature of the upstream sequences impacting targeting to gaining insights from natural CRISPR-Cas12a systems to applying the insights to enhance array performance.

      With further support, the use of synthetic separators could be widely adopted across the many applications of CRISPR-Cas12a arrays.

      Weaknesses:

      The terminology used to describe the different parts of the CRISPR array could better align with those in the CRISPR biology field. For one, crRNAs (abbreviated from CRISPR RNAs) should reflect the final processed form of the guide RNA, whereas guide RNAs (gRNAs) capture both pre-processed and post-processed forms. Also, "spacers" should reflect the natural spacers acquired by the CRISPR-Cas system, whereas "guides" better capture the final sequence in the gRNA used for DNA target recognition.

      We thank the reviewer for this correction. We have now changed most uses of “crRNA” to “gRNA”. We decided to retain the use of the word “spacer” for the target recognition portion of the gRNA rather than changing it to “guide” as the reviewer suggests, because we think there is a risk that the reader would confuse “guide” with the non-synonymous “guide-RNA”. We have added a remark explaining our use of “spacer” (“A gRNA consists of a repeat region, which is often identical for all gRNAs in the array, and a spacer (here used synonymously with “guide region”)”).

      A running argument of the work is that the separator specifically evolved to buffer adjacent crRNAs. However, this argument overlooks two key aspects of natural CRISPR arrays. First, the spacer (~30 nts) is normally much longer than the guide used in this work (20 nts), already providing the buffer described by the authors. This spacer also undergoes trimming to form the mature crRNA.

      If we understand this comment correctly, the argument is that, in contrast to a ~20-nt spacer, a 30-nt spacer would provide a buffer between adjacent guides even if a separator is not present. However, even a 30-nt spacer may have high GC content and form secondary structures that would interfere with processing of the subsequent gRNA. Our hypothesis is that the separator is AT-rich and so insulates gRNAs from one another regardless of the length or GC composition of spacers. Please let us know if we have misunderstood this comment.

      Second, the repeat length is normally fixed as a consequence of the mechanisms of spacer acquisition. At most, the beginning of each repeat sequence may have evolved to reduce folding interactions without changing the repeat length, although some of these repeats are predicted to fold into small hairpins.

      We agree with this comment. Indeed, we propose that the separator, which is part of the repeat sequence, has evolved to reduce folding interactions. We now clarify this at the end of the Results section: “Taken together, the results from our study suggest that the CRISPR-separator has evolved as an integral part of the repeat region that likely insulates gRNAs from the disrupting effects of varying GC content in upstream spacers.”

      Prior literature has highlighted the importance of a folded hairpin with an upstream pseudoknot within the repeat (Yamano Cell 2016), where disrupting this structure compromises DNA targeting by Cas12a (Liao Nat Commun 2019, Creutzburg NAR 2020). This structure is likely central to the authors' findings and needs to be incorporated into the analyses.

      We thank the reviewer for this important insight. We have now performed experiments exploring the involvement of the pseudoknot in the disruptive effects of high-GC spacers.

      First, we used our 2-gRNA CRISPR array design (Fig. 1D) where the second gRNA targets the GFP promoter and the first gRNA contains a non-targeting dummy spacer. We generated several versions of this array where we iteratively introduced targeted point mutations in the dummy spacer to either form a hairpin restricted to the dummy spacer, or a hairpin that would compete with the pseudoknot in the GFP-gRNA’s repeat region (new Fig. S3). We found that both of these modifications significantly reduced performance of the GFP-targeting gRNA. These results suggest that interfering with the pseudoknot indeed disrupts gRNA performance, but that hairpins that presumably do not interfere directly with the pseudoknot are also detrimental, perhaps by sterically hindering Cas12a from accessing its cleavage site. Interestingly, the AAAT synSeparator largely rescued performance of the worst-performing of these constructs. These results are displayed in the new Fig. S3 and discussed in the related part of the Results section.

      Second, we have now performed a computational analysis using RNAfold where we correlated the performance of all dummy spacers with their predicted secondary structure (Fig. 1M). The correlation between predicted RNA structure and array performance was higher when the structural prediction included both the dummy spacer and the entire GFP-targeting gRNA (R² = 0.57) than when it included only the dummy spacer (R² = 0.27; new figure panel S1C). This higher correlation suggests that secondary structures that involve the GFP-targeting gRNA play a more important role in our experiment than secondary structures that only involve the dummy spacer. These results are described in the Results section and in the Fig. 1 legend.
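
      As a minimal sketch of the kind of comparison described above, the snippet below computes a squared Pearson correlation between a per-spacer structure score and a per-spacer performance readout. The function name `r_squared` and the example numbers are illustrative assumptions: the values are arbitrary placeholders, not data from the study, and the response does not state which structural metric from RNAfold was used.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


# Placeholder inputs (hypothetical): one structure score and one
# performance value per dummy spacer. Replace with real measurements.
structure_score = [-2.0, -4.0, -6.0, -8.0]
performance = [40.0, 30.0, 20.0, 10.0]
print(r_squared(structure_score, performance))
```

      Running the same function once with scores predicted from the dummy spacer alone and once with scores predicted from the dummy spacer plus the downstream gRNA would give the two values being compared.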

      Third, we now also performed secondary structure analysis (RNAfold) of two of our worst-performing dummy spacers (50% and 70% GC), which indicated that these spacers are likely to form secondary structures that involve both the repeat and spacer of the downstream GFP-targeting gRNA (Fig. 3G-H). Interestingly, this analysis suggested that the AAAT synSeparator improves performance of these spacers by loosening up these secondary structures or creating an unstructured bulge at the Cas12a cleavage site. These results are presented in Fig. 3G-H and the accompanying portion of the Results section.

      To conclude, our analyses suggest that the secondary structure in the spacer and its interference with the pseudoknot in the repeat hairpin play a role in gRNA performance, wherein the inclusion of the AAAT synSeparator can partly rescue the performance, likely by restoring the Cas12a accessibility to the gRNA cleavage site.

      Many claims could better reflect the cited literature. For instance, Creutzburg et al. showed that adding secondary structures to the guide to promote folding of the repeat hairpin enhanced rather than interfered with targeting.

      We thank the reviewer for this comment. Creutzburg et al. report the interesting finding that a carefully designed 3’ extension of the spacer can counteract secondary structures that disrupt the repeat. In this way, the extension rescues disruptive secondary structures that involve the repeat and any upstream sequence. Relevant to this finding, it is conceivable that the synSeparator (AAAT) exerts its beneficial effect at the 3’ end of the GFP spacer by folding back onto the GFP spacer and in this way blocking secondary structures caused by a GC-rich dummy spacer located upstream of the GFP gRNA, according to the mechanism reported by Creutzburg et al. However, we used structural prediction of the GFP-targeting gRNA with and without the AAAT synSeparator and did not find evidence that the AAAT extension would cause this spacer to fold back onto itself (data not shown). Moreover, our experimental data (Fig. 3E) demonstrate that the synSeparator exerts its main beneficial effect when located upstream of the GFP-targeting gRNA, which would not be the case if the main mechanism was the one demonstrated by Creutzburg et al. We already had a paragraph discussing the Creutzburg paper in the Discussion, but we have now added a sentence specifying the mechanism that Creutzburg et al. demonstrated: “RNA secondary structure prediction (RNAfold) did not indicate that the GFP-targeting spacer would fold back on itself when an AAAT extension is added to the 3’ end, which would have been the case for the mechanism demonstrated by Creutzburg et al. (data not shown).”

      Liu et al. NAR 2019 further showed that the pre-processed repeat actually enhanced rather than reduced performance compared to the processed repeat.

      The experiment referenced by the reviewer (Fig. 2 in Liu et al., Nucleic Acids Research, 2019) in fact nicely supports our findings. In Liu et al., the pre-processed repeat only shows improved performance if it is located upstream of the targeting gRNA, and the gRNA is not followed by an additional pre-processed repeat (DRf-crRNA in their Fig. 2B & C). In this situation, the pre-processed repeat (containing the natural separator) may serve to enhance gRNA processing, as would be expected based on our results. At the same time, the absence of a full-length repeat downstream of the gRNA means that after gRNA processing, there will not remain any piece of RNA attached to the 3’ end of the spacer, which might disrupt gRNA performance. In contrast, when Liu et al. added an additional pre-processed repeat downstream of their gRNA (DRf-crRNA-DRf in the same panel), this construct performed the worst of all tested variants. This is consistent with our conclusion that the full-length separator reduces performance of gRNAs if it remains attached to the 3’ end of spacers. We have added a paragraph in the Discussion about this (Line 376).

      Finally, the complete loss of targeting with the unprocessed repeat appears to represent an extreme example given multiple studies that showed effective targeting with this repeat (e.g. Liu NAR 2019, Zetsche Nat Biotechnol 2016).

      We acknowledge that our CRISPR array containing the full, natural separator (Fig. 3B) appears to be completely non-functional in contrast to the studies mentioned by the reviewer. We think this difference may have a few possible explanations. First, this array is in fact not entirely non-functional. Re-running the same experiment with a stronger dCas12a-activator (dCas12a-VPR, full length VPR, also used in Fig. 5) shows some modest GFP activation even with the full separator (1.4% vs 20.8% GFP+ cells; see the Appendix Figure 1). But for consistency, we have used the same, slightly less effective, dCas12a-activator (dCas12a-miniVPR) for all GFP-targeting experiments. Second, both the Liu et al. and Zetsche et al. studies used CRISPR editing rather than CRISPRa. We speculate that this might explain their relatively high indel frequency: Only a single cleavage event needs to take place for an indel to occur, whereas gene activation presumably requires the dCas12a-activator to be present on the promoter for extended periods of time. Thus, any inefficiency in DNA binding caused by the separator remaining attached to the spacer might disfavor CRISPRa activity more than CRISPR-editing activity. We have added these considerations to the Discussion and referenced the suggested papers (Line 376).

      Appendix Figure 1: Percentage of GFP+ cells without or with a full-length separator using dCas12a-VPR (full length) gene activation.

      Relating to the above point, the vast majority of the results relied on a single guide sequence targeting GFP. While the seven-guide CRISPR array did involve other sequences, only the same GFP targeting guide yielded strong gene activation. Therefore, the generalizability of the conclusions remains unclear.

      We have now performed several experiments that address the generalizability of our conclusions:

      First, we now include data demonstrating that the beneficial effect of adding a synSeparator is not limited to the AAAT sequence derived from the Lachnospiraceae bacterium separator. We now include three other 4-nt, AT-rich synSeparators derived from Acidaminococcus s. (TTTT), Moraxella b. (TTTA) and Prevotella d. (ATTT) (Fig. 3I). All these synSeparators rescued the poor GFP activation caused by an upstream spacer with high GC content, though not equally effectively. The quantitative difference between the synSeparators could either be due to the intrinsic “insulation capacity” of these sequences, or the way they interact with the Lb-Cas12a protein, or to sequence-specific interactions with this particular CRISPR array. We discuss these possibilities in the Discussion (Line 437).
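
      To make the "AT-rich" property concrete, here is a small sketch that computes the GC fraction of the four synSeparator sequences listed above. The helper `gc_fraction` and the dictionary layout are assumptions introduced for illustration; only the four sequences and their source organisms come from the response.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for base in seq if base in "GC") / len(seq)


# synSeparator sequences as listed in the response, keyed by source organism
SYN_SEPARATORS = {
    "Lachnospiraceae b.": "AAAT",
    "Acidaminococcus s.": "TTTT",
    "Moraxella b.": "TTTA",
    "Prevotella d.": "ATTT",
}

for organism, separator in SYN_SEPARATORS.items():
    print(f"{organism}: {separator} GC fraction = {gc_fraction(separator):.2f}")
```

      The same helper could be applied to candidate spacers to flag those with high GC content before they are placed upstream of another gRNA in an array.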

      Second, we now include data demonstrating that nuclease-deactivated, enhanced Cas12a from Acidaminococcus species (denAsCas12a; Kleinstiver et al., 2019) is also sensitive to the effects of high-GC spacers (Fig. 3J). This poor performance was largely rescued by including a TTTT synSeparator derived from the natural AsCas12a separator.

      Furthermore, we have now included a paragraph in the Discussion where we speculate on why the effect of adding the synSeparator was more modest for the endogenous genes than for GFP: 1) Our GFP-expressing cell line has multiple GFP insertions in its genome, and each copy has seven protospacers in its promoter. This may amplify the effect of the synSeparator. 2) The gRNAs used for endogenous activation were taken from the literature or had been pre-tested by us. These guides had thus already proven to be successful and might not be particularly disruptive (e.g., they were not selected by us for having high GC content). Therefore, researchers might experience the greatest benefit from the synSeparator with newly designed spacers that have not already proven to be effective even without the synSeparator.

      Reviewer #3 (Public Review):

      Magnusson et al. do an excellent job of defining how the repeated separator sequence of wild-type Cas12a CRISPR arrays impacts the relative efficacy of downstream crRNAs in engineered delivery systems. High GC content, particularly near the 3' end of the separator sequence, appears to be critically important for the processing of a downstream crRNA. The authors demonstrated that naturally occurring separators from 3 Cas12a species also display reduced GC content. The authors use this important new information to construct a synthetic small separator DNA sequence which can enhance CRISPR/Cas12a-based gene regulation in human cells. The manuscript will be a great resource for the synthetic biology field as it shows an optimization to a tool that will enable improved multi-gene transcriptional regulation.

      Strengths:

      • The authors do an excellent job in citing appropriate references to support the rationale behind their hypotheses.
      • The experiments and results support the authors' conclusions (e.g., showing the relationship between secondary structure and GC content in the spacers).
      • The controls used for the experiments were appropriate (e.g., using full-length natural separator vs single G or 1 to 4 A/T nucleotides as synthetic separators).
      • The manuscript does a great job assessing several reasons why the synthetic separator might work in the discussion section, cites the relevant literature on what has been done and restates their results to argue for or against these reasons.
      • This paper will be very useful for research groups in the genome editing and synthetic biology fields. The data presented (especially the data concerning the activation of several genes) can be used as a comparison point for other labs comparing different CRISPR-based transcriptional regulators and the spacers used for targeting.
      • This paper also provides optimization to a tool that will be useful for regulating several endogenous genes at once in human cells thus helping researchers studying pathways or other functional relationships between several genes.

      Opportunities for Improvement:

      • The authors have performed all the experiments using LbCas12a as a model and have conclusively proven that the synSeparator enhances the performance of Cas12a-based gene activation. Will this phenomenon be the same for other Cas12a proteins (such as AsCas12a)? The authors should perform some experiments to test the universality of the concept. Ideally, this would be done in HEK293T cells and one other human cell type.

      We thank the reviewer for these suggestions. We have now addressed the generalizability of our findings with several new experiments. First, we now include data demonstrating that nuclease-deactivated, enhanced Cas12a from Acidaminococcus species (denAsCas12a; Kleinstiver et al., 2019) is also sensitive to the effects of high-GC spacers (Fig. 3J). This poor performance was largely rescued by including a TTTT synSeparator derived from the natural AsCas12a separator.

      Second, we now include data demonstrating that the beneficial effect of adding a synSeparator is not limited to the AAAT sequence derived from the Lachnospiraceae b. separator. We now include three other 4-nt, AT-rich synSeparators derived from Acidaminococcus s. (TTTT), Moraxella b. (TTTA) and Prevotella d. (ATTT) (Fig. 3I). All these synSeparators rescued the poor GFP activation caused by an upstream spacer with high GC content, though not equally effectively. The quantitative difference between the synSeparators could either be due to the intrinsic “insulation capacity” of these sequences, or the way they interact with the Lb-Cas12a protein, or to sequence-specific interactions with this particular CRISPR array. We discuss these possibilities in the Discussion.

      Third, as described above, we have now performed an in vitro Cas12a cleavage assay and present the data in a new figure (Fig. 4). We found that a CRISPR array containing a 70%-GC dummy spacer was processed less efficiently than an array containing a 30%-GC spacer, but that addition of a synSeparator could to a large extent rescue this processing defect (Fig. 4E). The fact that this result was observed even in a cell-free in vitro setting demonstrates that it is a general feature of Cas12a CRISPR arrays that is likely to work the same way in many cell types rather than being specific to HEK293T cells.

      Fourth, we attempted to investigate the effect of the synSeparator in different cell types. However, either due to poor transfection efficiency or poor expression of the Cas12a activator construct, CRISPRa activity was consistently poor in these cell types, both with and without the synSeparator (e.g., we did not visually observe fluorescence from the mCherry gene fused to the dCas12a activator, which we always see in HEK293T cells). Because of the low general efficiency of CRISPRa, it was not possible to evaluate the performance of the synSeparator. Many cell types are difficult to transfect and dCas12a-VPR-mCherry is a big construct (>6 kb). To our knowledge, there have not been many reports using dCas12a-VPR in cell types other than HEK293T. While we think that it will be important to optimize CRISPRa in many cell types (e.g., by optimizing transfection conditions, Cas12a variants, promoters, expression vectors, etc.), the focus of our study has been to show the separator’s mechanism and general function; we believe that optimizing general CRISPRa for different cell types is beyond the scope of this paper. We acknowledge that this is a limitation of our study and we have added a paragraph about this in the Discussion (line 355). We nevertheless hypothesize that the negative influence of high-GC spacers and the insulating effect of synSeparators are generalizable across cell types. That is because we could observe improved array processing with the synSeparator even in the cell-free context of an in vitro expression system, as described above (Fig. 4). This suggests that the sensitivity to spacer GC content is determined only by the interaction between Cas12a and the array, rather than being dependent on a particular cellular context.

    1. the genre by replacing the romantic couple with two sisters. As we will see, Anna’s relationship with Elsa, rather than with either of her male love interests, is easily readable as the ‘couple’ relationship for purposes of the dual-focus narrative in Frozen.

      I understand that Disney may not have openly queer characters and I understand people want more representation. However, I think it is weird to compare the sisters to anything "couple" related since they are sisters and that has nothing to do with being queer. So I don't exactly understand how someone could take that into account and have a theory that Frozen is a queer movie.

    1. Author Response:

      Reviewer #1:

      The manuscript by Takahashi et al. describes the interaction between MLL fusion proteins and HBO1 and its role in leukemogenesis. Myeloid progenitor transformation assays using various MLL fusion proteins reveal that MLL fusion proteins require the TRX2 domain of MLL for effective leukemic transformation. IP-MS identifies HBO1 as a bona fide binding partner of the MLL TRX2 domain. ChIP-seq experiments show genome-wide colocalization of the HBO1 complex with MLL-ENL and the WT MLL in MLL-fusion leukemia cells and MLL WT cells, respectively. ChIP-qPCR in MLL-deficient cells suggests that recruitment of HBO1 to MLL target genes (such as MYC and CDKN2C) depends on MLL. Truncation analysis of the ELL part of the MLL-ELL fusion reveals that MLL-ELL transformation activity requires OHD domain-mediated recruitment of AF4 and EAF1. Furthermore, co-IP and ChIP experiments with various fragments show that AF4 and EAF1 form two distinct SL1/MED26-containing complexes and that the AEP/SL1/MED26 complex is likely competent for transactivation. A series of transformation assays suggests that MLL-ELL transforms hematopoietic progenitors via association with AEP, but not other ELL-associated proteins. Finally, the authors also show that NUP98-HBO1 fusion transforms myeloid progenitors through interaction with MLL. Overall, this is a quite comprehensive study demonstrating that various MLL fusions and NUP98 fusions transform hematopoietic progenitors via HBO1-MLL interaction, which suggests that targeting their interaction might be a new therapeutic approach.

      We appreciate the comments and inputs from the reviewers.

      Reviewer #2:

      In this manuscript, the authors identified an interesting interaction of MLL (a methyltransferase) with an HBO1-JADE complex (an acetyltransferase) and investigated the functional impact in leukemogenesis by fusion proteins containing MLL or HBO1. The data is clear and the connection between MLL and HBO1 is unexpected. The manuscript is also well organized and relatively easy to follow.

      Comments:

      1) The functional relevance of the interaction between MLL and HBO1 is still correlative. It would be important to know whether there are any results directly about the impact of the loss of the HBO1 complex on the function of MLL.

      We performed a sgRNA-dropout assay which showed that HBO1 is critically required for the survival of leukemia cell lines, as depicted in Figure 2F and Figure 2-figure supplement 3.

      2) It is important to show the source and specificity of the antibodies that were used for ChIP of the HBO1 complex.

      The details of the antibodies are provided in the Key Resources Table.

      3) It might be interesting to check whether other JADE proteins and also BRD1 (another partner of HBO1) are involved.

      We agree that it would be very interesting to examine the involvement of other JADE/BRPF family proteins in the future because they share the ING4/5 subunits and BRD1 plays an important role in hematopoiesis (1). This can be addressed in future studies.

      4) The acronym TRX2 may be confusing as some might think that it is thioredoxin.

      As advised, we have changed this to THD2 (TRX homology domain 2).

      Reviewer #3:

      This paper starts with a series of bone marrow transformation assays comparing MLL fusions and domain-deletion mutants thereof to define the minimal elements for robust leukemic transformation and surveying growth and attendant common fusion targets HoxA9 and Meis1 in colony replating assays. Here they discover that a region of the MLL-N portion just upstream of the well-studied CXXC domain, termed in their previous work the "TRX2 domain", is important for the transformation capacity of several different MLL-fusions (and more minimal chimeras of key modules). A small region of the MLL-N protein encompassing the TRX2 domain and the CXXC module is subjected to complex purification; it is clear from comparison to a number of controls that the TRX2 domain is an important mediator of association, perhaps indirect, with the HBO complex. Dropout experiments confirm that HBO1 knockout is lethal to MLL-rearranged leukemia, nicely confirming recent work (Ay et al., MacPherson et al.).

      ChIP-seq experiments in an ALL with MLL-ENL fusion, and then more extensively in a kidney cancer cell line indicate overlap with some of the HBO complex subunits and MLL, however this does not establish recruitment at these sites. ChIP-qPCR at a few MLL-fusion target genes with MLL depletion supports the recruitment hypothesis somewhat although mixed and modest effect sizes indicate that alternate pathways for HBO1 recruitment are involved, and could also be explained as reduced deposition of marks known to recruit HBO1, rather than direct recruitment. Sadly, the real potential strength of this work goes unrealized, as the recruitment of HBO1 mechanism remains tantalizingly out of reach. More experiments in this space could conclusively establish the molecular mechanism of a seemingly biomedically important recruitment paradigm, and thereby have much more impact.

      As the reviewer pointed out, MLL is not the only element that recruits the HBO1 complex to the target chromatin. MLL is known to deposit H3K4me marks, and the HBO1 complex is known to recognize these marks via ING4/5 subunits. We performed a ChIP-qPCR analysis of H3K4me3 in MLL-knockout cells. At the MYC promoter, the H3K4me3 marks were substantially decreased (Figure 3F). Moreover, recruitment of HBO1 was not recovered by transient expression of an MLL mutant containing THD2, indicating that the presence of H3K4me3 marks is a prerequisite for HBO1 recruitment. In accordance with this, ING5-histone interaction is required for the stable association of MLL with the HBO1 complex (Figure 8A-C). Thus, a more appropriate molecular mechanism would be the cooperative recruitment of the HBO1 complex by ING4/5-mediated chromatin association and MLL-mediated association. Because of the multiple contacts involved in this molecular network, it is not easy to pinpoint the direct contacts as desired, but our biochemical analyses indicate that PHF16 and ING4/5 offer relatively strong binding surfaces (Figure 8A-C). The ING domain of ING5 is the most likely direct binding surface identified thus far.

      At this point the paper shifts to a seemingly distinct line of inquiry, which is not closely related to the HBO1-TRX2 story of the first three figures. The new direction examines the ELL fusion partner in some detail using similar fusion protein chimeras, but a portion of Figure 4 is largely confirmatory of previously established findings about the critical regions of ELL for transformation and its AF4/EAF1 partners, adding only that portions of the MLL fusion protein are dispensable, provided that they are replaced with the PWWP of LEDGF. It is a little bit of a Frankenstein's monster experiment, and does not add much new to the field. Further experiments define potentially two distinct complexes that have already been characterized being recruited by ELL, although there is overlap here again with their previous studies, and the results are a little hard to interpret.

      A portion of Figure 4 was confirmatory of previous results. We have moved this to figure supplements in the revised manuscript (Figure 4-figure supplement 1B,C). The main topic of this paper is the role of the HBO1 complex in MLL-mediated transactivation pathways. The structure/function analysis of MLL fusion proteins demonstrated that MLL-ELL is highly dependent on the HBO1-mediated function in leukemic transformation (Figures 1 and 2). Hence, it was important to clarify the mechanism of gene activation by MLL-ELL in this study to understand why HBO1 association is required for MLL-ELL-mediated transformation. Because MLL-ELL associates with AEP similarly to major MLL fusions such as MLL-AF4 and MLL-ENL, it was speculated that MLL-ELL also activates its target genes via AEP. However, ELL associates with EAF family proteins and MLL-EAF also has transforming ability (3). Thus, EAF1-mediated functions could be more important for MLL-ELL-mediated transformation than AEP-mediated functions. To clarify the mechanism of MLL-ELL-mediated transformation, we generated a point mutant that selectively impaired ELL-EAF interaction and demonstrated that EAF1 association is dispensable for MLL-ELL-mediated transformation (Figure 6), thereby indicating that MLL-ELL transforms via AEP-mediated functions, which demand HBO1-mediated functions. We also showed that the presence of THD2 enhances ELL-AEP association, further suggesting that one of the roles of the HBO1 complex is to enhance the association of ELL with AEP (Figure 6E). These findings are not reinterpretations of our prior results and are relevant to the main topic of this paper. We believe this part adds new information to the field, and therefore we have included it in the revised manuscript.

      The authors create structure-guided separation of function mutants in the ELL domain that binds both AEP and SL1, permitting them to specifically disrupt EAF1 interactions but not AF4. Further experiments solidify this interpretation, and find that this mutant shows no deficits in hematopoietic progenitor transformation or primary leukemia lethality, although there appears to be some effect upon reimplantation.

      The last figure in the paper tackles the seemingly unrelated Nup98-HBO1 fusion, a rare patient mutation. They demonstrate a requirement for MLL for viability of hematopoietic progenitors transformed by this fusion, connecting back to the TRX2 interaction, and show that menin inhibitors slow growth.

      Strengths:

      The identification of the TRX2 region of the MLL-N protein as the major point of contact (perhaps not direct) with the HBO1 complex adds mechanistic depth to the really important recent discovery (confirmed in this work) that MLL-fusion leukemias rely on HBO1 function. This lab has published a number of technically similar types of papers defining minimal regions of MLL and distinct interacting partners by chimeric fusions, with bone marrow transformation assays, mouse model engrafting studies, IPs, ChIP etc. In my view they are very much undercited, likely because they are similarly so challenging to read.

      Thank you for your pointed feedback. We will try our best to make the necessary improvements so that our papers are widely read and cited.

      The mixture of Co-IP biochemistry, bone marrow transformation assays, and ChIP, to define interactions, minimal requirements for transformation, and their chromatin consequences for a host of different MLL-fusions and HBO1-fusions has the potential to define the key interfaces underlying recruitment.

      Weaknesses:

      The mechanistic inquiry stops short of really defining the critical MLL-HBO1 complex interface. Defining the point of contact on the HBO1 side (even which subunit) and determining whether it is direct, or bridged by some as-yet-unidentified factor, as well as conclusively demonstrating that this is the mechanism of HBO1 recruitment remain the major shortcomings.

      To address this criticism, we further investigated the mechanism of complex formation by MLL and the HBO1 complex. As we demonstrated in Figure 8A-C, the association appears to be mediated by multiple contacts mainly through PHF16 and ING4/5. Because this association needs an intact PHD finger of ING5, it likely occurs depending on the context where ING4/5 is bound to histone H3K4me2/3. The ING domain of ING5 was also required for the association, indicating that this portion may contain a point of direct contact. We speculate that HBO1 recruitment is mediated primarily by ING4/5-H3K4me3 interaction and MLL reinforces its chromatin association.

      And the follow-on figures, apart from the last one, appear disconnected from this portion of the story and distract from it.

      We depicted a revised model incorporating the above-mentioned aspect in Figure 8D of the revised manuscript.

      The complex nomenclature and density/organization/logic of the presentation of experiments make this paper difficult to read. Absence of sufficient grounding in the broader literature much beyond their own lab's work further compounds the problem.

      We changed some of the nomenclature and density/organization/logic of the presentation of the experiments to improve the readability.

      There is a lot of overlap, particularly in parts of figure 1 and figure 4 with previously published results. So perhaps re-organizing the display of data, and the organization of presentation, putting confirmatory work in the supplementary figures, would improve accessibility.

      We moved some portions of Figure 4 to a figure supplement. The data for MLL-AF10 and MLL-ENL were retained in Figure 1 as important references.

    1. Author Response:

      Reviewer #1:

      1) The user manual and tutorial are well documented, although the actual code could do with more explicit documentation and comments throughout. The overall organisation of the code is also a bit messy.

      We have now implemented an ongoing, automated code review via Codacy (https://app.codacy.com/gh/caseypaquola/BigBrainWarp/dashboard). The grade is published as a badge on GitHub. We improved the quality of the code to an A grade by increasing comments and fixing code style issues. Additionally, we standardised the nomenclature throughout the toolbox to improve consistency across scripts and we restructured the bigbrainwarp function.

      2) My understanding is that this toolbox can take maps from BigBrain to MRI space and vice versa, but the maps that go in the direction BigBrain->MRI seem to be confined to those provided in the toolbox (essentially the density profiles). What if someone wants to do some different analysis on the BigBrain data (e.g. looking at cellular morphology) and wants that mapped onto MRI spaces? Does this tool allow for analyses that involve the raw BigBrain data? If so, then at what resolution and with what scripts? I think this tool will have much more impact if that was possible. Currently, it looks as though the 3 tutorial examples are basically the only thing that can be done (although I may be lacking imagination here).

      The bigbrainwarp function allows input of raw BigBrain data in volume and surface forms. For volumetric inputs, the image must be aligned to the full BigBrain or BigBrainSym volume, but the function is agnostic to the input voxel resolution. We have also added an option for the user to specify the output voxel resolution. For example,

      bigbrainwarp --in_space bigbrain --in_vol cellular_morphology_in_bigbrain.nii \
        --interp linear --out_space icbm --out_res 0.5 \
        --desc cellular_morphology --wd working_directory

      where “cellular_morphology_in_bigbrain.nii” was generated from a BigBrain volume (see Table 2 below for all parameters). The BigBrain volume may be one of the 100-1000 µm resolution images provided on the FTP server or a resampled version of these images, as long as the full field of view is maintained. For surface-based inputs, the data must contain a value for each vertex of the BigBrain/BigBrainSym mesh. We have clarified these points in the Methods, illustrated the potential transformations in an extended Figure 3 and highlighted the distinctiveness of the tutorial transformations in the Results.
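
      As an illustration of the call above, the following Python sketch assembles the same command for several output resolutions. It uses only the flags shown in the example; the helper name build_bigbrainwarp_command, the second resolution value and the per-resolution description suffix are assumptions made for illustration and are not part of the toolbox. The sketch only prints the commands and does not run them.

```python
import shlex


def build_bigbrainwarp_command(in_vol, desc, wd, out_res,
                               in_space="bigbrain", out_space="icbm",
                               interp="linear"):
    """Return the bigbrainwarp call as an argument list.

    Hypothetical helper: it only mirrors the flags from the example
    command and does not validate them against the toolbox.
    """
    return [
        "bigbrainwarp",
        "--in_space", in_space,
        "--in_vol", in_vol,
        "--interp", interp,
        "--out_space", out_space,
        "--out_res", str(out_res),
        "--desc", desc,
        "--wd", wd,
    ]


# Assumed resolutions; 0.5 is taken from the example, 1.0 is illustrative.
for res in (0.5, 1.0):
    cmd = build_bigbrainwarp_command(
        in_vol="cellular_morphology_in_bigbrain.nii",
        desc=f"cellular_morphology_{res}mm",  # hypothetical naming scheme
        wd="working_directory",
        out_res=res,
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

      Passing the returned list to subprocess.run would execute the call, provided bigbrainwarp is installed and on the PATH.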

      3) An obvious caveat to bigbrain is that it is a single brain and we know there are sometimes substantial individual variations in e.g. areal definition. This is only slightly touched upon in the discussion. Might be worth commenting on this more. As I see it, there are multiple considerations. For example (i) Surface-to-Surface registration in the presence of morphological idiosyncrasies: what parts of the brain can we "trust" and what parts are uncertain? (ii) MRI parcellations mapped onto BigBrain will vary in how accurately they may reflect the BigBrain areal boundaries: if histo boundaries do not correspond with MRI-derived ones, is that because BigBrain is slightly different or is it a genuine divergence between modalities? Of course addressing these questions is out of scope of this manuscript, but some discussion could be useful; I also think this toolbox may be useful for addressing these very concerns!

      We agree that these are important questions and hope that BigBrainWarp will propel further research. Here, we consider these questions from two perspectives: the accuracy of the transformations and the potential influence of individual variation. For the former, we conducted a quantitative analysis of the accuracy of transformations used in BigBrainWarp (new Figure 2). We provide a function (evaluate_warps.sh) for BigBrainWarp users to assess the accuracy of novel deformation fields and encourage detailed inspection of accuracy estimates and deformation effects for region of interest studies. For the latter, we expanded our Discussion of previous research on inter-individual variability and comment on the potential implications of unquantified inter-individual variability for the interpretation of BigBrain-MRI comparisons.

      Methods (P.7-8):

      “A prior study (Xiao et al., 2019) was able to further improve the accuracy of the transformation for subcortical structures and the hippocampus using a two-stage multi-contrast registration. The first stage involved nonlinear registration of BigBrainSym to a PD25 T1-T2 fusion atlas (Xiao et al., 2017, 2015), using manual segmentations of the basal ganglia, red nucleus, thalamus, amygdala, and hippocampus as additional shape priors. Notably, the PD25 T1-T2 fusion contrast is more similar to the BigBrainSym intensity contrast than a T1-weighted image. The second stage involved nonlinear registration of PD25 to ICBM2009sym and ICBM2009asym using multiple contrasts. The deformation fields were made available on Open Science Framework (https://osf.io/xkqb3/). The accuracy of the transformations was evaluated relative to overlap of region labels and alignment of anatomical fiducials (Lau et al., 2019). The two-stage procedure resulted in 0.86-0.97 Dice coefficients for region labels, improving upon direct overlap of BigBrainSym with ICBM2009sym (0.55-0.91 Dice) (Figure 2Aii, 2Aiv top). Transformed anatomical fiducials exhibited 1.77±1.25mm errors, on par with direct overlap of BigBrainSym with ICBM2009sym (1.83±1.47mm) (Figure 2Aiii, 2Aiv below). The maximum misregistration distance (BigBrainSym=6.36mm, Xiao=5.29mm) provides an approximation of the degree of uncertainty in the transformation. In line with this work, BigBrainWarp enables evaluation of novel deformation fields using anatomical fiducials and region labels (evaluate_warps.sh). The script accepts a nonlinear transformation file for registration of BigBrainSym to ICBM2009sym, or vice versa, and returns the Jacobian map, Dice coefficients for labelled regions and landmark misregistration distances for the anatomical fiducials.

      The unique morphology of BigBrain also presents challenges for surface-based transformations. Idiosyncratic gyrification of certain regions of BigBrain, especially the anterior cingulate, causes misregistration (Lewis et al., 2020). Additionally, the areal midline representation of BigBrain, following inflation to a sphere, is disproportionately smaller than standard surface templates, which is related to differences in surface area, in hemisphere separation methods, and in tessellation methods. To overcome these issues, ongoing work (Lewis et al., 2020) combines a specialised BigBrain surface mesh with multimodal surface matching [MSM; (Robinson et al., 2018, 2014)] to co-register BigBrain to standard surface templates. In the first step, the BigBrain surface meshes were re-tessellated as unstructured meshes with variable vertex density (Möbius and Kobbelt, 2010) to be more compatible with FreeSurfer generated meshes. Then, coarse-to-fine MSM registration was applied in three stages. An affine rotation was applied to the BigBrain sphere, with an additional “nudge” based on an anterior cingulate landmark. Next, nonlinear/discrete alignment was performed using sulcal depth maps (emphasising global scale, Figure 2Biii), followed by nonlinear/discrete alignment using curvature maps (emphasising finer detail, Figure 2Biii). The higher-order MSM procedure that was implemented for BigBrain maximises concordance of these features while minimising surface deformations in a physically plausible manner, accounting for size and shape distortions (Figure 2Bi) (Knutsen et al., 2010; Robinson et al., 2018). This modified MSMsulc+curv pipeline improves the accuracy of transformed cortical maps (4.38±3.25mm), compared to a standard MSMsulc approach (8.02±7.53mm) (Figure 2Bii-iii) (Lewis et al., 2020).”

      Figure 2: Evaluating BigBrain-MRI transformations. A) Volume-based transformations. i. Jacobian determinant of deformation field shown with a sagittal slice and stratified by lobe. Subcortical+ includes the shape priors (as described in Methods) and the + connotes hippocampus, which is allocortical. Lobe labels were defined based on assignment of CerebrA atlas labels (Manera et al., 2020) to each lobe. ii. Sagittal slices illustrate the overlap of native ICBM2009b and transformed subcortical+ labels. iii. Superior view of anatomical fiducials (Lau et al., 2019). iv. Violin plots show the Dice coefficient of regional overlap (ii) and landmark misregistration (iii) for the BigBrainSym and Xiao et al. approaches. Higher Dice coefficients show improved registration of subcortical+ regions with Xiao et al., while distributions of landmark misregistration indicate similar performance for alignment of anatomical fiducials. B) Surface-based transformations. i. Inflated BigBrain surface projections and ridgeplots illustrate regional variation in the distortions of the mesh invoked by the modified MSMsulc+curv pipeline. ii. Eighteen anatomical landmarks shown on the inflated BigBrain surface (above) and inflated fsaverage (below). BigBrain landmarks were transformed to fsaverage using the modified MSMsulc+curv pipeline. Accuracy of the transformation was calculated on fsaverage as the geodesic distance between landmarks transformed from BigBrain and the native fsaverage landmarks. iii. Sulcal depth and curvature maps are shown on inflated BigBrain surface. Violin plots show the improved accuracy of the transformation using the modified MSMsulc+curv pipeline, compared to a standard MSMsulc approach.
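
      The two volume-based accuracy measures reported in Figure 2A, the Dice coefficient of regional overlap and the landmark misregistration distance, can be sketched in a few lines of Python. This is a minimal illustration and not the toolbox's own evaluation script: the function names are hypothetical, label volumes are assumed to be integer arrays on the same grid, and landmark error is assumed here to be the Euclidean distance between matched fiducial coordinates in mm (the surface-based evaluation uses geodesic distance instead).

```python
import numpy as np


def dice_coefficient(labels_a, labels_b, label):
    """Dice overlap of one region label between two label volumes.

    Both volumes are assumed to be sampled on the same voxel grid.
    Returns NaN when the label is absent from both volumes.
    """
    a = np.asarray(labels_a) == label
    b = np.asarray(labels_b) == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")
    return 2.0 * np.logical_and(a, b).sum() / denom


def landmark_errors(points_a, points_b):
    """Euclidean distance between matched landmarks (one row per landmark)."""
    diff = np.asarray(points_a, dtype=float) - np.asarray(points_b, dtype=float)
    return np.linalg.norm(diff, axis=1)


# Toy label volumes (illustrative data only).
native = np.array([[1, 1, 0], [0, 2, 2]])
warped = np.array([[1, 0, 0], [0, 2, 2]])
print(dice_coefficient(native, warped, label=1))
print(dice_coefficient(native, warped, label=2))

# Toy landmark coordinates in mm (illustrative data only).
errors = landmark_errors([[0, 0, 0], [1, 1, 1]], [[3, 4, 0], [1, 1, 1]])
print(errors.mean(), errors.max())
```

      Reporting the maximum alongside the mean, as in the Methods, gives a rough upper bound on the misregistration to expect for a given transformation.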

      Discussion (P.18):

      “Cortical folding is variably associated with cytoarchitecture, however. The correspondence of morphology with cytoarchitectonic boundaries is stronger in primary sensory than association cortex (Fischl et al., 2008; Rajkowska and Goldman-Rakic, 1995a, 1995b). Incorporating more anatomical information in the alignment algorithm, such as intracortical myelin or connectivity, may benefit registration, as has been shown in neuroimaging (Orasanu et al., 2016; Robinson et al., 2018; Tardif et al., 2015). Overall, evaluating the accuracy of volume- and surface-based transformations is important for selecting the optimal procedure given a specific research question and to gauge the degree of uncertainty in a registration.”

      Discussion (P.19):

      “Despite all its promises, the singular nature of BigBrain currently prohibits replication and does not capture important inter-individual variation. While large-scale cytoarchitectural patterns are conserved across individuals, the position of areal boundaries relative to sulci varies, especially in association cortex (Amunts et al., 2020; Fischl et al., 2008; Zilles and Amunts, 2013). This can affect interpretation of BigBrain-MRI comparisons. For instance, in tutorial 3, low predictive accuracy of functional communities by cytoarchitecture may be attributable to the subject-specific topographies, which are well established in functional imaging (Benkarim et al., 2020; Braga and Buckner, 2017; Gordon et al., 2017; Kong et al., 2019). Future studies should consider the influence of inter-subject variability in concert with the precision of transformations, as these two elements of uncertainty can impact our interpretations, especially at higher granularity.”

      Reviewer #2:

      This is a nice paper presenting a review of recent developments and research resulting from BigBrain and a tutorial guiding use of the BigBrainWarp toolbox. This toolbox supports registration to, and from, standard MRI volumetric and surface templates, together with mapping derived features between spaces. Examples include projecting histological gradients estimated from BigBrain onto fsaverage (and the ICBM2009 atlas) and projecting Yeo functional parcels onto the BigBrain atlas.

      The key strength of this paper is that it supports and expands on a comprehensive tutorial and docker support available from the website. The tutorials there go into even more detail (with accompanying bash scripts) of how to run the full pipelines detailed in the paper. The docker makes the tool very easy to install but I was also able to install from source. The tutorials are diverse examples of broad possible applications; as such the combined resource has the potential to be highly impactful.

      The minor weaknesses of the paper relate to its clarity and depth. Firstly, I found the motivations of the paper initially unclear from the abstract. I would recommend much more clearly stating that this is a review paper of recent research developments resulting from the BigBrain atlas, and a tutorial to accompany the bash scripts which apply the warps between spaces. The registration methodology is explained elsewhere.

      In the revised Abstract (P.1), we emphasise that the manuscript involves a review of recent literature, the introduction of BigBrainWarp, and easy-to-follow tutorials to demonstrate its utility.

      “Neuroimaging stands to benefit from emerging ultrahigh-resolution 3D histological atlases of the human brain, the first of which is “BigBrain”. Here, we review recent methodological advances for the integration of BigBrain with multi-modal neuroimaging and introduce a toolbox, “BigBrainWarp”, that combines these developments. The aim of BigBrainWarp is to simplify workflows and support the adoption of best practices. This is accomplished with a simple wrapper function that allows users to easily map data between BigBrain and standard MRI spaces. The function automatically pulls specialised transformation procedures, based on ongoing research from a wide collaborative network of researchers. Additionally, the toolbox improves accessibility of histological information through dissemination of ready-to-use cytoarchitectural features. Finally, we demonstrate the utility of BigBrainWarp with three tutorials and discuss the potential of the toolbox to support multi-scale investigations of brain organisation.”

      I also found parts of the paper difficult to follow - as a methodologist without comprehensive neuroanatomical terminology, I would recommend the review of past work to be written in a more 'lay' way. In many cases, the figure captions also seemed insufficient at first. For example it was not immediately obvious to me what is meant by 'mesiotemporal confluence' and Fig 1G is not referenced specifically in the text. In Fig 3C it is not immediately clear from the text of the caption that the cortical image is representing the correlation from the plots - specifically since functional connectivity is itself estimated through correlation.

      In the updated manuscript, we have tried to remove neuroanatomical jargon and clearly define uncommon terms at the first instance in text. For example,

      “Evidence has been provided that cortical organisation goes beyond a segregation into areas. For example, large-scale gradients that span areas and cytoarchitectonic heterogeneity within a cortical area have been reported (Amunts and Zilles, 2015; Goulas et al., 2018; Wang, 2020). Such progress became feasible through integration of classical techniques with computational methods, supporting more observer-independent evaluation of architectonic principles (Amunts et al., 2020; Paquola et al., 2019; Schiffer et al., 2020; Spitzer et al., 2018). This paves the way for novel investigations of the cellular landscape of the brain.”

      “Using the proximal-distal axis of the hippocampus, we were able to bridge the isocortical and hippocampal surface models recapitulating the smooth confluence of cortical types in the mesiotemporal lobe, i.e. the mesiotemporal confluence (Figure 1G).”

      “Here, we illustrate how we can track resting-state functional connectivity changes along the latero-medial axis of the mesiotemporal lobe, from parahippocampal isocortex towards hippocampal allocortex, hereafter referred to as the iso-to-allocortical axis.”

      Additionally, we have expanded the captions for clarity. For example, Figure 3:

      “C) Intrinsic functional connectivity was calculated between each voxel of the iso-to-allocortical axis and 1000 isocortical parcels. For each parcel, we calculated the product-moment correlation (r) of rsFC strength with iso-to-allocortical axis position. Thus, positive values (red) indicate that rsFC of that isocortical parcel with the mesiotemporal lobe increases along the iso-to-allocortex axis, whereas negative values (blue) indicate decrease in rsFC along the iso-to-allocortex axis.”
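
      The per-parcel statistic described in this caption can be sketched as follows. The sketch assumes the rsFC values are arranged as a voxels-by-parcels matrix and that axis position is one number per voxel; the function name, array layout and toy data are assumptions made for illustration and do not reproduce the tutorial code.

```python
import numpy as np


def axis_correlation(rsfc, axis_position):
    """Pearson r between axis position and rsFC strength, per parcel.

    rsfc: array of shape (n_voxels, n_parcels), rsFC of each axis voxel
          with each isocortical parcel (assumed layout).
    axis_position: array of shape (n_voxels,), position of each voxel
          along the iso-to-allocortical axis.
    Returns an array of shape (n_parcels,).
    """
    rsfc = np.asarray(rsfc, dtype=float)
    pos = np.asarray(axis_position, dtype=float)
    x = pos - pos.mean()                 # centre axis position
    y = rsfc - rsfc.mean(axis=0)         # centre each parcel's column
    numerator = x @ y
    denominator = np.sqrt((x ** 2).sum() * (y ** 2).sum(axis=0))
    return numerator / denominator


# Toy data: 10 axis voxels, 3 parcels (illustrative only).
rng = np.random.default_rng(0)
position = np.arange(10.0)
rsfc = np.column_stack([
    2.0 * position + 1.0,        # rsFC increases along the axis
    -position,                   # rsFC decreases along the axis
    rng.normal(size=10),         # no systematic relationship imposed
])
r = axis_correlation(rsfc, position)
print(r)
```

      A positive entry of r corresponds to a red parcel in the figure and a negative entry to a blue parcel.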

      My minor concern is over the lack of details in relation to the registration pipelines. I understand these are either covered in previous papers or are probably destined for bespoke publications (in the case of the surface registration approach) but these details are important for readers to understand the constraints and limitations of the software. At this time, the details for the surface registration only relate to an OHBM poster and not a publication, which I was unable to find online until I went through the tutorial on the BigBrain website. In general I think a paper should have enough information on key techniques to stand alone without having to reference other publications, so, in my opinion, a high level review of these pipelines should be added here.

      There aren't enough details on the registration. For the surface, what features were used to drive alignment, how was it parameterised (in particular the regularisation - strain, pairwise or areal), how was it pre-processed prior to running MSM - all these details seem to be in the excellent poster. I appreciate that work deserves a stand-alone publication but some details are required here for users to understand the challenges, constraints and limitations of the alignment. Similar high level details should be given for the registration work.

      We expanded descriptions of the registration strategies behind BigBrainWarp, especially so for the surface-based registration. Additionally, we created a new Figure to illustrate how the accuracy of the transformations may be evaluated.

      Methods (P.7-8):

      “For the initial BigBrain release (Amunts et al., 2013), full BigBrain volumes were resampled to ICBM2009sym (a symmetric MNI152 template) and MNI-ADNI (an older adult T1-weighted template) (Fonov et al., 2011). Registration of BigBrain to ICBM2009sym, known as BigBrainSym, involved a linear then a nonlinear transformation (available on ftp://bigbrain.loris.ca/BigBrainRelease.2015/). The nonlinear transformation was defined by a symmetric diffeomorphic optimiser [SyN algorithm, (Avants et al., 2008)] that maximised the cross-correlation of the BigBrain volume with inverted intensities and a population-averaged T1-weighted map in ICBM2009sym space. The Jacobian determinant of the deformation field illustrates the degree and direction of distortions on the BigBrain volume (Figure 2Ai top).

      A prior study (Xiao et al., 2019) was able to further improve the accuracy of the transformation for subcortical structures and the hippocampus using a two-stage multi-contrast registration. The first stage involved nonlinear registration of BigBrainSym to a PD25 T1-T2 fusion atlas (Xiao et al., 2017, 2015), using manual segmentations of the basal ganglia, red nucleus, thalamus, amygdala, and hippocampus as additional shape priors. Notably, the PD25 T1-T2 fusion contrast is more similar to the BigBrainSym intensity contrast than a T1-weighted image. The second stage involved nonlinear registration of PD25 to ICBM2009sym and ICBM2009asym using multiple contrasts. The deformation fields were made available on Open Science Framework (https://osf.io/xkqb3/). The accuracy of the transformations was evaluated relative to overlap of region labels and alignment of anatomical fiducials (Lau et al., 2019). The two-stage procedure resulted in 0.86-0.97 Dice coefficients for region labels, improving upon direct overlap of BigBrainSym with ICBM2009sym (0.55-0.91 Dice) (Figure 2Aii, 2Aiv top). Transformed anatomical fiducials exhibited 1.77±1.25mm errors, on par with direct overlap of BigBrainSym with ICBM2009sym (1.83±1.47mm) (Figure 2Aiii, 2Aiv below). The maximum misregistration distance (BigBrainSym=6.36mm, Xiao=5.29mm) provides an approximation of the degree of uncertainty in the transformation. In line with this work, BigBrainWarp enables evaluation of novel deformation fields using anatomical fiducials and region labels (evaluate_warps.sh). The script accepts a nonlinear transformation file for registration of BigBrainSym to ICBM2009sym, or vice versa, and returns the Jacobian map, Dice coefficients for labelled regions and landmark misregistration distances for the anatomical fiducials.

      The unique morphology of BigBrain also presents challenges for surface-based transformations. Idiosyncratic gyrification of certain regions of BigBrain, especially the anterior cingulate, causes misregistration (Lewis et al., 2020). Additionally, the areal midline representation of BigBrain, following inflation to a sphere, is disproportionately smaller than standard surface templates, which is related to differences in surface area, in hemisphere separation methods, and in tessellation methods. To overcome these issues, ongoing work (Lewis et al., 2020) combines a specialised BigBrain surface mesh with multimodal surface matching [MSM; (Robinson et al., 2018, 2014)] to co-register BigBrain to standard surface templates. In the first step, the BigBrain surface meshes were re-tessellated as unstructured meshes with variable vertex density (Möbius and Kobbelt, 2010) to be more compatible with FreeSurfer generated meshes. Then, coarse-to-fine MSM registration was applied in three stages. An affine rotation was applied to the BigBrain sphere, with an additional “nudge” based on an anterior cingulate landmark. Next, nonlinear/discrete alignment was performed using sulcal depth maps (emphasising global scale, Figure 2Biii), followed by nonlinear/discrete alignment using curvature maps (emphasising finer detail, Figure 2Biii). The higher-order MSM procedure that was implemented for BigBrain maximises concordance of these features while minimising surface deformations in a physically plausible manner, accounting for size and shape distortions (Figure 2Bi) (Knutsen et al., 2010; Robinson et al., 2018). This modified MSMsulc+curv pipeline improves the accuracy of transformed cortical maps (4.38±3.25mm), compared to a standard MSMsulc approach (8.02±7.53mm) (Figure 2Bii-iii) (Lewis et al., 2020).”

      (SEE FIGURE 2 in Response to Reviewer #1)
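
      The Jacobian determinant mentioned in the Methods excerpt above summarises local volume change under a deformation: values above 1 indicate local expansion and values below 1 indicate local compression. The following Python sketch shows one way to compute it from a dense displacement field with finite differences. It is an illustration under stated assumptions, not the computation used by the toolbox: the displacement field is assumed to be stored as an array of shape (3, X, Y, Z) in the same units as the voxel spacing, and the transformation is assumed to map x to x + u(x).

```python
import numpy as np


def jacobian_determinant(displacement, spacing=(1.0, 1.0, 1.0)):
    """Voxel-wise Jacobian determinant of the map x -> x + u(x).

    displacement: array of shape (3, X, Y, Z) holding the three
        components of u (assumed layout).
    spacing: voxel size along each axis, in the units of u.
    Returns an array of shape (X, Y, Z).
    """
    displacement = np.asarray(displacement, dtype=float)
    jac = np.empty(displacement.shape[1:] + (3, 3))
    for i in range(3):
        # grads[j] approximates d u_i / d x_j by finite differences.
        grads = np.gradient(displacement[i], *spacing)
        for j in range(3):
            jac[..., i, j] = grads[j] + (1.0 if i == j else 0.0)
    # np.linalg.det operates on the trailing 3x3 matrices.
    return np.linalg.det(jac)


# Toy field: a uniform 10% stretch along every axis (illustrative only).
grid = np.stack(np.meshgrid(np.arange(5.0), np.arange(6.0), np.arange(7.0),
                            indexing="ij"))
det = jacobian_determinant(0.1 * grid)
print(det.min(), det.max())
```

      For the uniform stretch used here every voxel has the same determinant, whereas a real deformation field would yield a spatially varying map such as the one stratified by lobe in Figure 2Ai.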

      I would also recommend more guidance in terms of limitations relating to inter-subject variation. My interpretation of the results of tutorial 3, is that topographic variation of the cortex could easily be driving the greater variation of the frontal parietal networks. Either that, or the Yeo parcel has insufficient granularity; however, in that case any attempt to go to finer MRI driven parcellations - for example to the HCP parcellation, would create its own problems due to subject specific variability.

      We agree that inter-individual variation may contribute to the low predictive accuracy of functional communities by cytoarchitecture. We expanded upon this possibility in the revised Discussion (P. 19) and recommend that future studies examine the uncertainty of subject-specific topographies in concert with uncertainties of transformations.

      “These features depict the vast cytoarchitectural heterogeneity of the cortex and enable evaluation of homogeneity within imaging-based parcellations, for example macroscale functional communities (Yeo et al., 2011). The present analysis showed limited predictability of functional communities by cytoarchitectural profiles, even when accounting for uncertainty at the boundaries (Gordon et al., 2016). [...] Despite all its promises, the singular nature of BigBrain currently prohibits replication and does not capture important inter-individual variation. While large-scale cytoarchitectural patterns are conserved across individuals, the position of boundaries relative to sulci varies, especially in association cortex (Amunts et al., 2020; Fischl et al., 2008; Zilles and Amunts, 2013). This can affect interpretation of BigBrain-MRI comparisons. For instance, in tutorial 3, low predictive accuracy of functional communities by cytoarchitecture may be attributable to the subject-specific topographies, which are well established in functional imaging (Benkarim et al., 2020; Braga and Buckner, 2017; Gordon et al., 2017; Kong et al., 2019). Future studies should consider the influence of inter-subject variability in concert with the precision of transformations, as these two elements of uncertainty can impact our interpretations, especially at higher granularity.”

      Reviewer #3:

      The authors make a case for the importance of considering high-resolution, cell-scale, histological knowledge for the analysis and interpretation of low-resolution MRI data. The manuscript describes the aims and relevance of the BigBrain project. The BigBrain is the whole brain of a single individual, sliced at 20 µm and scanned at 1 µm resolution. In recent years, sustained work by the BigBrain team has led to the creation of a precise cell-scale, 3D reconstruction of this brain, together with manual and automatic segmentations of different structures. The manuscript introduces a new tool - BigBrainWarp - which consolidates several of the tools used to analyse BigBrain into a single, easy to use and well documented tool. This tool should make it easy for any researcher to use the wealth of information available in the BigBrain for the annotation of their own neuroimaging data. The authors provide three examples of utilisation of BigBrainWarp, and show the way in which this can provide additional insight for analysing and understanding neuroimaging data. The BigBrainWarp tool should have an important impact for neuroimaging research, helping bridge the multi-scale resolution gap, and providing a way for neuroimaging researchers to include cell-scale phenomena in their study of brain data. All data and code are available open source, open access.

      Main concern:

      One of the longstanding debates in the neuroimaging community concerns the relationship between brain geometry (in particular gyro/sulcal anatomy) and the cytoarchitectonic, connective and functional organisation of the brain. There are various examples of correspondence, but also many analyses showing its absence, particularly in associative cortex (for example, Fischl et al (2008) by some of the co-authors of the present manuscript). The manuscript emphasises the accuracy of their transformations to the different atlas spaces, which may give some readers a false impression. True: towards the end of the manuscript the authors briefly indicate the difficulty of having a single brain as source of histological data. I think, however, that the manuscript would benefit from making this point more clearly, providing the future users of BigBrainWarp with some conceptual elements and references that may help them properly appraise their results. In particular, it would be helpful to briefly describe which aspects of brain organisation were used to lead the deformation to the different templates, if they were only based on external anatomy, or if they took into account some other aspects such as myelination, thickness, …

      We agree with the Reviewer that the accuracy of the transformation and the potential influence of inter-individual variability should be carefully considered in BigBrain-MRI studies. To highlight these issues in the updated manuscript, we first conducted a quantitative analysis of the accuracy of transformations used in BigBrainWarp (new Figure 2). We provide a function (evaluate_warps.sh) for users to assess the accuracy of novel deformation fields and encourage detailed inspection of accuracy estimates and deformation effects for region of interest studies. Second, we expanded our discussion of previous research on inter-individual variability and comment on the potential implications of unquantified inter-individual variability for the interpretation of BigBrain-MRI comparisons.

      Methods (P.7-8):

      “A prior study (Xiao et al., 2019) was able to further improve the accuracy of the transformation for subcortical structures and the hippocampus using a two-stage multi-contrast registration. The first stage involved nonlinear registration of BigBrainSym to a PD25 T1-T2 fusion atlas (Xiao et al., 2017, 2015), using manual segmentations of the basal ganglia, red nucleus, thalamus, amygdala, and hippocampus as additional shape priors. Notably, the PD25 T1-T2 fusion contrast is more similar to the BigBrainSym intensity contrast than a T1-weighted image. The second stage involved nonlinear registration of PD25 to ICBM2009sym and ICBM2009asym using multiple contrasts. The deformation fields were made available on Open Science Framework (https://osf.io/xkqb3/). The accuracy of the transformations was evaluated relative to overlap of region labels and alignment of anatomical fiducials (Lau et al., 2019). The two-stage procedure resulted in 0.86-0.97 Dice coefficients for region labels, improving upon direct overlap of BigBrainSym with ICBM2009sym (0.55-0.91 Dice) (Figure 2Aii, 2Aiv top). Transformed anatomical fiducials exhibited 1.77±1.25mm errors, on par with direct overlap of BigBrainSym with ICBM2009sym (1.83±1.47mm) (Figure 2Aiii, 2Aiv below). The maximum misregistration distance (BigBrainSym=6.36mm, Xiao=5.29mm) provides an approximation of the degree of uncertainty in the transformation. In line with this work, BigBrainWarp enables evaluation of novel deformation fields using anatomical fiducials and region labels (evaluate_warps.sh). The script accepts a nonlinear transformation file for registration of BigBrainSym to ICBM2009sym, or vice versa, and returns the Jacobian map, Dice coefficients for labelled regions and landmark misregistration distances for the anatomical fiducials.

      The unique morphology of BigBrain also presents challenges for surface-based transformations. Idiosyncratic gyrification of certain regions of BigBrain, especially the anterior cingulate, causes misregistration (Lewis et al., 2020). Additionally, the areal midline representation of BigBrain, following inflation to a sphere, is disproportionately smaller than standard surface templates, which is related to differences in surface area, in hemisphere separation methods, and in tessellation methods. To overcome these issues, ongoing work (Lewis et al., 2020) combines a specialised BigBrain surface mesh with multimodal surface matching [MSM; (Robinson et al., 2018, 2014)] to co-register BigBrain to standard surface templates. In the first step, the BigBrain surface meshes were re-tessellated as unstructured meshes with variable vertex density (Möbius and Kobbelt, 2010) to be more compatible with FreeSurfer generated meshes. Then, coarse-to-fine MSM registration was applied in three stages. An affine rotation was applied to the BigBrain sphere, with an additional “nudge” based on an anterior cingulate landmark. Next, nonlinear/discrete alignment was performed using sulcal depth maps (emphasising global scale, Figure 2Biii), followed by nonlinear/discrete alignment using curvature maps (emphasising finer detail, Figure 2Biii). The higher-order MSM procedure that was implemented for BigBrain maximises concordance of these features while minimising surface deformations in a physically plausible manner, accounting for size and shape distortions (Figure 2Bi) (Knutsen et al., 2010; Robinson et al., 2018). This modified MSMsulc+curv pipeline improves the accuracy of transformed cortical maps (4.38±3.25mm), compared to a standard MSMsulc approach (8.02±7.53mm) (Figure 2Bii-iii) (Lewis et al., 2020).”

      (SEE Figure 2 in response to previous reviewers)
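
      For readers who want to see what the two accuracy measures in the quoted Methods text amount to, the sketch below computes a Dice coefficient for one region label and the misregistration distance per anatomical fiducial. It is a minimal illustration in plain Python and not the code of evaluate_warps.sh; the function names and the toy label and coordinate values are hypothetical.

```python
import math


def dice_coefficient(labels_a, labels_b, region):
    """Dice overlap of one region label between two equally sized label lists."""
    in_a = [value == region for value in labels_a]
    in_b = [value == region for value in labels_b]
    size_a, size_b = sum(in_a), sum(in_b)
    if size_a + size_b == 0:
        raise ValueError("region is absent from both label maps")
    overlap = sum(a and b for a, b in zip(in_a, in_b))
    return 2.0 * overlap / (size_a + size_b)


def landmark_errors(points_a, points_b):
    """Euclidean distance per fiducial pair, in the unit of the input (e.g. mm)."""
    return [math.dist(p, q) for p, q in zip(points_a, points_b)]


# Hypothetical toy data: flattened label maps and two fiducials in mm.
warped_labels = [0, 1, 1, 1, 2, 2, 0, 0]
target_labels = [0, 1, 1, 0, 2, 2, 2, 0]
warped_points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
target_points = [(3.0, 4.0, 0.0), (10.0, 0.0, 1.0)]

dice_region_1 = dice_coefficient(warped_labels, target_labels, region=1)
errors_mm = landmark_errors(warped_points, target_points)
worst_error_mm = max(errors_mm)  # analogous to the maximum misregistration distance
```

      The maximum of the per-landmark distances plays the role of the worst-case uncertainty quoted above, while the mean and standard deviation summarise typical error.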

      Discussion (P.18, 19):

      “Cortical folding is variably associated with cytoarchitecture, however. The correspondence of morphology with cytoarchitectonic boundaries is stronger in primary sensory than association cortex (Fischl et al., 2008; Rajkowska and Goldman-Rakic, 1995a, 1995b). Incorporating more anatomical information in the alignment algorithm, such as intracortical myelin or connectivity, may benefit registration, as has been shown in neuroimaging (Orasanu et al., 2016; Robinson et al., 2018; Tardif et al., 2015). Overall, evaluating the accuracy of volume- and surface-based transformations is important for selecting the optimal procedure given a specific research question and to gauge the degree of uncertainty in a registration.”

      “Despite all its promises, the singular nature of BigBrain currently prohibits replication and does not capture important inter-individual variation. While large-scale cytoarchitectural patterns are conserved across individuals, the position of boundaries relative to sulci varies, especially in association cortex (Amunts et al., 2020; Fischl et al., 2008; Zilles and Amunts, 2013). This can have implications on interpretation of BigBrain-MRI comparisons. For instance, in tutorial 3, low predictive accuracy of functional communities by cytoarchitecture may be attributable to the subject-specific topographies, which are well established in functional imaging (Benkarim et al., 2020; Braga and Buckner, 2017; Gordon et al., 2017; Kong et al., 2019). Future studies should consider the influence of inter-subject variability in concert with the precision of transformations, as these two elements of uncertainty can impact our interpretations, especially at higher granularity.”

      Minor:

      1) In the abstract and later in p9 the authors talk about "state-of-the-art" non-linear deformation matrices. This may be confusing for some readers. To me, in brain imaging a matrix is most often a 4x4 affine matrix describing a linear transformation. However, the authors seem to be describing a more complex, non-linear deformation field. Whereas building a deformation matrix (4x4 affine) is not a big challenge, I agree that more sophisticated tools should provide more sophisticated deformation fields. The authors may consider using "deformation field" instead of "deformation matrix", but I leave that to their judgment.

      As suggested, we changed the text to “deformation field” where relevant.

      2) In the results section, p11, the authors highlight the challenge of segmenting thalamic nuclei or different hippocampal regions, and suggest that this should be simplified by the use of the histological BigBrain data. However, the atlases currently provided in the OSF project do not include these more refined parcellations: there's one single "Thalamus" label, and one single "Hippocampus" label (not really single: left and right). This could be explicitly stated to prevent readers from having too high expectations (although I am certain that those finer parcellations should come in the very close future).

      We updated the text to reflect the current state of such parcellations. While subthalamic nuclei are not yet segmented (to our knowledge), one of the present authors has segmented hippocampal subfields (https://osf.io/bqus3/) and we highlight this in the Results (P.11-12):

      “Despite MRI acquisitions at high and ultra-high fields reaching submillimeter resolutions with ongoing technical advances, certain brain structures and subregions remain difficult to identify (Kulaga-Yoskovitz et al., 2015; Wisse et al., 2017; Yushkevich et al., 2015). For example, there are challenges in reliably defining the subthalamic nucleus (not yet released for BigBrain) or hippocampal Cornu Ammonis subfields [manual segmentation available on BigBrain, https://osf.io/bqus3/, (DeKraker et al., 2019)]. BigBrain-defined labels can be transformed to a standard imaging space for further investigation. Thus, this approach can support exploration of the functional architecture of histologically-defined regions of interest.”

    1. Author Response:

      Reviewer #1 (Public Review):

      Overall, the authors have done a nice job covering the relevant literature, presenting a story out of complicated data, and performing many thoughtful analyses.

      However, I believe the paper requires quite major revisions.

      We thank the reviewer for their encouraging assessment of our manuscript. We are grateful for their valuable and especially detailed feedback that helped us to substantially improve our manuscript.

      Major issues:

      I do not believe the current results present a clear, comprehensible story about sleep and motor memory consolidation. As presented, sleep predicts an increase in the subsequent learning curve, but there is a negative relationship between learning curve and task proficiency change (which is, as far as I can tell, similar to "memory retention"). This makes it seem as if sleep predicts more forgetting on initial trials within the subsequent block (or worse memory retention) - is this true? Regardless of whether it is statistically true, there appears another story in these data that is being sacrificed to fit a story about sleep. To my eye, the results may first and foremost tell a circadian (rather than sleep) story. Examining the data in Figure 2A and 2B, it appears that every AM learning period has a higher learning curve (slope) than every PM period. While this could, of course, be due to having just slept, the main story gleaned from such a result is not a sleep effect on retention, which has been the emphasis on motor memory consolidation research in the last couple of decades, but on new learning. The fact that this effect appears present in the first session (juggling blocks 1-3 in adolescents and blocks 1-5 in adults) makes this seem the more likely story here, since it has less to do with "preparing one to re-learn" and more to do with just learning and when that learning is optimal. But even if it does not reach statistical significance in the first session alone, it remains a concern and, in my opinion, should be considered a focus in the manuscript unless the authors can devise a reason to definitively rule it out.

      Here is how I recommend the authors proceed on this point: include all sessions from all subjects into a mixed effect model, predicting the slope of the learning curve with time of day and age group as fixed effects and subjects as random effects:

      learning curve slope ~ AM/PM [AM (0) or PM (1)] + age [adolescent (0) or adult (1)] + (1|subject)

      …or something similar with other regressors of interest. If this is significant for AM/PM status, they should re-try the analysis using only the first session. If this is significant, then a sleep-centric story cannot be defended here at all, in my opinion. If it is not (which could simply result from low power, but the authors could decide this), the authors should decide if they think they can rule out circadian effects and proceed accordingly. I should note that, while to many, a sleep story would be more interesting or compelling, that is not my opinion, and I would not solely opt to reject this paper if it centered a time-of-day story instead.
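
      Written out, the suggested formula corresponds to a random-intercept linear mixed model. The notation below is an assumption added for clarity and does not come from the manuscript: i indexes subjects, j indexes sessions, and PM and adult are the 0/1 indicators defined above.

```latex
\begin{aligned}
\text{slope}_{ij} &= \beta_0 + \beta_1\,\text{PM}_{ij} + \beta_2\,\text{adult}_{i} + u_i + \varepsilon_{ij},\\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

      Here \beta_1 is the time-of-day effect that the proposed analysis is meant to test, and u_i is the per-subject random intercept corresponding to (1|subject).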

      The authors need to work out precisely what is happening in the behavior here, and let the physiology follow that story. They should allow themselves to consider very major revisions (and drop the physiology) if that is most consistent with the data. As presented, I am very unclear of what to take away from the study.

      We thank the reviewer for the opportunity to further elaborate on our behavioral results. We agree that the interpretation of the behavior in the complex gross-motor task is not straightforward, which might be partly due to less controllability compared to, for example, finger-tapping tasks. The reviewer is correct that, initially, sleep seems to predict more forgetting on initial trials within the subsequent block given the dip in task proficiency and a resulting increase in steepness of the learning curve after the sleep retention interval. Notably, this dip in performance after sleep has also been reported for finger-tapping tasks (cf. Eichenlaub et al, 2020). The performance dip is also present in the wake first group (Figure 2) after the first interval. This observation suggests that picking up the task again after a period of time comes at a cost. Interestingly, this performance dip is no longer present after the second retention interval, indicating that the better the task proficiency, the easier it is to pick up juggling again. In other words, juggling has been better consolidated after additional training. Critically, our results show that participants with higher SO-spindle coupling strength have a lower dip in performance after the retention interval, thus indicating a learning advantage.

      Figure 2

      (A) Number of successful three-ball cascades (mean ± standard error of the mean [SEM]) of adolescents (circles) for the sleep-first (blue) and wake-first group (green) per juggling block. Grand average learning curve (black lines) as computed in (C) are superimposed. Dashed lines indicate the timing of the respective retention intervals that separate the three performance tests. Note that adolescents improve their juggling performance across the blocks. (B) Same conventions as in (A) but for adults (diamonds). Similar to adolescents, adults improve their juggling performance across the blocks regardless of group.

      We discuss the sleep effect on juggling in the discussion section (page 22 – 23, lines 502 – 514):

      "How relevant is sleep for real-life gross-motor memory consolidation? We found that sleep impacts the learning curve but did not affect task proficiency in comparison to a wake retention interval (Figure 2DE). Two accounts might explain the absence of a sleep effect on task proficiency. (1) Sleep rather stabilizes than improves gross-motor memory, which is in line with previous gross-motor adaption studies (Bothe et al, 2019; Bothe et al, 2020). (2) Pre-sleep performance is critical for sleep to improve motor skills (Wilhelm et al, 2012). Participants commonly reach asymptotic pre-sleep performance levels in finger tapping tasks, which is most frequently used to probe sleep effects on motor memory. Here we found that using a complex juggling task, participants do not reach asymptotic ceiling performance levels in such a short time. Indeed, the learning progression for the sleep-first and wake-first groups followed a similar trend (Figure 2AB), suggesting that more training and not in particular sleep drove performance gains."

      If indeed the authors keep the sleep aspect of this story, here are some comments regarding the physiology. The authors present several nice analyses in Figure 3. However, given the lack of behavioral difference between adolescents and adults (Fig 2D), they combine the groups when investigating behavior-physiology relationships. In some ways, then, Figure 3 has extraneous details to the point of motor learning and retention, and I believe the paper would benefit from more focus. If the authors keep their sleep story, I believe Figure 3 and 4 should be combined and some current figure panels in Figure 3 should be removed or moved to the supplementary information.

      We thank the reviewers for their suggestion and we agree that the figures of our manuscript would benefit from more focus. Therefore, we combined Figure 3 and 4 from the original manuscript into a revised Figure 3 in the updated version of the manuscript. In more detail, subpanels that explain our methodological approach can now be found in Figure 3 – figure supplement 1, while the updated Figure 3 now focuses on developmental changes in oscillatory dynamics and SO-spindle coupling strength as well as their relationship to gross-motor learning.

      Updated Figure 3:

      (A) Left: topographical distribution of the 1/f corrected SO and spindle amplitude as extracted from the oscillatory residual (Figure 3 – figure supplement 1A, right). Note that adolescents and adults both display the expected topographical distribution of more pronounced frontal SO and centro-parietal spindles. Right: single subject data of the oscillatory residual for all subjects with sleep data color coded by age (darker colors indicate older subjects). SO and spindle frequency ranges are indicated by the dashed boxes. Importantly, subjects displayed high inter-individual variability in the sleep spindle range and a gradual spindle frequency increase by age that is critically underestimated by the group average of the oscillatory residuals (Figure 3 – figure supplement 1A, right). (B) Spindle peak locked epoch (NREM3, co-occurrence corrected) grand averages (mean ± SEM) for adolescents (red) and adults (black). Inset depicts the corresponding SO-filtered (2 Hz lowpass) signal. Grey-shaded areas indicate significant clusters. Note, we found no difference in amplitude after normalization. Significant differences are due to more precise SO-spindle coupling in adults. (C) Top: comparison of SO-spindle coupling strength between adolescents and adults. Adults displayed more precise coupling than adolescents in a centro-parietal cluster. T-scores are transformed to z-scores. Asterisks denote cluster-corrected two-sided p < 0.05. Bottom: Exemplary depiction of coupling strength (mean ± SEM) for adolescents (red) and adults (black) with single subject data points. Exemplary single electrode data (bottom) is shown for C4 instead of Cz to visualize the difference. (D) Cluster-corrected correlations between individual coupling strength and overnight task proficiency change (post – pre retention) for adolescents (red, circle) and adults (black, diamond) of the sleep-first group (left, data at C4). Asterisks indicate cluster-corrected two-sided p < 0.05. 
Grey-shaded area indicates 95% confidence intervals of the trend line. Participants with a more precise SO-spindle coordination show improved task proficiency after sleep. Note that the change in task proficiency was inversely related to the change in learning curve (cf. Figure 2D), indicating that a stronger improvement in task proficiency related to a flattening of the learning curve. Further note that the significant cluster formed over electrodes close to motor areas. (E) Cluster-corrected correlations between individual coupling strength and overnight learning curve change. Same conventions as in (D). Participants with more precise SO-spindle coupling over C4 showed attenuated learning curves after sleep.

      and

      Figure 3 - figure supplement 1

      (A) Left: Z-normalized EEG power spectra (mean ± SEM) for adolescents (red) and adults (black) during NREM sleep in semi-log space. Data is displayed for the representative electrode Cz unless specified otherwise. Note the overall power difference between adolescents and adults due to a broadband shift on the y-axis. Straight black line denotes cluster-corrected significant differences. Middle: 1/f fractal component that underlies the broadband shift. Right: Oscillatory residual after subtracting the fractal component (A, middle) from the power spectrum (A, left). Both groups show clear delineated peaks in the SO (< 2 Hz) and spindle range (11 – 16 Hz) establishing the presence of the cardinal sleep oscillations in the signal. (B) Top: Spindle frequency peak development based on the oscillatory residuals. Spindle frequency is faster at all but occipital electrodes in adults than in adolescents. T-scores are transformed to z-scores. Asterisks denote cluster-corrected two-sided p < 0.05. Bottom: Exemplary depiction of the spindle frequency (mean ± SEM) for adolescents (red) and adults (black) with single subject data points at Cz. (C) SO-spindle co-occurrence rate (mean ± SEM) for adolescents (red) and adults (black) during NREM2 and NREM3 sleep. Event co-occurrence is higher in NREM3 (F(1, 51) = 1209.09, p < 0.001, partial eta² = 0.96) as well as in adults (F(1, 51) = 11.35, p = 0.001, partial eta² = 0.18). (D) Histogram of co-occurring SO-spindle events in NREM2 (blue) and NREM3 (purple) collapsed across all subjects and electrodes. Note the low co-occurring event count in NREM2 sleep. (E) Single subject (top) and group averages (bottom, mean ± SEM) for adolescents (red) and adults (black) of individually detected, for SO co-occurrence-corrected sleep spindles in NREM3. Spindles were detected based on the information of the oscillatory residual. 
Note the underlying SO-component (grey) in the spindle detection for single subject data and group averages indicating a spindle amplitude modulation depending on SO-phase. (F) Grand average time frequency plots (-2 to -1.5s baseline-corrected) of SO-trough-locked segments (corrected for spindle co-occurrence) in NREM3 for adolescents (left) and adults (right). Schematic SO is plotted superimposed in grey. Note the alternating power pattern in the spindle frequency range, showing that SO-phase modulates spindle activity in both age groups.

      Why did the authors use Spearman rather than Pearson correlations in Figure 4? Was it to reduce the influence of the outlier subject? They should minimally clarify and justify this, since it is less conventional in this line of research. And it would be useful to know if the relationship is significant with Pearson correlations when robust regression is applied. I see the authors are using MATLAB, and the robustfit toolbox (https://www.mathworks.com/help/stats/robustfit.html) is a simple way to address this issue.

      We thank the reviewers for their suggestion. We agree that, when inspecting the scatter plots, it looks as though the correlations could be severely influenced by two outliers in the adult group. Because this is an important matter, we recalculated all previously reported correlations without the two outliers (Figure R4, left column) and followed the reviewer’s suggestion to also compute robust regression (Figure R4, right column) and found no substantial deviation from our original results.

      In more detail, increase in task proficiency resulted in flattening of the learning curve when removing outliers (Figure R4A, rhos = -0.70, p < 0.001) and when applying robust regression analysis (Figure R4B, b = -0.30, t(67) = -10.89, rho = -0.80, p < 0.001). Likewise, higher coupling strength still predicted better task proficiency (mean rho = 0.35, p = 0.029, cluster-corrected) and flatter learning curves after sleep (rho = -0.44, p = 0.047, cluster-corrected) when removing the outliers (Figure R4CE) and when calculating robust regression (Figure R4DF, task proficiency: b = 82.32, t(40) = 3.12, rho = 0.45, p = 0.003; learning curve: b = -26.84, t(40) = -2.96, rho = -0.43, p = 0.005). Furthermore, we calculated spearman rank correlations and cluster-corrected spearman rank correlations in our original manuscript, to mitigate the impact of outliers, even though Pearson correlations are more widely used in the field. Therefore, we still report spearman rank correlations for single electrodes instead of robust correlations as it is more consistent with the cluster-correlation analyses.
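
      To illustrate the general idea behind a robust trend line, the following is a minimal sketch of iteratively reweighted least squares with Huber weights for a single predictor. It is not MATLAB's robustfit, whose default weight function is bisquare, and it is not the analysis code used for the figures; the function names, the tuning constant and the toy data are assumptions made for this example.

```python
from statistics import median


def weighted_line(x, y, w):
    """Weighted least-squares line; returns (intercept, slope)."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def robust_line(x, y, tune=1.345, n_iter=50):
    """Huber-weighted IRLS fit; returns (intercept, slope)."""
    weights = [1.0] * len(x)
    for _ in range(n_iter):
        intercept, slope = weighted_line(x, y, weights)
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale estimate: median absolute residual / 0.6745.
        scale = median(abs(r) for r in residuals) / 0.6745
        if scale == 0:
            break
        limit = tune * scale
        # Points with large residuals are down-weighted, the rest keep weight 1.
        weights = [1.0 if abs(r) <= limit else limit / abs(r) for r in residuals]
    return intercept, slope


# Hypothetical toy data: a perfect line y = 2x + 1 with one gross outlier.
x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]
y[5] = 100.0

ols_intercept, ols_slope = weighted_line(x, y, [1.0] * len(x))
rob_intercept, rob_slope = robust_line(x, y)
```

      In this toy case the ordinary least-squares slope is pulled away from 2 by the single outlier, whereas the reweighted fit stays close to it, which is the behaviour a robust trend line is meant to provide.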

      We now use robust trend lines instead of linear trend lines in our scatter plots. Further, we added the correlations without outliers (Figure R4ACE) to the supplements as Figure 2 – figure supplement 1D and Figure 3 – figure supplement 2 FG. These additional analyses are now reported in the results section of the revised manuscript (page 9, lines 186 – 191):

      "[…] we confirmed a strong negative correlation between the change (post retention values – pre retention values) in task proficiency and the change in learning curve after the retention interval (Figure 2F; rhos = -0.71, p < 0.001), which also remained strong after outlier removal (Figure 2 – figure supplement 1D). This result indicates that participants who consolidate their juggling performance after a retention interval show slower gains in performance."

      And (page 16, lines 343 – 346):

      "[…] Furthermore, our results remained consistent when including coupled spindle events in NREM2 (Figure 3 – figure supplement 2E) and after outlier removal (Figure 3 – figure supplement 2FG)."

      Furthermore, we now state that we specifically utilized Spearman rank correlations to mitigate the impact of outliers in our analyses in the methods section (page 35, lines 808 – 813):

      "For correlational analyses we utilized spearman rank correlations (rhos; Figure 2F & Figure 3DE) to mitigate the impact of possible outliers as well as cluster-corrected spearman rank correlations by transforming the correlation coefficients to t-values (p < 0.05) and clustering in the space domain (Figure 3DE). Linear trend lines were calculated using robust regression."

      Figure R4

      (A) Spearman rank correlation between task proficiency change and learning curve change collapsed across adolescents (red dot) and adults (black diamonds) after removing two outlier subjects in the adult age group. Grey-shaded area indicates 95% confidence intervals of the robust trend line. (B) Robust regression of task proficiency change and learning curve change of the original sample. (C) Cluster-corrected correlations (right) between individual coupling strength and overnight task proficiency change (post – pre retention) after outlier removal (left, spearman correlation at C4, uncorrected). Asterisks indicate cluster-corrected two-sided p < 0.05. (D) Robust regression of coupling strength at C4 and task proficiency of the original sample. (E) Same conventions as in (C) but for overnight learning curve change. (F) Same conventions as in (D) but for overnight learning curve change.
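
      The rank-correlation step described in the methods text above can be sketched as follows: values are replaced by their (tie-averaged) ranks, a Pearson correlation is computed on the ranks, and the coefficient is converted to a t-value with n - 2 degrees of freedom. This is a minimal illustration under those assumptions; the spatial clustering and cluster correction are not shown, and the function names and toy values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rho_to_t(rho, n):
    """t-value for a correlation coefficient from n paired observations."""
    return rho * math.sqrt((n - 2) / (1.0 - rho ** 2))


# Hypothetical toy data; the last behavioural value is an extreme outlier.
coupling = [0.10, 0.20, 0.30, 0.40, 0.50]
change = [1.0, 3.0, 2.0, 5.0, 40.0]

rho = spearman_rho(coupling, change)
t_value = rho_to_t(rho, len(coupling))
```

      Because only the ordering of the values enters the computation, the extreme last value has no more influence on rho than any other largest value would, which is the motivation given for preferring rank correlations here.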

      Additionally, with only a single night of recording data, it is impossible to disentangle possible trait-based sleep characteristics (e.g., Subject 1 has high SO-spindle coupling in general and retains motor memories well, but these are independent of each other) from a specific, state-based account (e.g., Subject 1's high SO-spindle coupling on night 1 specifically led to their improved retention or change in learning, etc., and this is unrelated to their general SO-spindle coupling or motor performance abilities). Clearly, many studies face this limitation, but this should be acknowledged.

      We thank the reviewers for their important remark. We agree that it is impossible to make a sound statement about whether our reported correlations represent trait- or state-based aspects of the sleep and learning relationship with the data that we have reported in the manuscript. However, while we are lacking a proper baseline condition without any task engagement, we still recorded polysomnography for all subjects during an adaptation night. Given the expected pronounced differences in sleep architecture between the adaptation nights and learning nights (see Table R3 for an overview collapsed across both age groups), we initially refrained from entering data from the adaptation nights into our original analyses, but we now fully report the data below. Note that the differences are driven by the adaptation night, where subjects first have to adjust to sleeping with attached EEG electrodes in a sleep laboratory.

      Table R3. Sleep architecture (mean ± standard deviation) for the adaptation and learning night collapsed across both age groups. Nights were compared using paired t-tests

      To further clarify whether subjects with high coupling strength have a motor learning advantage (i.e. trait-effect) or a learning-induced enhancement of coupling strength is indicative of improved overnight memory change (i.e. state-effect), we ran additional analyses using the data from the adaptation night. Note that the coupling strength metric was not impacted by differences in event number and our correlations with behavior were not influenced by sleep architecture (please refer to our answer to issue #7 for the results). Therefore, we considered it appropriate to also utilize data from the adaptation night.

      First, we correlated SO-spindle coupling strength obtained from the adaptation night with the coupling strength in the learning night. We found that overall, coupling strength is highly correlated between the two measurements (mean rho across all channels = 0.55, Figure R5A), supporting the notion that coupling strength remains rather stable within the individual (i.e. trait), similar to what has been reported about the stable nature of sleep spindles as a “neural finger-print” (De Gennaro & Ferrara, 2003; De Gennaro et al, 2005; Purcell et al, 2017).

      To investigate a possible state-effect for coupling strength and motor learning, we calculated the difference in coupling strength between the two nights (learning night – adaptation night) and correlated these values with the overnight change in task proficiency and learning curve. We identified no significant correlations with a learning induced coupling strength change; neither for task proficiency nor learning curve change (Figure R5B). Note that there was a positive correlation of coupling strength change with overnight task proficiency change at Cz (Figure R5B, left), however it did not survive cluster-corrected correlational analysis (rhos = 0.34, p = 0.15). Combined, these results favor the conclusion that our correlations between coupling strength and learning rather reflect a trait-like relationship than a state-like relationship. This is in line with the interpretation of our previous studies that SO-spindle coupling strength reflects the efficiency and integrity of the neuronal pathway between neocortex and hippocampus that is paramount for memory networks and the information transfer during sleep (Hahn et al, 2020; Helfrich et al, 2019; Helfrich et al, 2018; Winer et al, 2019). For a comprehensive review please see Helfrich et al (2021), which argued that SO-spindle coupling predicts the integrity of memory pathways and therefore correlates with various metrics of behavioral performance or structural integrity.

      Figure R5

      (A) Topographical plot of spearman rank correlations of coupling strength in the adaptation night and learning night across all subjects. Overall coupling strength was highly correlated between the two measurements. (B) Cluster-corrected correlation between learning induced coupling strength changes (learning night – adaptation night) and overnight change in task proficiency (left) as well as learning curve (right). We found no significant clusters, although correlations showed similar trends as our original analyses, with more learning induced changes in coupling strength resulting in better overnight task proficiency and flattened learning curves.

      We have now added the additional state-trait analyses (Figure R5) to the updated manuscript as Figure 3 – figure supplement 2HI and report them in the results section (page 17, lines 361 – 375):

      "Finally, we investigated whether subjects with high coupling strength have a gross-motor learning advantage (i.e. trait-effect) or a learning induced enhancement of coupling strength is indicative for improved overnight memory change (i.e. state-effect). First, we correlated SO-spindle coupling strength obtained from the adaptation night with the coupling strength in the learning night. We found that overall, coupling strength is highly correlated between the two measurements (mean rho across all channels = 0.55, Figure 3 – figure supplement 2H), supporting the notion that coupling strength remains rather stable within the individual (i.e. trait). Second, we calculated the difference in coupling strength between the learning night and the adaptation night to investigate a possible state-effect. We found no significant cluster-corrected correlations between coupling strength change and task proficiency- as well as learning curve change (Figure 3 – figure supplement 2I).

      Collectively, these results indicate the regionally specific SO-spindle coupling over central EEG sensors encompassing sensorimotor areas precisely indexes learning of a challenging motor task."

      We further refer to these new results in the discussion section (page 23, lines 521 – 528):

      "Moreover, we found that SO-spindle coupling strength remains remarkably stable between two nights, which also explains why a learning-induced change in coupling strength did not relate to behavior (Figure 3 – figure supplement 2I). Thus, our results primarily suggest that strength of SO-spindle coupling correlates with the ability to learn (trait), but does not solely convey the recently learned information. This set of findings is in line with recent ideas that strong coupling indexes individuals with highly efficient subcortical-cortical network communication (Helfrich et al, 2021)."

      Additionally, we now provide descriptive data of the adaptation and learning night (Table R3) in the Supplementary file – table 1 and explicitly mention the adaptation night in the results section, which was previously only mentioned in the method section (page 6, lines 101 – 105):

      "Polysomnography (PSG) was recorded during an adaptation night and during the respective sleep retention interval (i.e. learning night) except for the adult wake-first group (for sleep architecture descriptive parameters of the adaptation night and learning night as well as for adolescents and adults see Supplementary file – table 1 & 2)."

      Reviewer #2 (Public Review):

      In this study Hahn and colleagues investigate the role of Slow-oscillation spindle coupling for motor memory consolidation and the impact of brain maturation on these interactions. The authors employed a real-life gross-motor task, where adolescents and adults learned to juggle. They demonstrate that during post-learning sleep SO-spindles are stronger coupled in adults as compared to adolescents. The authors further show, that the strength of SO-spindle coupling correlates with overnight changes in the learning curve and task proficiency, indicating a role of SO-spindle coupling in motor memory consolidation.

      Overall, the topic and the results of the present study are interesting and timely. The authors employed state-of-the-art analyses, carefully taking the general variability of oscillatory features into account. It also has to be acknowledged that the authors moved away from using rather artificial lab-tasks to study the consolidation of motor memories (as it is standard in the field), adding ecological validity to their findings. However, some features of their analyses need further clarification.

      We thank the reviewer for their positive assessment of our manuscript. Incorporating the encouraging and helpful feedback, we believe that we substantially improved the clarity and robustness of our analyses.

      1) Supporting and extending previous work of the authors (Hahn et al, 2020), SO-spindle coupling over centro-parietal areas was stronger in adults as compared to adolescents. Despite these differences in the EEG results the authors collapsed the data of adults and adolescents for their correlational analyses (Fig. 4a and 4b). Why would the authors think that this procedure is viable (also given the fact that different EEG systems were used to record the data)?

      We thank the reviewers for the opportunity to clarify why we think it is viable to collapse the data of adolescents and adults for our correlational analyses. In the following we split our answers based on the two points raised by the reviewers: (1) electrophysiological differences (i.e. coupling strength) between the groups and (2) potential signal differences due to different EEG systems.

      1. Electrophysiological differences

      First, upon inspecting the original Figure 4, it is apparent that the coupling strength of the combined sample does not form isolated clusters for each age group. In other words, while adult coupling strength is on the higher end and adolescent coupling strength on the lower end due to the developmental increase in coupling strength we reported in the original Figure 3F, both samples overlap, forming a linear trend. Second, when running the correlational analyses between coupling strength and task proficiency as well as learning curve separately for each age group, we found that they follow the same direction (Figure R3). Adolescents with higher coupling strength show better task proficiency (Figure R3A, rhos = 0.66, p = 0.005). This effect was also present when using robust regression (b = 109.97, t(15) = 3.13, rho = 0.63, p = 0.007). Like adolescents, adults with higher coupling strength at C4 displayed better task proficiency after sleep (Figure R3B, rhos = 0.39, p = 0.053). This relationship was stronger when using robust regression (b = 151.36, t(23) = 3.17, rho = 0.56, p = 0.004). For learning curves, we found the expected negative correlation at C4 for adolescents (Figure R3C, rhos = -0.57, p = 0.020) and adults (Figure R3D, rhos = -0.44, p = 0.031). Results were comparable when using robust regression (adolescents: b = -59.58, t(15) = -2.94, rho = -0.60, p = 0.010; adults: b = -21.99, t(23) = -1.71, rho = -0.37, p = 0.101).

      Taken together, these results demonstrate that adolescents and adults show the same effects in the same direction at the same electrode, making it highly unlikely that our results arose by chance or that our initial correlation analyses were driven by a single group.

      Additionally, we already controlled for age in our original analyses using partial correlations (also refer to our answer to issue #6). Hence, our additional analyses provide additional support that it is viable to collapse the analyses across both age groups even though they differ in coupling strength.

      2. Different EEG-systems

        The reviewers also raise the question whether our analyses might be impacted by the different EEG systems we used to record our data. This is an important concern especially when considering that cross-frequency coupling analyses can be severely confounded by differences in signal properties (Aru et al, 2015). In our sample, the strongest impact factor on signal properties is most likely age, given the broadband power differences in the power spectrum we found between the groups (original Figure 3A). Importantly, we also found a similar systematic power difference in our longitudinal study using the same ambulatory EEG system for both data recordings (Hahn et al, 2020). This is in line with numerous other studies demonstrating age related EEG power changes in broadband- as well as SO and sleep spindle frequency ranges (Campbell & Feinberg, 2016; Feinberg & Campbell, 2013; Helfrich et al, 2018; Kurth et al, 2010; Muehlroth et al, 2019; Muehlroth & Werkle-Bergner, 2020; Purcell et al, 2017). Therefore, we already had to take differences in signal property into account for our cross-frequency analyses, regardless of whether the underlying cause is an age difference or different signal-to-noise ratios of different EEG systems.

      To mitigate confounds in the signal, we used a data-driven and individualized approach detecting SO and sleep spindle events based on individualized frequency bands and a 75-percentile amplitude criterion relative to the underlying signal. Additionally we z-normalized all spindle events prior to the cross-frequency coupling analyses (Figure R3E). We found no amplitude differences around the spindle peak (point of SO-phase readout) between adolescents that were recorded with an ambulatory amplifier system (alphatrace) and adults that were recorded with a stationary amplifier system (neuroscan) using cluster-based random permutation testing. This was also the case for the SO-filtered (< 2 Hz) signal (Figure R3E, inset). Critically, the significant differences in amplitude from -1.4 to -0.8 s (p = 0.023, d = -0.73) and 0.4 to 1.5 s (p < 0.001, d = 1.1) are not caused by age related differences in power or different EEG-systems but instead by the increased coupling strength (i.e. higher coupling precision of spindles to SOs) in adults giving rise to a more pronounced SO-wave shape when averaging across spindle peak locked epochs.

      Consequently, our analysis pipeline already controlled for possible differences in signal property introduced through different amplifier systems. Nonetheless, we also wanted to directly compare the signal-to-noise ratio of the ambulatory and stationary amplifier systems. However, we only obtained data from both amplifier systems in the adult sleep first group, because we recorded EEG during the juggling learning phase with the ambulatory system in addition to the PSG with the stationary system. First, we computed the power spectra in the 1 to 49 Hz frequency range during the juggling learning phase (ambulatory) and during quiet wakefulness (stationary) for every subject in the adult sleep first group in 10-second segments. Next, we computed the signal-to-noise ratio (mean/standard deviation) of the power spectra per frequency across all segments. We only found a small negative cluster from 21.9 to 22.5 Hz (p = 0.042, d = 0.53; Figure R3F), which did not pertain to our frequency bands of interest. Critically, the signal-to-noise ratio of both amplifiers converged in the upper frequency bands approaching the noise floor, strongly supporting the notion that both systems in fact provided highly comparable estimates.
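
The signal-to-noise computation described above (power spectra in 10-second segments, then mean divided by standard deviation per frequency across segments) can be sketched as follows. This is an illustrative sketch only, not the pipeline used for the reported analysis: the function name `spectral_snr`, the use of non-overlapping segments, and the Hann taper are assumptions that the text does not specify.

```python
import numpy as np


def spectral_snr(signal, sfreq, seg_sec=10.0, fmin=1.0, fmax=49.0):
    """Per-frequency SNR: mean / standard deviation of power across segments.

    Hypothetical helper for illustration; segmenting and tapering choices
    are assumptions, not details taken from the response.
    """
    signal = np.asarray(signal, dtype=float)
    seg_len = int(round(seg_sec * sfreq))
    n_seg = signal.size // seg_len
    if n_seg < 2:
        raise ValueError("need at least two full segments")

    # Cut the recording into non-overlapping segments; a remainder is dropped.
    segments = signal[: n_seg * seg_len].reshape(n_seg, seg_len)

    # Hann-tapered FFT power for each segment (taper choice is an assumption).
    window = np.hanning(seg_len)
    power = np.abs(np.fft.rfft(segments * window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / sfreq)

    # Restrict to the frequency range of interest (1 to 49 Hz in the text).
    keep = (freqs >= fmin) & (freqs <= fmax)
    mean_power = power[:, keep].mean(axis=0)
    sd_power = power[:, keep].std(axis=0, ddof=1)
    return freqs[keep], mean_power / sd_power
```

Applied to one channel per amplifier system, the two resulting SNR curves could then be compared per frequency, for example with the cluster-based permutation test mentioned in the text.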

      In conclusion, both age groups display highly similar effects and direction when correlating coupling strength with behavior. Further, after individualization and normalization of the analytical signal, we found no differences in signal properties that would confound the cross-frequency analysis. Lastly, we did not find systematic differences in signal-to-noise ratio between the different EEG-systems. Thus, we believe it is justified to collapse the data across all participants for the correlational analyses, as it combines both the developmental aspect of enhanced coupling precision from adolescence to adulthood and the behavioral relevance for motor learning, which we deem a critical research advance from our previous study.

      Figure R3

      (A) Cluster-corrected correlations (right) between individual coupling strength and overnight task proficiency change (post – pre retention) for adolescents of the sleep-first group (left, spearman correlation at C4, uncorrected). Asterisks indicate cluster-corrected two-sided p < 0.05. Grey-shaded area indicates 95% confidence intervals of the robust trend line. Participants with a more precise SO-spindle coordination show improved task proficiency after sleep. (B) Cluster-corrected correlation of coupling strength and overnight task proficiency change for adults. Same conventions as in (A). Similar trend of higher coupling strength predicting better task proficiency after sleep. (C) Cluster-corrected correlation of coupling strength and overnight learning curve change for adolescents. Same conventions as in (A). Higher coupling strength related to a flatter learning curve after sleep. (D) Cluster-corrected correlation of coupling strength and overnight learning curve change for adults. Same conventions as in (A). Higher coupling strength related to a flatter learning curve after sleep. (E) Spindle peak locked epoch (NREM3, co-occurrence corrected) grand averages (mean ± SEM) for adolescents (red) and adults (black). Inset depicts the corresponding SO-filtered (2 Hz lowpass) signal. Black lines indicate significant clusters. Note, we found no difference in amplitude after normalization. Significant differences are due to more precise SO-spindle coupling in adults. Spindle frequency is blurred due to individualized spindle detection. (F) Signal-to-noise ratio for the stationary EEG amplifier (green) during quiet wakefulness and for the ambulatory EEG amplifier (purple) during juggling training. Grey shaded area denotes cluster-corrected p < 0.05. Note that signal-to-noise ratio converges in the higher frequency ranges.

      We have now added Figure R3E as Figure 3B to the revised version of the manuscript to demonstrate that there were no systematic differences between the two age groups in the analytical signal due to the expected age related power differences or EEG-systems. Specifically, we now state in the results section (page 13 – 14, lines 282 – 294):

      "We assessed the cross frequency coupling based on z-normalized spindle epochs (Figure 3B) to alleviate potential power differences due to age (Figure 3 – figure supplement 1A) or different EEG-amplifier systems that could potentially confound our analyses (Aru et al, 2015). Importantly, we found no amplitude differences around the spindle peak (point of SO-phase readout) between adolescents and adults using cluster-based random permutation testing (Figure 3B), indicating an unbiased analytical signal. This was also the case for the SO-filtered (< 2 Hz) signal (Figure 3B, inset). Critically, the significant differences in amplitude from -1.4 to -0.8 s (p = 0.023, d = -0.73) and 0.4 to 1.5 s (p < 0.001, d = 1.1) are not caused by age related differences in power or different EEG-systems but instead by the increased coupling strength (i.e. higher coupling precision of spindles to SOs) in adults giving rise to a more pronounced SO-wave shape when averaging across spindle peak locked epochs."

      Further, we added the correlational analyses that we computed separately for the age groups (Figure R3A-D) to the revised manuscript (Figure 3 – figure supplement 2CD) as they further substantiate our claims about the relationship between SO-spindle coupling and gross-motor learning.

      We now refer to these analyses in the results section (page 16, lines 338 – 343):

      "Critically, when computing the correlational analyses separately for adolescents and adults, we identified highly similar effects at electrode C4 for task proficiency (Figure 3 – figure supplement 2C) and learning curve (Figure 3 – figure supplement 2D) in each group. These complementary results demonstrate that coupling strength predicts gross-motor learning dynamics in both, adolescents as well as adults, and further show that this effect is not solely driven by one group."

      2) The authors might want to explicitly show that the reported correlations (with regards to both learning curve and task proficiency change) are not driven by any outliers.

      We thank the reviewers for their suggestion. We agree that when inspecting the scatter plots it looks as though the correlations could be severely influenced by two outliers in the adult group. Because this is an important matter, we recalculated all previously reported correlations without the two outliers (Figure R4, left column) and followed the reviewer’s suggestion to also compute robust regression (Figure R4, right column) and found no substantial deviation from our original results.

      In more detail, increase in task proficiency resulted in flattening of the learning curve when removing outliers (Figure R4A, rhos = -0.70, p < 0.001) and when applying robust regression analysis (Figure R4B, b = -0.30, t(67) = -10.89, rho = -0.80, p < 0.001). Likewise, higher coupling strength still predicted better task proficiency (mean rho = 0.35, p = 0.029, cluster-corrected) and flatter learning curves after sleep (rho = -0.44, p = 0.047, cluster-corrected) when removing the outliers (Figure R4CE) and when calculating robust regression (Figure R4DF, task proficiency: b = 82.32, t(40) = 3.12, rho = 0.45, p = 0.003; learning curve: b = -26.84, t(40) = -2.96, rho = -0.43, p = 0.005). Furthermore, we calculated spearman rank correlations and cluster-corrected spearman rank correlations in our original manuscript, to mitigate the impact of outliers, even though Pearson correlations are more widely used in the field. Therefore, we still report spearman rank correlations for single electrodes instead of robust correlations as it is more consistent with the cluster-correlation analyses.

      We now use robust trend lines instead of linear trend lines in our scatter plots. Further, we added the correlations without outliers (Figure R4ACE) to the supplements as Figure 2 – figure supplement 1D and Figure 3 – figure supplement 2 FG. These additional analyses are now reported in the results section of the revised manuscript (page 9, lines 186 – 191):

      "[…] we confirmed a strong negative correlation between the change (post retention values – pre retention values) in task proficiency and the change in learning curve after the retention interval (Figure 2F; rhos = -0.71, p < 0.001), which also remained strong after outlier removal (Figure 2 – figure supplement 1D). This result indicates that participants who consolidate their juggling performance after a retention interval show slower gains in performance."

      And (page 16, lines 343 – 346):

      "[…] Furthermore, our results remained consistent when including coupled spindle events in NREM2 (Figure 3 – figure supplement 2E) and after outlier removal (Figure 3 – figure supplement 2FG)."

      Furthermore, we now state that we specifically utilized spearman rank correlations to mitigate the impact of outliers in our analyses in the method section (page 35, lines 808 – 813):

      "For correlational analyses we utilized spearman rank correlations (rhos; Figure 2F & Figure 3DE) to mitigate the impact of possible outliers as well as cluster-corrected spearman rank correlations by transforming the correlation coefficients to t-values (p < 0.05) and clustering in the space domain (Figure 3DE). Linear trend lines were calculated using robust regression."
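
The transformation of correlation coefficients to t-values mentioned in the quoted methods text can be illustrated with the standard conversion t = r · sqrt((n − 2) / (1 − r²)). The sketch below assumes this textbook formula with n − 2 degrees of freedom; the quoted text does not state which exact formula was used, and the function name `corr_to_t` is hypothetical.

```python
import math


def corr_to_t(r, n):
    """Convert a correlation coefficient from n observations to a t-value.

    Uses the textbook formula with n - 2 degrees of freedom (an assumption;
    the source does not spell out the exact transformation).
    """
    if n < 3:
        raise ValueError("need at least three observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

In a cluster-based approach, one such t-value per electrode would be thresholded and neighbouring supra-threshold electrodes grouped into spatial clusters; that clustering step is not shown here.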

      Figure R4:

      (A) Spearman rank correlation between task proficiency change and learning curve change collapsed across adolescents (red dot) and adults (black diamonds) after removing two outlier subjects in the adult age group. Grey-shaded area indicates 95% confidence intervals of the robust trend line. (B) Robust regression of task proficiency change and learning curve change of the original sample. (C) Cluster-corrected correlations (right) between individual coupling strength and overnight task proficiency change (post – pre retention) after outlier removal (left, spearman correlation at C4, uncorrected). Asterisks indicate cluster-corrected two-sided p < 0.05. (D) Robust regression of coupling strength at C4 and task proficiency of the original sample. (E) Same conventions as in (C) but for overnight learning curve change. (F) Same conventions as in (D) but for overnight learning curve change.

      3) The sleep data of all participants (thus from both sleep first and wake first) were used to determine the features of SO-spindle coupling in adolescents and adults. Were there any differences between groups (sleep first vs. wake first)? This might be interesting in general but especially because only data of the sleep first group entered the subsequent correlational analyses.

      We thank the reviewers for their remark. We agree that adding additional information about possible differences between the sleep first and wake first groups would allow for a more comprehensive assessment of the reported data. We did not explain our reasoning to include only the sleep first groups for the correlation analyses clearly enough in the original manuscript. Unfortunately, we can only report data for the adolescents in our sample, because we did not record polysomnography (PSG) for the adult wake first group. This is also one of the two reasons why we focused on the sleep first groups for our correlational analyses.

      Adolescents in the sleep first group did not differ from adolescents in the wake first group in terms of sleep architecture (except REM (%), which did not correlate with behavior [task proficiency: rho = -0.17, p = 0.28; learning curve: -0.02, p = 0.90]) as well as SO and sleep spindle event descriptive measures (see Table R2). Importantly, we found no differences in coupling strength between the two groups (Figure R2A).

      Table R2. Summary of sleep architecture and SO/spindle event descriptive measures (at electrode C4) of adolescents in the sleep first and wake first group (mean ± standard deviation). Independent t-tests were used for comparisons

      The second reason why we focused our analyses on the sleep first group was that adolescents in the wake first group had higher task proficiency after the sleep retention interval than the sleep first group (Figure R2B; t(23) = -2.24, p = 0.034). This difference in performance is directly explained by the additional juggling test that the wake first group performed at the time point of their learning night, which should be considered as additional training. Therefore, we excluded the wake first group from our correlational analyses because the sleep first and wake first groups are not comparable in terms of juggling training during the night when we assessed SO-spindle coupling strength.

      Figure R2

      (A) Comparison of SO-spindle coupling strength in the adolescent sleep first (blue) and wake first (green) group using cluster-based random permutation testing (Monte-Carlo method, cluster alpha 0.05, max size criterion, 1000 iterations, critical alpha level 0.05, two-sided). Left: exemplary depiction of coupling strength at electrode C4 (mean ± SEM). Right: z-transformed t-values plotted for all electrodes obtained from the cluster test. No significant clusters emerged. (B) Comparison of task proficiency between sleep first and wake first group after the sleep retention interval (mean ± SEM). Adolescents in the wake first group had higher task proficiency given the additional juggling performance test, which also reflects additional training.

      These additional analyses (Figure R2) and the summary statistics of sleep architecture and SO/spindle event descriptives of adolescents in the sleep first and wake first group (Table R2), are now reported in the revised version of the manuscript as Figure 3 – figure supplement 2AB and Supplementary file – table 7. We now explicitly explain our rationale of why we only considered participants in the sleep first group for our correlational analyses in the results section (page 6, lines 101 – 105):

      "Polysomnography (PSG) was recorded during an adaptation night and during the respective sleep retention interval (i.e. learning night) except for the adult wake-first group (for sleep architecture descriptive parameters of the adaptation night and learning night as well as for adolescents and adults see Supplementary file – table 1 & 2)"

      And (page 15, lines 311 – 320):

      "[…] Furthermore, given that we only recorded polysomnography for the adults in the sleep first group and that adolescents in the wake first group showed enhanced task proficiency at the time point of the sleep retention interval due to additional training (Figure 3 – figure supplement 2A), we only considered adolescents and adults of the sleep-first group to ensure a similar level of juggling experience (for summary statistics of sleep architecture and SO and spindle events of subjects that entered the correlational analyses see Supplementary file – table 6). Notably, we found no differences in electrophysiological parameters (i.e. coupling strength, event detection) between the adolescents of the wake first and sleep first group (Figure 3 – figure supplement 2B & Supplementary file – table 7)."

      4) To allow a more comprehensive assessment of the underlying data information with regards to general sleep descriptives (minutes, per cent of time spent in different sleep stages, overall sleep time etc.) as well as related to SOs, spindles and coupled events (e.g. number, density etc.) would be needed.

      We agree with the reviewers that additional information about sleep architecture and SO as well as sleep spindle characteristics is needed for a more comprehensive assessment of our data. We have now added summary tables for sleep architecture and SO/spindle event descriptive measures for the whole sample (Table R4) and for the sleep first groups that we used for our correlational analyses (Table R5) to the supplementary material in the updated manuscript. It is important to note that, due to the longer sleep opportunity that we provided to adolescents to accommodate the overall higher sleep need in younger participants, adolescents and adults differed in most general sleep architecture markers and SO as well as sleep spindle descriptive measures. In addition, changes in sleep architecture are prominent during the maturational phase from adolescence to adulthood, which might introduce additional variance between the two age groups.

      Table R4. Summary of sleep architecture and SO/spindle event descriptive measures (at electrode C4) of adolescents and adults across the whole sample (mean ± standard deviation) in the learning night. Independent t-tests were used for comparisons

      Table R5. Summary of sleep architecture and SO/spindle event descriptive measures (at electrode C4) of adolescents and adults in the sleep first group (mean ± standard deviation) in the learning night. Independent t-tests were used for comparisons

      In order to ensure that our correlational analyses are not driven by these systematic differences between the two age groups, we used cluster-corrected partial correlations to control for sleep architecture markers (Figure R7) and SO/spindle descriptive measurements (Figure R8A). Critically, none of these possible confounders changed the pattern of our initial correlational analyses of coupling strength and task proficiency/learning curve. Additionally, we also controlled for differences in spindle event number by using a bootstrapped resampling approach. We randomly drew 200 spindle events in 100 iterations and subsequently recalculated the coupling strength for each subject. We found that resampled values and our original observation of coupling strength are almost perfectly correlated, indicating that differences in event number are unlikely to have an impact on coupling strength as long as there are at least 200 events (Figure R8B). Combined these analyses demonstrate that our correlations between coupling strength and behavior are not influenced by the reported differences in sleep architecture and SO/spindle descriptive measures.
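
The resampling control described above (200 randomly drawn spindle events, 100 iterations, coupling strength recalculated each time) could look roughly like the following sketch. Several details are assumptions not stated in the text: coupling strength is taken to be the mean resultant vector length of the SO phases at the spindle peaks, events are drawn with replacement, and the per-iteration values are averaged. The function names are hypothetical.

```python
import numpy as np


def coupling_strength(phases):
    """Mean resultant vector length of phase angles in radians (0 to 1).

    Assumed definition of coupling strength, used here for illustration.
    """
    phases = np.asarray(phases, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * phases))))


def resampled_coupling_strength(phases, n_events=200, n_iter=100,
                                replace=True, seed=0):
    """Average coupling strength over repeated random draws of spindle events."""
    phases = np.asarray(phases, dtype=float)
    if phases.size < n_events:
        raise ValueError("fewer events than requested per draw")
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_iter)
    for i in range(n_iter):
        # Draw a fixed number of events so every subject contributes equally;
        # sampling with replacement is an assumption.
        sample = rng.choice(phases, size=n_events, replace=replace)
        estimates[i] = coupling_strength(sample)
    return float(estimates.mean())
```

Correlating the resampled values with the original per-subject coupling strength (as in Figure R8B) then indicates whether the number of detected events biases the estimate.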

      Figure R7

      Summary of cluster-corrected partial correlations of coupling strength with task proficiency (left) and learning curve (right) controlling for possible confounding factors. Asterisks indicate location of the detected cluster. The pattern of initial results remained highly stable.

      Figure R8

      (A) Summary of cluster-corrected partial correlations of coupling strength with task proficiency (left) and learning curve (right) controlling for SO/spindle descriptive measures at critical electrode C4. Asterisks indicate location of the detected cluster. The pattern of initial results remained highly stable. (B) Spearman correlation between resampled coupling strength (N = 200, 100 iterations) and original observation of coupling strength for adolescents (red circles) and adults (black diamonds), indicating that coupling strength is not influenced by spindle event number if at least 200 events are present. Grey-shaded area indicates 95% confidence intervals of the robust trend line.

      We now provide general sleep descriptives (Table R4 & R5) in the revised version of the manuscript as Supplementary file – table 2 & table 6. These data are referred to in the results section (page 6, lines 101 – 105):

      "Polysomnography (PSG) was recorded during an adaptation night and during the respective sleep retention interval (i.e. learning night) except for the adult wake-first group (for sleep architecture descriptive parameters of the adaptation night and learning night as well as for adolescents and adults see Supplementary file – table 1 & 2)."

      And (page 15, lines 311 – 318):

      "Furthermore, given that we only recorded polysomnography for the adults in the sleep first group and that adolescents in the wake first group showed enhanced task proficiency at the time point of the sleep retention interval due to additional training (Figure 3 – figure supplement 2A), we only considered adolescents and adults of the sleep-first group to ensure a similar level of juggling experience (for summary statistics of sleep architecture and SO and spindle events of subjects that entered the correlational analyses see Supplementary file – table 6)."

      The additional control analyses (Figure R7 & R8) are also now added to the revised manuscript as Figure 3 – figure supplement 3 & 4 in the results section (page 16, lines 356 – 360):

      "For a summary of the reported cluster-corrected partial correlations as well as analyses controlling for differences in sleep architecture see Figure 3 – figure supplement 3. Further, we also confirmed that our correlations are not influenced by individual differences in SO and spindle event parameters (Figure 3 – figure supplement 4)."

      5) The authors used partial correlations to rule out that age drove the relationship between coupling strength, learning curve and task proficiency. It seems like this analysis was done specifically for electrode C4, after having already established that coupling strength at electrode C4 correlates in general with changes in the learning curve and task proficiency. I think the claim that results were not driven by age as confounding factor would be stronger if the authors used a cluster-corrected partial correlation in the first place (just as in the main analysis).

      The reviewers are correct that initially we only conducted the partial correlation for electrode C4. Following the reviewers' suggestion, we now additionally computed cluster-corrected partial correlations similar to our main analysis. Like in our original analyses, we found a significant positive central cluster (Figure R6A, mean rho = 0.40, p = 0.017) showing that higher coupling strength related to better task proficiency after sleep and a negative cluster-corrected correlation at C4 showing that higher coupling strength was related to flatter learning curves after sleep (Figure R6B, rho = -0.47, p = 0.049) also when controlling for age.
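
For a single electrode, a partial rank correlation that controls for age can be sketched as below. This is one common way to compute it (rank-transform all variables, regress the covariate ranks out of both variables, correlate the residuals) and is shown only as an assumption-laden illustration; the cluster correction across electrodes is not included, and the function names are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Rank data from 1..n, assigning tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(values.size, dtype=float)
    ranks[order] = np.arange(1, values.size + 1)
    # Replace the ranks of tied values by their mean rank.
    _, inverse, counts = np.unique(values, return_inverse=True,
                                   return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def partial_spearman(x, y, covariate):
    """Spearman correlation of x and y after removing a covariate (e.g. age)."""
    rx, ry, rz = (average_ranks(v) for v in (x, y, covariate))
    design = np.column_stack([np.ones_like(rz), rz])
    # Residualize the ranks of x and y on the ranks of the covariate.
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])
```

Here `x` would hold coupling strength at one electrode, `y` the behavioral change score, and `covariate` the participants' age; repeating this per electrode yields the map that enters the cluster test.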

      Figure R6

      (A) Cluster-corrected partial correlation of individual coupling strength in the learning night and overnight change in task proficiency (post – pre retention) collapsed across adolescents and adults, controlling for age. Asterisks indicate cluster-corrected two-sided p < 0.05. A similar significant cluster to the original analysis (Figure 4A) emerged comprising electrodes Cz and C4. (B) Same conventions as in A. Like in the original analysis (Figure 4B) a negative correlation between coupling strength at C4 and learning curve change survived cluster-corrected partial correlations when controlling for age.

      We now always report cluster-corrected partial correlations when controlling for possible confounding variables in the updated version of the manuscript (also see answer to issue #7). A summary of all computed partial correlations including Figure R6 can now be found as Figure 3 – figure supplement 3 & 4 in the revised manuscript.

      Specifically we now state in the results section (page 16 – 17, lines 347 – 360):

      "To rule out age as a confounding factor that could drive the relationship between coupling strength, learning curve and task proficiency in the mixed sample, we used cluster-corrected partial correlations to confirm their independence of age differences (task proficiency: mean rho = 0.40, p = 0.017; learning curve: rhos = -0.47, p = 0.049). Additionally, given that we found that juggling performance could underlie a circadian modulation we controlled for individual differences in alertness between subjects due to having just slept. We partialed out the mean PVT reaction time before the juggling performance test after sleep from the original analyses and found that our results remained stable (task proficiency: mean rho = 0.37, p = 0.025; learning curve: rhos = -0.49, p = 0.040). For a summary of the reported cluster-corrected partial correlations as well as analyses controlling for differences in sleep architecture see Figure 3 – figure supplement 3. Further, we also confirmed that our correlations are not influenced by individual differences in SO and spindle event parameters (Figure 3 – figure supplement 4)."

      And in the methods section (page 35, lines 813 – 814):

      "To control for possible confounding factors we computed cluster-corrected partial rank correlations (Figure 3 – figure supplement 3 and 4)."

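      As an illustration only (this is not the analysis code used in the study), a partial rank correlation of the kind described above can be sketched as a Spearman correlation between two variables after removing their shared rank association with a third. The variable names and example values below are hypothetical, and the cluster-based permutation correction across electrodes is not shown.

```python
from math import sqrt


def midranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_spearman(x, y, z):
    """Spearman correlation of x and y, controlling for z (first-order partial)."""
    rx, ry, rz = midranks(x), midranks(y), midranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical per-subject values, for illustration only.
coupling_strength = [0.21, 0.35, 0.30, 0.42, 0.27, 0.50, 0.38, 0.45]
overnight_change = [1.0, 3.0, 2.5, 4.0, 1.5, 6.0, 3.5, 4.5]
age_years = [13, 15, 22, 24, 14, 26, 16, 23]

rho = partial_spearman(coupling_strength, overnight_change, age_years)
print(f"partial rho (controlling for age): {rho:.2f}")
```

      In a cluster-corrected analysis, a statistic like this would be computed per electrode and then evaluated against a permutation distribution of cluster sums (Maris & Oostenveld, 2007); that step is deliberately left out of this sketch.
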
      References

      Aru, J., Aru, J., Priesemann, V., Wibral, M., Lana, L., Pipa, G., Singer, W. & Vicente, R. (2015) Untangling cross-frequency coupling in neuroscience. Curr Opin Neurobiol, 31, 51-61.

      Bothe, K., Hirschauer, F., Wiesinger, H. P., Edfelder, J., Gruber, G., Birklbauer, J. & Hoedlmoser, K. (2019) The impact of sleep on complex gross-motor adaptation in adolescents. Journal of Sleep Research, 28(4).

      Bothe, K., Hirschauer, F., Wiesinger, H. P., Edfelder, J. M., Gruber, G., Hoedlmoser, K. & Birklbauer, J. (2020) Gross motor adaptation benefits from sleep after training. J Sleep Res, 29(5), e12961.

      Campbell, I. G. & Feinberg, I. (2016) Maturational Patterns of Sigma Frequency Power Across Childhood and Adolescence: A Longitudinal Study. Sleep, 39(1), 193-201.

      Dayan, E. & Cohen, L. G. (2011) Neuroplasticity subserving motor skill learning. Neuron, 72(3), 443-54.

      De Gennaro, L. & Ferrara, M. (2003) Sleep spindles: an overview. Sleep Med Rev, 7(5), 423-40.

      De Gennaro, L., Ferrara, M., Vecchio, F., Curcio, G. & Bertini, M. (2005) An electroencephalographic fingerprint of human sleep. Neuroimage, 26(1), 114-22.

      Dinges, D. F., Pack, F., Williams, K., Gillen, K. A., Powell, J. W., Ott, G. E., Aptowicz, C. & Pack, A. I. (1997) Cumulative sleepiness, mood disturbance, and psychomotor vigilance performance decrements during a week of sleep restricted to 4-5 hours per night. Sleep, 20(4), 267-77.

      Dinges, D. F. & Powell, J. W. (1985) Microcomputer Analyses of Performance on a Portable, Simple Visual Rt Task during Sustained Operations. Behavior Research Methods Instruments & Computers, 17(6), 652-655.

      Eichenlaub, J. B., Biswal, S., Peled, N., Rivilis, N., Golby, A. J., Lee, J. W., Westover, M. B., Halgren, E. & Cash, S. S. (2020) Reactivation of Motor-Related Gamma Activity in Human NREM Sleep. Front Neurosci, 14, 449.

      Feinberg, I. & Campbell, I. G. (2013) Longitudinal sleep EEG trajectories indicate complex patterns of adolescent brain maturation. American Journal of Physiology - Regulatory, Integrative and Comparative Physiology, 304(4), R296-303.

      Hahn, M., Heib, D., Schabus, M., Hoedlmoser, K. & Helfrich, R. F. (2020) Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. Elife, 9.

      Helfrich, R. F., Lendner, J. D. & Knight, R. T. (2021) Aperiodic sleep networks promote memory consolidation. Trends Cogn Sci.

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J. & Knight, R. T. (2019) Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572.

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T. & Walker, M. P. (2018) Old Brains Come Uncoupled in Sleep: Slow Wave-Spindle Synchrony, Brain Atrophy, and Forgetting. Neuron, 97(1), 221-230 e4.

      Killgore, W. D. (2010) Effects of sleep deprivation on cognition. Prog Brain Res, 185, 105-29.

      Kurth, S., Jenni, O. G., Riedner, B. A., Tononi, G., Carskadon, M. A. & Huber, R. (2010) Characteristics of sleep slow waves in children and adolescents. Sleep, 33(4), 475-80.

      Maris, E. & Oostenveld, R. (2007) Nonparametric statistical testing of EEG- and MEG-data. J Neurosci Methods, 164(1), 177-90.

      Muehlroth, B. E., Sander, M. C., Fandakova, Y., Grandy, T. H., Rasch, B., Shing, Y. L. & Werkle-Bergner, M. (2019) Precise Slow Oscillation-Spindle Coupling Promotes Memory Consolidation in Younger and Older Adults. Sci Rep, 9(1), 1940.

      Muehlroth, B. E. & Werkle-Bergner, M. (2020) Understanding the interplay of sleep and aging: Methodological challenges. Psychophysiology, 57(3), e13523.

      Niethard, N., Ngo, H. V. V., Ehrlich, I. & Born, J. (2018) Cortical circuit activity underlying sleep slow oscillations and spindles. Proceedings of the National Academy of Sciences of the United States of America, 115(39), E9220-E9229.

      Purcell, S. M., Manoach, D. S., Demanuele, C., Cade, B. E., Mariani, S., Cox, R., Panagiotaropoulou, G., Saxena, R., Pan, J. Q., Smoller, J. W., Redline, S. & Stickgold, R. (2017) Characterizing sleep spindles in 11,630 individuals from the National Sleep Research Resource. Nature Communications, 8, 15930.

      Van Dongen, H. P., Maislin, G., Mullington, J. M. & Dinges, D. F. (2003) The cumulative cost of additional wakefulness: dose-response effects on neurobehavioral functions and sleep physiology from chronic sleep restriction and total sleep deprivation. Sleep, 26(2), 117-26.

      Wilhelm, I., Metzkow-Meszaros, M., Knapp, S. & Born, J. (2012) Sleep-dependent consolidation of procedural motor memories in children and adults: the pre-sleep level of performance matters. Developmental Science, 15(4), 506-15.

      Winer, J. R., Mander, B. A., Helfrich, R. F., Maass, A., Harrison, T. M., Baker, S. L., Knight, R. T., Jagust, W. J. & Walker, M. P. (2019) Sleep as a potential biomarker of tau and beta-amyloid burden in the human brain. J Neurosci.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1:

      This paper puts together a nice set of data showing that a specific gene called Resf1, when deleted, affects the ability of ESCs to self-renew and proceed to germline fates. I believe the data are sound and that they provide the evidence needed for the authors to make their conclusions. While I think the data are presented well and the manuscript is well-written, the "modest" functional results suggest this work would be more suited for a specialized journal.

      We thank Reviewer 1 for their supportive comments.

      Reviewer #2:

      1. In the presence of LIF, there is no difference between Resf1 knockout mESCs and WT mESCs except the expression of Esrrb, Nanog and Pou5f1. What about other genes? RNA-seq is needed to distinguish the two cell lines.

      Fukuda et al. have shown that deletion of Resf1 leads to misregulation of ~1000 genes (adj. p-value 2) in presence of LIF. This highlights large differences between transcriptomes of Resf1 KO and WT cells that occur despite only a marginal difference in self-renewal efficiency between Resf1 KO cells and WT in the presence of LIF. It is therefore questionable whether the time and resources required to perform the requested RNA-seq would produce data that could unambiguously identify the potential causative effector difference downstream of Resf1.

      As an alternative approach, we have reanalysed the Fukuda et al RNA-seq data. We find that Esrrb is significantly downregulated (in agreement with our Q-RT-PCRs), as are Klf4 and LifR (FDR 1.5). However, our meta-analysis of the Fukuda et al data did not show Pou5f1 and Nanog to be differentially expressed (FDR 1.5). This is in line with the lower level of downregulation of Pou5f1 and Nanog, compared to Esrrb in our Q-RT-PCR data. Notably, our gene expression analyses were performed in 5 biological replicates, whereas Fukuda et al. performed RNA-seq in two biological replicates. We can include the meta-analysis of the Fukuda et al data in our submission. As the change in ESC self-renewal that we see at low LIF concentrations could result from a decrease in Lifr expression, we will verify the change in expression of Lifr by Q-RT-PCR. Importantly, we will do this in a way that discriminates between expression of the transmembrane Lifr and soluble LifR, since the latter acts antagonistically (PMID: 9396734, Chambers, BJ, 1997).


      1. The authors showed Resf1 is not required for Nanog function, so how does Resf1 regulate the expression of pluripotency genes? Through epigenetic modifications or signaling pathways? The authors should design experiments to explain the detailed mechanisms.

      The strength of the immunoblot signal for RESF1 is low, even when Resf1 is expressed episomally. Therefore, although we could try to co-immunoprecipitate with the Resf1-v5 cell line and endogenous Nanog, the expression level of RESF1 may mean this effort is unsuccessful. Given the fact that the result will not affect the conclusions of our study, we do not think this effort is justifiable.

      1. The authors showed that Resf1 interacts with Nanog, but they used forced expressed proteins. Does the endogenous Resf1 interacts with endogenous Nanog? Do they bind to some same DNA sequences?

      These are important questions to answer. However, many more experiments would be required to reach firm conclusions. The reviewer is right to say that the mechanisms by which Resf1 affects pluripotency are unknown and remain to be answered in the future. We therefore propose to improve the text discussing similarities in pluripotency phenotype between deletions of Trim28, SETDB1, YTHDC1 and RESF1. As deletion of the RESF1 partner SETDB1 or of other proteins involved in repression of retrotransposons leads to downregulation of pluripotency genes and in some cases collapse of ESCs (e.g. PMID: 19884255, Bilodeau et al. 2009; PMID: 19884257, Yuan et al. 2009), we hypothesise that the RESF1 phenotype may be explained by an effect on SETDB1 chromatin binding and therefore on repression of SETDB1 targets. The mild phenotype of RESF1 KO indicates that RESF1 would not be an essential component of this repressor complex but rather “a modulatory protein”.

      It is also worth noting that the meta-analysis of the RNA-seq data from Fukuda et al. suggests that Resf1-null ESCs may express reduced levels of LifR mRNA, and this is something we plan to investigate.


      1. In figure 5C, some Resf1 positive cells showed Nanog negative. Are these Nanog negative cells pluripotent?

      Nanog-null ESCs are pluripotent (PMID: 18097409, Chambers et al., 2007). In addition, NANOG-negative cells in FCS/LIF cultures can retain pluripotency. Our purpose in this figure was therefore not to say whether NANOG-negative:RESF1-positive cells are pluripotent but to draw attention to the broader expression of RESF1 in FCS/LIF compared to NANOG. Such broader expression has also been noted for other heterogeneously expressed factors (PMID: 31582397, Pantier et al. 2019).

      1. In figure 6A, the naïve mESCs are induced to EpiLCs. Is the transition efficiency of Resf1 knockout cells the same with WT mESCs? The finally obtained PGCLCs should be identified.

      We show that the key TFs of the EpiLC state are expressed similarly in WT and Resf1 KO cells (Supplementary figure 4) and we have data showing that WT and Resf1 KO EpiLCs have a similar morphology. Together, this suggests an efficient transition to an EpiLC state. Our analysis has identified expression of Blimp1/Ap2g/Prdm14 in Resf1-null cultures. Compared to wild-type cells, these levels are reduced up to 3-fold. As this is from an unsorted population and the number of SSEA1/CD61-positive cells is decreased around 2x, this suggests that the PGCLC population formed by Resf1-null cells is reduced in proportion but is otherwise normal.

      We will add photographs of EpiLC colonies formed by Resf1 KO and WT cells.


      1. in figure 5c, the scale bar is missing.

      We will add the missing scale bars to Figure 5C.

      Reviewer #3:

      1. What was less clear was an explanation of why colonies 4 and 24 were chosen. Were there other colonies with the desired expression? Was this amount of expression repeated in replicative experiments with approximately 2 colonies only available to be selected?

      Approximately 30 colonies were selected for analysis. Of these, only 2 had deletion of both Resf1 alleles. We will make this point clearer in the text.


      1. Figure 1C, 5C and S2B with microscopic images should include a scale bar.

      Missing scale bars will be added to Figure 1C. Unfortunately, the microscopy setup used to collect the images in Figures 5C and S2B did not allow scale bars to be added at the time of imaging, and these cannot be added retrospectively. However, we do not think that inclusion of scale bars, even were it possible, would affect the conclusions of our manuscript.

      1. Figure 1E needs a better explanation of the significance, "less clear cut" is not adequate. Reporting statistics, or lack of significance, on the graph would help.

      We will update the manuscript and Figure 1E to include the results of a statistical analysis (Wilcoxon rank-sum test) comparing formation of AP+ colonies between Resf1 KO and WT cells at different LIF concentrations. These results show that both Resf1 KO cell lines have a lower median number of AP+ colonies than WT cells at LIF concentrations 0 and 1 (p.adj. *
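
      For readers unfamiliar with the test named above, the following is a minimal sketch of a two-sided Wilcoxon rank-sum comparison at several LIF concentrations, followed by a multiple-comparison adjustment. It is not the analysis used for Figure 1E: the colony counts are hypothetical, the p-value uses a normal approximation (which is rough for small samples), and the Holm adjustment is an assumption, since the adjustment method is not stated here.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(a, b):
    """Two-sided rank-sum p-value (normal approximation, tie-corrected)."""
    pooled = list(a) + list(b)
    n1, n2, n = len(a), len(b), len(a) + len(b)
    w = sum(midranks(pooled)[:n1])  # rank sum of the first group
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # every observation identical: no evidence of a shift
        return 1.0
    z = (w - n1 * (n + 1) / 2) / sqrt(var)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def holm(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running = [0.0] * m, 0.0
    for pos, i in enumerate(order):
        running = max(running, min(1.0, (m - pos) * pvals[i]))
        adjusted[i] = running
    return adjusted


# Hypothetical AP+ colony counts per replicate, keyed by LIF concentration.
wt = {0: [12, 15, 14, 18, 16], 1: [40, 44, 39, 47, 42], 10: [80, 85, 78, 83, 81]}
ko = {0: [5, 7, 6, 9, 8], 1: [25, 30, 28, 27, 31], 10: [79, 82, 77, 84, 80]}

lif_levels = sorted(wt)
raw = [rank_sum_p(ko[c], wt[c]) for c in lif_levels]
for conc, p_raw, p_adj in zip(lif_levels, raw, holm(raw)):
    print(f"LIF {conc}: p = {p_raw:.3f}, Holm-adjusted p = {p_adj:.3f}")
```

      With only a handful of replicates per condition, an exact rank-sum test would be preferable to the normal approximation used in this sketch.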

      1. Its translatability to medicine, although perhaps that is not the intention, is somewhat lacking. Is there a naturally occurring situation where LIF is absent that would require this pathway to be used? These were mouse ESCs; perhaps this study could incorporate information about relevant translation to a human condition to aid in the significance. This manuscript suggests a mechanistic evaluation by which self-renewal can occur other than the canonical pathway, which is interesting and can inform the field.

      Our results suggest that RESF1 directly or indirectly supports self-renewal of ESCs. Interestingly, the Human cell atlas identified RESF1 expression as a negative predictor of survival in renal cancer, and RESF1 was found to be expressed in testis cancer cells and other cancer tissues. Therefore, RESF1 could promote self-renewal of cancer cells similarly to ESCs. However, this is speculative and needs further studies. As this is outside both the scope of this manuscript and our expertise, we do not think it prudent for us to pursue this line of inquiry. However, we agree that further studies could evaluate RESF1 function in human tissues, especially pluripotent cells and germ cells. As we show that RESF1 deletion leads to reduced induction of PGCLCs, and previous studies showed infertility of Resf1 KO mice, investigating the link between human fertility and RESF1 could have implications for reproductive medicine.

      We will improve our discussion to highlight the possible significance of RESF1 function in human fertility.





    1. PuTTY: a free SSH and Telnet client

       PuTTY is a free implementation of SSH and Telnet for Windows and Unix platforms, along with an xterm terminal emulator. It is written and maintained primarily by Simon Tatham.

       The latest version is 0.76. Download it here.

       LEGAL WARNING: Use of PuTTY, PSCP, PSFTP and Plink is illegal in countries where encryption is outlawed. We believe it is legal to use PuTTY, PSCP, PSFTP and Plink in England and Wales and in many other countries, but we are not lawyers, and so if in doubt you should seek legal advice before downloading it. You may find useful information at cryptolaw.org, which collects information on cryptography laws in many countries, but we can't vouch for its correctness.

       Use of the Telnet-only binary (PuTTYtel) is unrestricted by any cryptography laws.

       Latest news

       2021-07-17 PuTTY 0.76 released
       PuTTY 0.76, released today, is a bug-fix and security release. It fixes bugs in 0.75, and also adds a new configuration option as an extra defence against authentication prompt spoofing by a malicious or compromised SSH server.

       2021-06-13 Pre-releases of 0.76 now available
       We're working towards a 0.76 release. Pre-release builds are available, and we'd appreciate people testing them and reporting any issues. 0.76 will be a pure bug-fix release, fixing a few high-impact bugs that appeared as a result of all of 0.75's new features. In particular, 0.76 fixes the crash when you enable the 'Use system colours' setting on Windows PuTTY.

       2021-05-28 Cloudflare public DNS blocking PuTTY downloads
       If you use some of Cloudflare's public DNS resolvers (1.1.1.2 or 1.1.1.3), you may find you can't download PuTTY at the moment. The server that hosts the release files, the.earth.li, has been blocked since at least 22 May. We don't know why; Cloudflare's own categorisation of the site does not currently include any "security threat" tags. If you're currently having trouble downloading PuTTY, check what DNS resolver you're using. If it's one of these, we suggest you use a different one.

       2021-05-08 PuTTY 0.75 released
       PuTTY 0.75, released today, provides major new features: deferred key decryption in Pageant, more secure SSH key fingerprints and SSH private key files, and some new network protocols for special purposes. 0.75 also contains a fix for a DoS vulnerability in the Windows terminal emulator, which allowed a malicious server to lock up all GUI Windows applications running on the client.

       2021-04-18 Pre-releases of 0.75 now available
       We're working towards a 0.75 release. Pre-release builds are available, and we'd appreciate people testing them and reporting any issues. 0.75 will be a feature release. The biggest changes all relate to Pageant and/or SSH public keys.
       User-visible behaviour changes include: Pageant now allows you to load a key without decrypting it, in which case it will wait until you first use it to ask for the passphrase. We've switched to the modern OpenSSH-style SHA-256 style of key fingerprint.
       Back-end changes that affect compatibility: We've added support for the rsa-sha2-256 and rsa-sha2-512 signature methods, which some servers now require in order to use RSA keys. We've introduced a new version of the PPK format for private key files, to remove weak crypto and improve password-guessing resistance. We've introduced a new method for applications to talk to Pageant on Windows, based on the same named-pipe system used by connection sharing instead of window messages.

       2020-11-22 Primary git branch renamed
       The primary branch in the PuTTY git repository is now called main, instead of git's default of master. For now, both branch names continue to exist, and are kept automatically in sync by a symbolic-ref on the server. In a few months' time, the alias master will be withdrawn. To update a normal downstream clone or checkout to use the new branch name, you can run commands such as ‘git branch -m master main’ followed by ‘git branch -u origin/main main’.

       2020-06-27 PuTTY 0.74 released
       PuTTY 0.74, released today, is a bug-fix and security release. It fixes bugs in 0.73, including one possible vulnerability, and also adds a new configuration option to mitigate a minor information leak in SSH host key policy.

       2019-09-29 PuTTY 0.73 released
       PuTTY 0.73, released today, is a bug-fix release. It fixes a small number of bugs since 0.72, and a couple of them have potential security implications.

       2019-07-20 PuTTY 0.72 released
       PuTTY 0.72, released today, is a bug-fix release. It fixes a small number of further security issues found by the 2019 EU-funded HackerOne bug bounty, and a variety of other bugs introduced in 0.71.

       2019-07-08 Bug bounty concluded
       The EU-funded bug bounty programme is now closed. Many thanks to everybody who sent in reports! Anyone with a vulnerability to report should now go back to reporting it in the old way, via email to the PuTTY team, as described on the Feedback page. If you think it needs to be reported confidentially, encrypt it with our Secure Contact Key.

       2019-03-25 Bug bounty continues
       This year's EU-funded bug bounty programme is still running. It was originally scheduled to end on 7th March, but there was money left over in the budget. So while that money lasts, you still have a chance to earn some by finding vulnerabilities in PuTTY 0.71 or the development snapshots! As before, vulnerabilities should be reported through the HackerOne web site in order to qualify for a bounty: if you send reports directly to the PuTTY team in the usual way, then we'll still fix them, but we can't provide money for them.

       2019-03-16 PuTTY 0.71 released
       PuTTY 0.71, released today, includes a large number of security fixes, many of which were found by the recent EU-funded HackerOne bug bounty. There are also other security enhancements (side-channel resistance), and a few new features. It's also the first release to be built for Windows on Arm.

       2019-01-18 EU bug bounty for finding vulnerabilities in PuTTY
       From now until 7th March, you can earn money by reporting security vulnerabilities in PuTTY! HackerOne is running a bug bounty programme for PuTTY, funded by the European Union as part of the ‘Free and Open Source Software Audit’ project (EU-FOSSA 2). If you report a vulnerability through their web site, it may qualify for a bounty. (The exact amount will depend on how serious the problem is, and there's also a bonus for providing a patch that fixes it.) For more details, or if you have something to report, see the link above. (Please note that HackerOne will only consider vulnerabilities reported to them. If you send a report directly to the PuTTY team in the usual way, then of course we'll still fix it, but we can't also arrange for you to get paid.)

       2018-08-25 GPG key rollover
       This week we've generated a fresh set of GPG keys for signing PuTTY release and snapshot builds. We will begin signing snapshots with the new snapshot key, and future releases with the new release key. The new master key is signed with the old master keys, of course. See the keys page for more information.

       2017-07-08 PuTTY 0.70 released, containing security and bug fixes
       PuTTY 0.70, released today, fixes further problems with Windows DLL hijacking, and also fixes a small number of bugs in 0.69, including broken printing support and Unicode keyboard input on Windows.

       Site map

       Licence conditions under which you may use PuTTY.
       The FAQ.
       The documentation.
       Download PuTTY: latest release 0.76, development snapshots.
       Subscribe to the PuTTY-announce mailing list to be notified of new releases.
       Feedback and bug reporting: contact address and guidelines. Please read the guidelines before sending us mail; we get a very large amount of mail and it will help us answer you more quickly.
       Changes in recent releases.
       Wish list and list of known bugs.
       Links to related software and specifications elsewhere.
       A page about the PuTTY team members.

       If you want to comment on this web site, see the Feedback page. (last modified on Sat Jul 17 11:52:57 2021)

      Qme4bLv4wxfof9ixTMj5e2eUJLJy3U7W4kKNAoNFKH4u6q

    1. Author Response:

      Reviewer #1 Public Review:

      Nakayama and colleagues report a unique screening concept utilizing conserved mechanisms between zebrafish gastrulation and cancer metastasis for identification of potential anti-metastatic drugs. They screen 1280 FDA-approved drugs using the gastrulation as a marker, and identify Pizotifen as an epiboly interrupting drug. Then they find that pharmacologic and genetic inhibition of HTR2C, a target of Pizotifen, suppresses metastatic progression in a zebrafish and mouse model through inhibition of epithelial to mesenchymal transition (EMT) via Wnt-signaling.

      Their work is of interest and has the potential to appeal to a broad audience. However, additional experiments are needed to further substantiate their concept that human cancer metastasis mimic/recapitulate zebrafish gastrulation in terms of conserved mechanism, as well as to confirm the validity of their screening method regarding to the effects of global toxicity.

      Major concerns:

      The first major concern I have is whether it is appropriate to treat gastrulation as a parameter/index of cancer metastasis. While they cherry-picked some genes that are known to be involved in both gastrulation and cancer metastasis, a broader analysis is probably necessary to conclude so. For example, the authors could analyze comprehensive RNA-seq data sets to see if the pathways/networks are similar between gastrulation (zebrafish embryo development data set) and cancer metastasis (benign/primary tumors vs metastatic tumors in TCGA).

      The conservation between embryonic EMT and tumor metastasis EMT has long been well recognized. We now cite some of the published references (Nieto et al., 2016; Thiery et al., 2009; Yang and Weinberg, 2008). In Table 1, we compiled 50 genes based on published literature to provide further strong evidence to support this conservation. Knockdown of these genes in Xenopus or zebrafish induced gastrulation defects; conversely, overexpression of these genes conferred metastatic potential on cancer cells and knockdown of these genes suppressed metastasis. Although this point is not really an objective of this study, we believe that the evidence for the conservation is sufficiently convincing to provide the basis for our study. Further RNA-seq comparison of zebrafish embryonic EMT and human tumor metastasis is beyond the scope of the current study. Generally, the transcriptomic data for zebrafish embryo development at the epiboly/gastrulation stage are based on whole embryos, which include all other activities and are not specific to EMT; thus, a comparison with tumor metastasis data may not be a proper way to search for more evidence.

      The second concern is about Pizotifen's effects on cancer metastasis. Since Pizotifen suppresses gastrulation, it might have some harmful effect on the organogenesis/development of the day 2 embryos that they used in the zebrafish transplantation model. If so, cancer metastasis could be suppressed indirectly. The authors could examine whether Pizotifen has side effects on day 2 embryos. The drug also appears to have some suppressive effect on cell viability in vivo, judging from the images in Fig. 2D, and it would be good if this possibility were excluded.

      We did not observe any abnormality in the development of Tg (kdrl;GFP) and WT zebrafish at day 2 when these fish were treated with 5µM Pizotifen. However, treatment with more than 20µM Pizotifen affected approximately 10% of these fish. The affected fish showed shorter tails than vehicle-treated zebrafish. In xenograft experiments, zebrafish embryos at day 2 were treated with 5µM Pizotifen. This concentration of Pizotifen did not affect the development of day 2 embryos.

      Furthermore, we demonstrated by two different experimental methods that Pizotifen did not affect primary tumor growth in a mouse model of metastasis using 4T1 cells. First, tumor measurement revealed that the sizes of the primary tumors in Pizotifen-treated mice were equal to those in the vehicle-treated mice at the time of resection on day 10 post inoculation. Second, IF-staining showed that the percentage of Ki67-positive cells in the resected primary tumors of Pizotifen-treated mice was the same as that of vehicle-treated mice (Figure 3A and B). Therefore, we conclude that Pizotifen suppresses metastasis without affecting cell viability in vivo.

      Finally, the mechanistic parts would need more confirmation and rescue experiments. Transplanted cells can be sorted after the treatment and the expression changes of EMT markers can be examined to see if the phenomenon happens in vivo as well. All main results can be rescued to see if the effect of Pizotifen against EMT happens through HTR2C-Wnt axis.

      Figure 5C showed that 4T1 primary tumors from Pizotifen-treated mice have elevated E-cadherin expression compared with tumors from vehicle-treated mice. Furthermore, Figure 5C also demonstrated that β-catenin accumulated in the nucleus, and phospho-GSKβ and Zeb1 expression were decreased in 4T1 primary tumors from Pizotifen-treated mice compared with vehicle-treated mice. Loss of E-cadherin plays an essential role in promoting EMT-mediated metastasis, since loss of E-cadherin itself is enough to promote metastasis. In contrast, overexpression of the mesenchymal markers vimentin and N-cadherin is not sufficient to induce metastasis. Based on our data from Figure 5C and the accumulated evidence, we conclude that Pizotifen restored epithelial properties to metastatic cells through a decrease in the transcriptional activity of β-catenin in vivo.

    1. Cultivating a Mindset of Abundance Here’s an empowering fact to internalize: we’re all given the same amount of time as everyone else. Everyone in the world has 24 hours each day to spend as they see fit. It’s up to you to figure out what matters to you. And to spend your days doing those things. In the big picture, we think about life in time periods. Years, decades, and relationships. But you never actually live life on that time scale. Life happens in the moments of every day. Minute by minute. How we spend our days is, of course, how we spend our lives. -Annie Dillard So take the time to align your daily behavior with your priorities. There may be some discomfort in the near term, but you’ll thank yourself down the road.

      Cultivating an Abundance Mindset.

      During all proceeding steps and activities, remember that "Through effort and experience can the brain change." This will help to explain the motivation behind the need for discomfort if change is to be made.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      First of all, we would like to thank each of the expert reviewers for their effort in evaluating our study. We are confident that we can positively address each of the issues and queries raised by the reviewers.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): **Summary:** This study investigates the functions of the tricalbin proteins in S. cerevisiae, which are homologs of the extended synaptotagmins in mammals. It suggests the tricalbins modulate plasma membrane phospholipid composition and are particularly important for surviving a shift to elevated temperature. The tricalbins are proposed to directly or indirectly promote phosphatidylserine transport from the ER to the plasma membrane during heat stress. They also promote the localization of the kinase Pkh1 to the plasma membrane during heat stress.

      **Major comments:**

      General Response: We thank the reviewer for the comments and opinions. While they are starkly different from the other two reviewers’, they have caused us to consider how we can add additional support for our conclusions and to consider alternative possibilities.

      1. To determine whether the tricalbins and other tethering proteins play a role in phospholipid homeostasis, lipid distribution is measured using biosensors (FLARES) and phospholipid levels are determined by mass spec. The experiments are well done but say little about what role the tricalbins or other tethering proteins play in homeostasis. There are no measurements of lipid transport or rates of phospholipid production, degradation or modification. It is reasonable to propose the tricalbins transport lipids, since other proteins with SMP domains do, but this study does not present any evidence that they do.

      Response: We thank the reviewer for stating that the quantitative microscopy and lipidomics experiments in our study are well done. However, we strongly disagree that the results do not provide new information on the roles of the tricalbins or other tethering proteins in membrane lipid homeostasis. In fact, the lipidomics results definitively show that Ist2 and Scs2/22 control phosphatidylserine (PS) levels. In contrast, loss of the tricalbins does not significantly affect PS levels. We will provide a new figure to make this even more clear to expert and non-expert readers.

      While the tricalbins do not regulate PS levels, the data clearly show that the tricalbins control PS acyl chain saturation and cellular distribution. A PS reporter is reduced at the plasma membrane (PM) upon loss of the tricalbins and we have now confirmed increased localization at the endoplasmic reticulum (ER). These new results will be included in a revised manuscript. As the lipid desaturase Ole1 is localized in the ER, the lipidomics are consistent with increased ER residency of PS species. Thus, the data indicate that while the tricalbins do not regulate PS levels, they control either delivery of PS from the ER to the PM, and/or they may control the organization and stability of PS at the PM which would be a novel finding on its own.

      The reviewer must be aware that there are currently no in vivo cell assays that directly (only) measure lipid transport. These experiments are subject to several factors including lipid metabolism, anterograde transport rates, bilayer organization, lipid accessibility, and retrograde transport rates. While our findings clearly show that the Tcb proteins do not control PS levels, we agree that there are alternative possibilities to explain the changes in PS distribution. In our revised manuscript, we will perform additional experiments to distinguish between these possibilities. New experiments will include mutant forms of Tcb3 bearing substitutions in the SMP domain. We will also examine whether the Tcb proteins control PS organization, availability/accessibility, and stability at the PM (also see Reviewer #3, comment 2). This latter possibility may reveal a novel concept regarding Tcb/E-Syt protein function that goes beyond their proposed conventional role as lipid transfer proteins. Based on the outcome of these experiments, we shall adjust the final cartoon model and conclusions in the discussion accordingly.

      The study convincingly demonstrates that there are fewer Pkh1 puncta formed after temperature shift in cells lacking tricalbins than in wild-type cells (Fig. 6 C,D). However, there is no demonstration that this change in localization alters Pkh1 function or signaling.

      Response: Regulation of Pkh1 by lipids is outside the scope of our current study that is focused on providing new understanding of Tcb protein function. However, the decrease in heat-induced Pkh1 puncta may provide insight into the PM integrity defects in cells lacking Tcb1/2/3 (as Pkh1 is required for PM integrity). To test whether Pkh1 function is compromised in the tcb1/2/3 mutant cells, we can test whether constitutively active Ypk1 (which acts downstream of Pkh1) rescues the PM integrity defects in tcb1/2/3 mutant cells.

      There is no demonstration that association of tricalbins and Sfk1 (Fig. 4) has any functional significance or affects phosphoinositide metabolism.

      Response: We thank the reviewer for raising this issue. If the association of Tcb3 and Sfk1 has functional significance, then one would expect that loss of the proteins should phenocopy one another. Deletion of the Sfk1 cytoplasmic domain necessary for co-localization with Tcb3 should also phenocopy loss of Tcb3. This is exactly what we find. Localization of the PS probe is decreased at 42°C upon loss of Sfk1 or truncation of the Sfk1 cytoplasmic tail, similar to cells lacking Tcb3. Furthermore, we find that Tcb3 regulates sterol homeostasis at the PM (using the D4H probe), as has been recently reported for Sfk1 (Kishimoto et al, 2021). Thus, Tcb3 and Sfk1 not only co-localize, but they also share common functions in PM lipid organization. These new results will be presented in our revised manuscript.

      The reviewer also inquired about potential roles for Tcb3 and Sfk1 in phosphoinositide lipid homeostasis, as Sfk1 has reportedly been implicated in heat-induced PI(4,5)P2 synthesis. However, while we find clear roles for Sfk1 and the tricalbins in PS and sterol homeostasis, we did not find a requirement for Sfk1 or the tricalbins in PI(4,5)P2 homeostasis upon heat stress conditions. These findings will be included in our revised manuscript. Importantly, our results indicate that the tricalbins and Sfk1 primarily control PS and sterol homeostasis at the PM, and may regulate phosphoinositides indirectly, and thus provide new insight into the key role of these proteins.

      The study proposes the tricalbins directly or indirectly promote phosphatidylserine transport after temperature shift, but transport has not been measured and other possibilities are not ruled out.

      Response: While the Tcb3 SMP domain has been shown to transfer phospholipids in vitro (Qian et al., 2021), we agree that a role in PS transfer in vivo should be examined in more detail and that other possible roles in PS homeostasis should also be considered (also see responses to Reviewer #3, comment 2).

      Upon heat shock, we not only observe a decrease in relative levels of the PS reporter at the PM in the tcb1/2/3 mutant cells (as shown in our original manuscript), but also a corresponding increase in relative levels of the PS probe at the ER and vacuole membrane (also see response to Reviewer #3, comment 5). This could reflect impaired delivery of PS from the ER to the PM and reduced stability of PS at the PM (i.e. increased internalization of PS into the cell).

      In our original manuscript, we showed that deletion of the SMP domain (the lipid transfer domain) phenocopies deletion of the full-length Tcb3 protein in terms of PS distribution and PM integrity following heat shock. To more rigorously test whether lipid transport activity of the SMP domain is responsible for these phenotypes, we will generate amino acid substitutions within the SMP domain of Tcb3 that maintain its overall structure but impair its ability to transport lipids (by targeting conserved key residues identified in Saheki et al., 2016). We will then assess whether SMP-mediated lipid transfer is necessary for PS homeostasis and PM integrity under heat stress.

      We also agree that other possibilities should be examined. First, to rule out a defect in PS production upon heat stress, we are performing new mass spectrometry lipidomics experiments to measure levels of individual phospholipid species in the tcb1/2/3 mutant and wild type cells after heat stress.

      Second, we have considered whether the Tcb proteins control phospholipid bilayer distribution (e.g. flip and flop). However, cells lacking the Tcbs are not hypersensitive to duramycin (Omnus et al. 2016) and thus phosphatidylethanolamine exposure on the extracellular leaflet is not increased. Moreover, cells lacking the Tcbs (and Scs2/22 and Ist2) are not impaired in the uptake of exogenous NBD-labelled phospholipids (and thus flip across the PM bilayer is not impaired). Possibly, there may be increased lipid ‘flop’ in the mutant cells at high temperature. We can test whether there is increased phospholipid exposure in the extracellular leaflet at high temperature, but our results thus far indicate accumulation of PS on internal membrane compartments (the ER and vacuole membrane).

      Another potentially exciting possibility is that the tricalbin proteins bind and stabilize PS within the cytosolic leaflet of the PM and prevent its internalization by endocytosis or non-vesicular transfer. This mechanism would be completely independent of lipid transport to the PM and would constitute new mechanistic insight into Tcb function. We will test whether PS (and sterol) becomes more accessible (less stable or reduced sequestered pools at the PM) and internalized into the cell, upon removal of the tricalbin proteins. For example, we will monitor PS distribution in cells where endocytosis is blocked with latrunculin A.

      As mentioned, there is currently no cellular lipid transport assay that directly (only) measures anterograde transport. However, if the Tcb3 SMP domain mutants are impaired in PS homeostasis and PM integrity, then we can consider monitoring PS transfer in vivo. By performing the experiments outlined here, we will have thoroughly characterised the roles of the tricalbin proteins in PS homeostasis at the PM. Moreover, the new findings may even reveal novel roles that are independent of transport.

      Reviewer #1 (Significance (Required)): While this study is likely to be of interest to those studying the tricalbins or phospholipid homeostasis, it is incremental and provides little conceptual advance on what is already known about the tricalbins and extended synaptotagmins. They have already been implicated in lipid homeostasis in the plasma membrane and this study provides no new mechanistic insight into how this occurs. Similarly, it has already been shown that the tricalbins play a role in maintaining cell integrity during heat stress and there is little new insight into what role the tricalbins play. Perhaps the most notable part of the study is the idea that tricalbins are necessary for phosphatidylserine transport during stress, but considerable additional work is necessary to make a strong case for this claim.

      Response: We strongly disagree with the reviewer’s opinions. Indeed, Reviewer #2 found our study “novel and detailed” and Reviewer #3 found the results in our study to be “highly valuable” and “interesting”.

      In contrast to the reviewer’s claims, there are certainly novel findings in our study. Foremost, this is the first study that demonstrates a role of the tricalbins in PS homeostasis. Previous studies have implicated E-Syt family members in diacylglycerol and phosphoinositide regulation. Our results indicate that the tricalbins and Sfk1 primarily control PS and sterol homeostasis at the PM, not phosphoinositides, and thus provide new insight into the key role of these proteins. Second, while a previous study by Collado et al reported a role of the tricalbins in PM integrity upon heat stress, this work did not provide mechanistic insight into this process. We performed the PM integrity assays for the Collado et al study (as co-authors). Our current study now shows that Tcb function is needed for PS homeostasis and Pkh1 recruitment at the PM upon heat stress; both factors are needed for PM integrity under these conditions. As such, our current study does provide new insight into roles of the Tcbs in PM integrity. Finally, we are exploring roles of the Tcb proteins in PS homeostasis that go beyond their proposed functions as lipid transfer proteins. We are convinced that our study will provide novel and deep mechanistic understanding of the Tcb/E-Syt protein family.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): The study examines the roles of tricalbins in signalling in yeast. By using several mutations they are able to evaluate whether the interactions are caused by tethering the PM and ER or by other mechanisms. Particularly powerful is their use of cryo-EM tomography and shot-gun mass spectrometry. They look at all the major lipids of the ER and PM and consider their interconversions. They identify large changes in lipid species with one double bond and with two, particularly for PS.

      Reviewer #2 (Significance (Required)): There is little information about membrane physical properties and how it changes as a result of changes in lipid molecular species. Nevertheless, the information provided is novel and detailed. The topic of ER-PM contact sites is new and evolving and this paper advances our understanding of the yeast system considerably. It also looks at protein-protein interactions by fluorescence methods and studies the consequences of heat shock.

      Response: We are pleased that the reviewer concluded that our study on the Tcb proteins “advances our understanding of the yeast system considerably” and found our use of lipidomics to be “powerful”.

      This reviewer had only one critique: there “is little information about membrane physical properties and how it changes as a result of changes in lipid molecular species”. In our revised manuscript, we will provide new data showing changes in levels of sterol (ergosterol) accessibility/availability at the PM in cells lacking the tricalbin proteins. Sterol lipids exist in distinct pools in the PM bilayer (extracellular vs. cytoplasmic leaflet, accessible vs. sequestered) that control the biophysical and mechanical properties of the PM (packing order, permeability, etc.). Moreover, PS and sterol lipids are proposed to undergo mutual associations whereby PS controls sterol accessibility (the ‘umbrella’ model) and sterol in turn stabilizes PS in the cytoplasmic leaflet of the PM. Our findings demonstrate that the primary function of the Tcb proteins is PS and sterol organization in the PM, providing new mechanistic insight into regulatory mechanisms for membrane homeostasis. We will attempt to further characterize changes in PM mechano-chemical and biophysical properties to further understand how changes in membrane lipid composition affect membrane integrity.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): The Tcb/ESyt proteins play an important role in contact site formation and non-vesicular lipid transport. However, their exact functions remain highly controversial. This study by Stefan and colleagues revealed a new role for the yeast Tcb proteins, especially Tcb3, in regulating plasma membrane phosphatidylserine, as well as PIP2. These results are highly valuable to people working on membrane contact sites and lipid trafficking. Overall, the results are fairly convincing and interesting. There are however some concerns and suggestions:

      Response: We are pleased that the reviewer stated that our study has “revealed a new role for the yeast Tcb proteins” and that the findings are “fairly convincing and interesting”. We also thank the reviewer for providing constructive criticisms and helpful suggestions.

      1. Fig. 4, for Tcb3 and Sfk1 interaction, what about Tcb1/2, which would be good controls for the specificity of the interaction.

      Response: We agree that it would be useful to determine if the Tcb3-Sfk1 interaction is specific. We will perform additional BiFC experiments to address whether Tcb1 and Tcb2 associate with Sfk1. Our previous work suggested that Tcb3 forms heterodimers with Tcb1 and Tcb2 necessary for PM integrity (Collado et al., 2019). However, the functional association between Sfk1 and Tcb3 may be specific to Tcb3, and we will test this possibility. It would be interesting to identify a function specific to an individual Tcb protein.

      A central question is whether Tcb3 transfers PS by itself through the SMP domain or requires other lipid transfer proteins. One possibility is that the transfer is mediated by Osh6 and Osh7. Did the expression level of Osh6/7 change between the delta tether and the ist2scs2/22 null strain? Under normal and stressful conditions?

      Response: We agree that it is important to address whether Tcb3 transfers PS via its SMP domain (an issue also raised by Reviewer #1) or whether this activity is carried out by another lipid transfer protein, such as Osh6 and Osh7. We have considered several alternative possibilities, as well as designed and performed new experiments, as described below (two key experiments are in italics).

      To rigorously test whether SMP domain-mediated lipid transport activity is required for PS homeostasis at the PM under heat stress, we will generate amino acid substitutions within the Tcb3 SMP domain that maintain its overall structure but impair its ability to transport lipids (targeting conserved key residues described in Saheki et al., 2016 & Bian et al., 2018). We will then assess whether SMP-mediated lipid transfer is necessary for PS homeostasis and PM integrity under heat stress.

      As suggested by the reviewer (and discussed in our original manuscript), the tricalbins might serve as scaffolds for PS transfer proteins, including the Osh6 and Osh7 proteins, under stress conditions. Osh6 and Osh7 are recruited to ER-PM contacts through interactions with the Ist2 tether protein where they mediate PS delivery to the PM under non-stress conditions (D’Ambrosio et al., 2020). However, strong lines of evidence suggest that Ist2, Osh6, and Osh7 are not required for PS homeostasis at the PM under stress conditions. First, loss of Ist2 has no impact on the PS probe under heat stress conditions (this result will be included in the revised manuscript). Therefore, the Ist2-Osh6/7 interaction is not required for PS homeostasis upon heat stress. Ist2 is not required for PM integrity upon heat stress either (Omnus et al., 2016).

      These results do not rule out the possibility that the tricalbins serve as scaffolds for other PS transfer proteins, such as Osh6 and Osh7, under stress conditions. In this scenario, a switch between Osh6/7 tethering proteins may occur: from Ist2 under normal growth conditions to the Tcb proteins during stress conditions. However, our findings also suggest that Osh6 and Osh7 function is impaired upon stress conditions, including heat and nutrient starvation. Notably, Osh6 and Osh7 become mis-localized from the PM upon heat stress (we can provide this data in the revised manuscript). The mechanism for Osh6/7 attenuation upon stress conditions is outside the scope of our current study, but our preliminary results suggest that changes in cytoplasmic pH and ion homeostasis are involved; this will be a focus of a future study. This is also in line with results from a previous study (Omnus et al., 2020) that showed Osh3 forms intracellular aggregates in response to heat stress. The activity of the Osh proteins and other lipid transfer proteins, in general, may be impaired upon stress conditions (see below).

      To directly determine whether Osh6 and Osh7 are required for PS homeostasis at the PM under heat stress conditions, we will monitor the distribution of the PS probe in cells lacking Osh6 and Osh7 (osh6 osh7 double mutant cells) upon heat stress. This is a key experiment that will directly address the reviewer’s concerns.

      As suggested by the reviewer, we can also examine the expression and localization of GFP-tagged Osh6 and/or Osh7 in tcb1/2/3, scs2/22 ist2, and ‘delta tether’ mutant cells. However, we have not observed any changes in other Osh proteins, including Osh2 and Osh3, in the ‘delta tether’ mutant strain.

      Finally, we have considered yet another possibility. PS transfer to the PM may be generally attenuated under heat stress conditions and a key role of the Tcb proteins may be to bind and stabilize PS at the PM. In other words, the Tcb proteins may act as a ‘buttress’ to stabilize and maintain pre-existing pools of PS at the PM under stress conditions. Consistent with this idea, the Tcb3 C2 domains are required for PS homeostasis and PM integrity upon heat stress. If the Tcb3 SMP domain mutants are not impaired in PS homeostasis or PM integrity, then this alternative mechanism may be very relevant to PM quality control and organization in response to membrane stress. To address this possibility, we will address whether PS is internalized or removed (extracted) from the PM upon heat stress in cells lacking the Tcb proteins. This may uncover a novel role of the Tcb proteins that is independent of SMP domain-mediated lipid transfer.

      Figure 5A. It is not obvious that the intensity of C2 decreases in tcb null cells at 42 degree. Perhaps there is more internal staining.

      Response: We thank the reviewer for pointing out this issue. We think Figures 5A and 5B together convincingly show increased cytoplasmic localization of the PS reporter in the tcb1/2/3 mutant cells upon heat stress. More importantly, we thank the reviewer for pointing out the increased localization at internal membrane compartments. We realized it would be important to identify the PS-containing membrane compartments in the tcb1/2/3 mutant cells upon heat stress. We have now confirmed that the PS probe localizes at the ER and vacuole membrane (to be included in the revised manuscript). The example in Figure 5A also shows accumulation of the PS probe at both the nuclear ER and vacuole membrane. Thus, whilst wild type cells show little PS reporter localization at the ER or vacuole membrane, loss of the tricalbin proteins leads to an increase in ER and vacuole membrane PS probe localization after heat shock. Accumulation at the ER may reflect impaired PS delivery from the ER to the PM, and possibly rerouting to the vacuole membrane. Alternatively, as discussed above, vacuole membrane localization may be due to increased PS removal from the PM and delivery to the vacuole membrane in tcb1/2/3 cells upon heat stress.

      It is important to determine the primary function of Tcb3 since defects in both PIP2 and PS were observed. If the change in PIP2 is due to a lack of PS, can overexpressing Osh6/7 rescue the PIP2 defect in the tetherless mutant?

      Response: We agree that it is important to determine whether the primary function of the Tcb proteins is regulation of PS or PI(4,5)P2 homeostasis. Our new findings definitively indicate that their primary function is PS regulation, not PI(4,5)P2 regulation. A clear effect in the distribution of the PS reporter was observed following heat shock in the tcb1/2/3 mutant cells. In contrast, there is no difference in the localization of the PI(4,5)P2 reporter in tcb1/2/3 cells compared to wild type after heat shock (also see response to reviewer #1, comment 3). In addition, cells lacking the Tcb proteins were not impaired in heat-induced PI(4,5)P2 synthesis, as assessed by metabolic labelling and HPLC analysis. These findings will be included in our revised manuscript, as they indicate that the Tcb proteins primarily control PS at the PM, not phosphoinositides, and thus provide new insight into the main role of these proteins.

      Both PS and sterol are required for the proper recruitment of type I PIP5K to the PM (Nishimura et al., 2019). Therefore, defects in PS distribution could be responsible for the PI(4,5)P2 effects observed in Figure 2. However, overexpression of Osh7 in the ‘delta tether’ mutant did not significantly rescue localization of the PI(4,5)P2 reporter (included in the revision plans). Sterol organization is also perturbed in the ‘tether’ mutant cells (Quon et al., 2018), and this may explain why Osh7 expression did not rescue. Accordingly, Osh2/3/4 (sterol transfer proteins) rescue the PI(4,5)P2 effects.

      The detection of PS by LactC2 has been well-established. However, an alternative approach would be to use the 2XPH in permeabilized cells. See PMID: 33929485 for some detailed discussions on the techniques. It is not a requirement for the authors to adopt the 2XPH.

      Response: We thank the reviewer for suggesting another technique to confirm the results of the LactC2 domain as a PS probe. In this study, we have primarily used a genetically encoded LactC2 probe to observe PS distribution within live, intact cells. Whilst this approach was sufficient to identify accumulation of PS on cytosolic membrane leaflets of the ER and vacuole (see above), the addition of an exogenous probe to permeabilized cells may allow the detection of PS on luminal and extracellular membrane leaflets. Therefore, we plan to repeat our heat shock experiments using permeabilized cells and a purified tagged form of the LactC2 protein. This may allow for improved imaging of intracellular PS localisation and bilayer distribution. However, these experiments are technically challenging, and fixation and permeabilization conditions have not yet been optimized for yeast cell experiments. It is not yet known whether we will be able to optimize these protocols in a reasonable amount of time while completing revisions to the manuscript.

      **Minor:**

      1. the discussion seems to be a bit long

      Response: We will shorten the discussion and modify our final conclusions based on the results from the new experiments.

      Reviewer #3 (Significance (Required)): These proteins are highly important in cell biology/contact sites. The redundancy made it difficult to pinpoint their function. Previous studies have had a number of models. The current study proposed a new function of these proteins, i.e. PS transfer, and this is very interesting and valuable. There will be a good audience for this work. I specialize in lipid storage and trafficking, lipid droplets, cholesterol and phosphatidylserine.

      Response: We are pleased that this expert reviewer found our study to be “very interesting and valuable”.

    1. Author Response:

      Reviewer #1:

      The paper by -Blackwell et al. builds on the ideas developed in the influential paper by Dash et al. (2017), which defined a similarity matrix for CDR3s, TCRdist, which is based on a weighted combination of local and global similarity measurements. In this paper they use the metric to develop the idea of a meta-clonotype, a set of similar TCRs which enrich for TCRs directed at the same antigen. They demonstrate that these meta-clonotypes show greater publicity than individual clonotypes, and show evidence of HLA-restriction. The authors speculate that the metaclonotype may be a useful biomarker. They provide open-access software tools for defining meta-clonotypes in antigen enriched repertoires.

      The major findings are: (1) Meta-clonotypes are more public than clonotypes, a result which seems not unexpected, given that meta-clonotypes include many different sequences; (2) Meta-clonotypes show evidence of HLA restriction, again predicted given the well-established fact that specific antigens can be recognised by sets of similar TCRs.

      The concept of a metaclonotype is an interesting one which could have widespread use in analysis of TCR repertoire. However, the impact could be much greater, by sharpening the focus of the paper, and adding detail and clarity to the idea of the clonotype. In particular, while the introduction correctly points out that prediction of SARS-CoV-2 clinical outcome, or better understanding of the role of coronavirus prior exposure in determining outcome are important unanswered questions, this paper does not address these questions.

      Thank you for your careful review of the manuscript. We have submitted a major revision with greater focus on the definition of a TCR meta-clonotype. We have removed from the introduction much of the background about SARS-CoV-2 and potential implications for the pandemic. In its place we’ve added greater detail about meta-clonotypes, how they can be defined from antigen-enriched TCR data, and how they can be used to analyze bulk TCR sequencing data.

      A substantial portion of the paper is devoted to analysing data obtained using the MIRA assay (Klinger et al PLoS One 10:e0141561) to define SARS-CoV-2 responses, and it is not always clear whether the objective is to evaluate the accuracy of this data set, or to test the power of the meta-clonotype approach.

      Our objective with the analyses of the IMMUNEcode dataset (Nolan et al. 2020), generated using the MIRA method from Klinger et al. 2015, is to demonstrate that TCR meta-clonotypes can be defined from antigen-enriched TCR data and that they can be used to identify and quantify antigen-specific TCRs in bulk sequenced data. Furthermore, the analysis provides evidence that meta-clonotypes have greater publicity than individual clonotypes, thus increasing sensitivity of detection for antigen-specific TCRs. Using bulk repertoires from COVID-19 patients we then demonstrated that population-level analysis can be made possible using meta-clonotypes and provided supporting evidence that the antigen specificity of the centroid TCR is retained. These analyses and their interpretation are further revised on lines 319-348. We think that in the process of evaluating meta-clonotypes, our analysis also shows that the publicly released data contains valuable information about SARS-CoV-2 TCR specificities; however, we have not systematically attempted to verify the validity of the dataset. In the current revision these objectives are made clear in the revised Introduction section.

      Reviewer #2:

      Summary of main aims: The main aim of this paper is to build a framework for TCR meta-clonotypes for finding similar TCRs across individuals (or different repertoires). The majority of the investigations performed in this work have the objective of showing the data properties of meta-clonotypes as well as the metaclonotypes' usefulness for the analysis of antigen-specific TCR data and disease-labeled immune repertoire data.

      Major strengths: Building meta clonotypes is a possible path towards a better coverage of immune repertoire biology as well as inter-individual repertoire comparison. TCRdist3 is an efficient method for building meta-clonotypes that enables the study of the specific characteristics of meta-clonotypes. So far, clusters of similar sequences have not been investigated in depth. The author team is making a significant step forward in this direction by characterizing meta clonotypes in differentially antigen-specific-clone-enriched repertoires and by relating the results to generation probability, HLA, sex and immune status.

      Major weaknesses: Although the authors show a significant amount of data, I am not sure if these data convey sufficient intuition about the characteristics and behavior of meta-clonotypes. The authors seem too focused on relating meta-clonotypes to immune status instead of focusing on the specific biological characteristics of metaclonotypes.

      We agree and have shifted the focus of the manuscript away from the results of the application and towards providing greater detail about the characteristics and behaviors of meta-clonotypes; for example, we’ve removed much of the background about SARS-CoV-2 and the COVID-19 cohorts and we’ve added details about how the meta-clonotype radius can be optimized. We’ve also reframed the data analysis section to emphasize that the results demonstrate how meta-clonotypes carry important antigen-specific signals above and beyond individual clonotypes; this makes the results valuable beyond the application to SARS-CoV-2. For example, while demonstrating HLA restriction occurs in SARS-CoV-2 specific T cell responses in COVID-19 patients is not a surprising finding, it provides evidence that meta-clonotypes enable quantification of an antigen-specific and HLA-restricted T cell response from a bulk single-chain TCR repertoire. We use this example analysis to compare the strength of this signal in individual clonotypes, meta-clonotypes with radius alone and meta-clonotypes with a motif constraint. The revisions of lines 98-100 and lines 461-470 provide clarity about this motivation for the analysis and interpretation of the results.

      Furthermore, we have added a section to the Results that demonstrates how meta-clonotypes and tcrdist3 enable analyses that can provide biological insights about the biochemical properties that may confer antigen specificity (lines 383-418, Figure 10). Since meta-clonotypes define groups of sequences, we can use CDR3 logo plots to dissect how positions and amino acid properties in the CDR3 define the group. In the revision we demonstrate a “background-adjusted” logo plot, that is able to emphasize amino acids that define the meta-clonotype, yet are uncommon among TCRs using the same V and J genes. Visualizing the results in this way can generate hypotheses that can be experimentally validated about the amino acids that are essential for antigen recognition.

      Furthermore, the authors fail to convincingly show that the background repertoires chosen for meta clonotypes are robust and to what extent meta-clonotypes are sensitive to changes in the background repertoire.

      We agree that it is important to understand how the creation of meta-clonotypes, and specifically optimization of the radius, depends on the background repertoire. Therefore, we conducted sensitivity analyses varying the size (25K to 1M) and makeup (synthetic OLGA vs. cord blood vs. a blend) of the background (lines 259-294). We also empirically demonstrate the value of over-sampling background TCRs with matching V and J genes. We show that using a background of 200,000 TCRs was sufficient for reducing the bias and variability in selecting a meta-clonotype radius, compared to a reference set of 2 M background TCRs; this is important because while it is tractable to use large backgrounds for a small number of meta-clonotypes, for larger studies or analyses confined to a laptop, the smaller background set is sufficient; we also show that this can largely be attributed to the gain in efficiency that comes with using a background that includes synthetic OLGA TCRs with VJ-gene frequencies that match the TCRs included in the meta-clonotype. However, we note that “Ultimately, the best choice for the background may depend on the question being asked and the data that is available, with factors including donor HLA, age, potential antigen exposures, and other factors that may shape the repertoire.” Our goal with tcrdist3 was to make it easy for the user to customize the background to the scientific question.

      The authors also do not convincingly differentiate themselves from previous approaches that have used network analysis and generation probability in order to find clusters of similar sequences (very much conceptually similar to the approach taken here).

      We agree it’s important to communicate how meta-clonotypes differ from existing TCR analysis approaches. There are several important distinctions with the existing methods that use networks and generation probability, namely TCRNET and ALICE. We have highlighted these differences in the Discussion (lines 483-497), quoting:

      “The meta-clonotype approach also differs from methods, such as TCRNET (Ritvo et al., 2018) and ALICE (Pogorelyy et al., 2019), that seek to identify TCRs sharing antigen-specificity within bulk repertoires. These methods were developed to identify TCR nodes in a network with an enriched number of edges compared to the expected number of edges in a background (TCRNET) or derived from a probability model (ALICE). Similarly, another recent method attempts to find antigen-associated sequences in bulk repertoires using a two-stage agglomerative clustering of a k-mer based representation of CDRs, first within and then across bulk repertoires (Yohannes et al., 2021). Our framework is designed for a different task than these algorithms. Specifically, we sought to construct definitions of TCR groupings among already antigen-associated TCRs, which would have high sensitivity and specificity for finding similar TCRs in bulk repertoires. This is an important distinction because the existing network-enrichment methods would simply find that most or all of the TCR groupings among a set of antigen-associated sequences were statistically enriched compared with their frequency in antigen-naïve repertoires. By contrast, a flexible meta-clonotype radius permits the definition of the largest possible group of antigen-associated TCRs with the constraint that the likelihood of finding a TCR within the radius in an antigen-naïve background is equally low across all meta-clonotypes.”

      Finally, the authors do not provide detailed descriptions of how the comparison of meta-clonotypes across repertoires is handled as well as potential sequence redundancies across meta-clonotypes (in potentially different individuals). I believe that all of the perceived shortcomings are readily addressable in a revision.

      You are quite right that many meta-clonotypes are overlapping in that a single TCR might conform to more than one meta-clonotype definition. Thus, in the application of meta-clonotypes to the COVID-19 dataset we tested each meta-clonotype individually for an association with the predicted HLA-restricting genotype. Depending on the context, if a summary across meta-clonotypes is required (e.g. finding the overall abundance of conformant TCRs in a repertoire) it may be appropriate to use meta-clonotypes to identify conformant sequences, but then tally them based on actual abundance (i.e. no double counting). In a prediction context, it may be desirable to have overlapping meta-clonotype features, and in fact many machine learning algorithms excel in this regime. With tcrdist3 we have incorporated a “join” functionality that allows for relational database-style joining of meta-clonotypes with a TCR repertoire; this makes it relatively easy to eliminate or keep redundancies, depending on the context. We have added a sentence to the Discussion pointing out that there is overlap among meta-clonotypes that needs to be considered in their application and we provide a link to an example of how to use the join functionality on https://tcrdist3.readthedocs.io/en/latest/join.html#step-by-step-example.

      All in all, this manuscript is an important step towards a better understanding of immune receptor biology. tcrdist3 is an evolution of a previously published method (tcrdist) that is here used to build meta-clonotypes. After reading the paper, it remains slightly unclear (addressable in a revision) as to how useful they are for understanding repertoire biology as well as how to use them in practice in terms of robustness and sensitivity.

      Thank you for your constructive comments, and I hope we’ve addressed these issues around biological interpretability and application in the revision.

      Reviewer #3:

      Mayer-Blackwell et al introduce a new framework for leveraging antigen-annotated T cell receptor (TCR) sequencing data to search for similar TCRs in bulk repertoire data, which potentially recognise the same antigen peptide. They introduce the notion of meta-clonotype, a T cell receptor (TCR) feature consisting of a main TCR sequence ("centroid") and a distance radius around it (+/- a CDR3 motif), with distance measured according to their previously published TCRdist method (Dash et al, 2017). The meta-clonotypes benefit from increased publicity over exact clonotype matching, and enhance the ability to find potentially relevant TCRs in repertoires from unrelated individuals, which are usually highly diverse, predominantly private, and subject to sampling constraints. The idea of meta-clonotypes is very interesting, and will provide a very useful tool in future repertoire analyses. For example, public databases of annotated TCRs (e.g. VDJdb) can be used to derive the set of meta-clonotypes for a variety of antigens, which can in turn be searched for in bulk repertoire data to identify e.g. memory to previous antigen exposure, immune status etc.

      The tool for performing the analysis, tcrdist3, is open-source, well-documented with instructions and examples, and the statistical analysis has been well-thought out. It is also useful to have the comparison to the current alternative of k-mer based TCR distance (i.e. GLIPH2), and the added flexibility for the user to define the precise distance metric to be used in the tcrdist3 tool.

      The authors then apply their method to analyse TCR beta sequences from COVID-19 datasets that have been publicly released by Adaptive Biotechnologies through the immuneRACE project. They use the MIRA set, the peptide-enriched set, to identify the meta-clonotypes, and then search for these in an independent cohort of COVID-19 bulk repertoires from 694 individuals. The authors find that a large proportion of the meta-clonotypes were more abundant in patients expressing the relevant restricting HLA allele, and suggest this could potentially lead to the development of disease biomarkers. The set of SARS-CoV-2 related meta-clonotypes is a useful resource in itself, as researchers generating other COVID-19 TCR datasets will be able to utilize this set of meta-clonotypes to search and potentially stratify patients in their own generated data.

      There are a few areas where further detail / examples would strengthen the paper's claims, in particular in the application of the tcrdist3 method to the COVID-19 data.

      1) Bulk TCR data from repertoires with past antigen exposure are likely to contain varying sizes of clones due to the proliferation of responding T cells and a remaining memory population. Due to the sharp drop in size between a TCR sequencing sample and the entire repertoire, clones above a particular size relative to the sample size are highly unlikely to have been sampled by chance, and identifying significantly/meaningfully expanded clonotypes in a sample is often used to identify a potentially antigen-recognising set of TCRs. The authors demonstrate the detection of meta-clonotypes in the repertoire sets, but it is somewhat unclear how the abundance of a clonotype conforming to a particular meta-clonotype is addressed. For example, there may be rationale for treating the following cases differently: meta-clonotype A is instantiated by (i) a unique clonotype with abundance 1; (ii) a single clonotype with abundance 1000; (iii) 100 different clonotypes (i.e. a "dense neighbourhood" around this meta-clonotype). If used to develop biomarkers, perhaps some degree of granularity in how the frequency/occurrence of meta-clonotypes is calculated would be helpful here.

      Thanks for this helpful suggestion. We agree that the scenarios you outlined above, which differ in the level of clonal breadth (i.e. number of unique clones), may have great immunological relevance. Though we have not specifically assessed the clinical or immunological relevance of clonal breadth vs. clonal frequency, we have noted in the revision (lines 514-519) that there are multiple ways of counting meta-clonotype conformant sequences and multiple ways of aggregating counts across meta-clonotypes, for example without double counting clones that may be conformant to multiple meta-clonotypes. We have also added a documentation page about how to tabulate abundance or breadth of conforming clones: https://tcrdist3.readthedocs.io/en/latest/join.html
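      The distinction between breadth and abundance, and the effect of double counting, can be illustrated with a small sketch. This is plain Python rather than the tcrdist3 join functionality, and the function name, clone identifiers, counts and meta-clonotype labels are all hypothetical values chosen for illustration.

```python
def summarize_conformant_clones(clone_counts, conformant):
    """Tally breadth and abundance per meta-clonotype and across all of them.

    clone_counts: dict mapping clone id -> count in one repertoire (hypothetical data)
    conformant:   dict mapping meta-clonotype id -> set of conformant clone ids
    """
    per_meta = {
        meta_id: {
            "breadth": len(clones),                              # unique clones
            "abundance": sum(clone_counts[c] for c in clones),   # summed counts
        }
        for meta_id, clones in conformant.items()
    }
    # Take the union first so a clone matching several meta-clonotypes counts once.
    unique_clones = set().union(*conformant.values())
    overall = {
        "breadth": len(unique_clones),
        "abundance": sum(clone_counts[c] for c in unique_clones),
    }
    return per_meta, overall


# Hypothetical repertoire: clone "c2" conforms to both meta-clonotypes.
clone_counts = {"c1": 1, "c2": 1000, "c3": 5, "c4": 2}
conformant = {"M1": {"c1", "c2"}, "M2": {"c2", "c3"}}

per_meta, overall = summarize_conformant_clones(clone_counts, conformant)
naive_total = sum(m["abundance"] for m in per_meta.values())

print(per_meta["M1"])  # {'breadth': 2, 'abundance': 1001}
print(overall)         # {'breadth': 3, 'abundance': 1006}
print(naive_total)     # 2006, because "c2" is counted twice
```

      In this toy case the naive sum across meta-clonotypes is nearly double the de-duplicated abundance, which is the kind of discrepancy the choice of tallying rule is meant to control.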

      2) The authors focus their analysis on detecting meta-clonotypes from MIRA sets with strong evidence of HLA-restriction. They report 59.7% of these meta-clonotypes were more abundant in patients expressing the corresponding HLA allele. This means that over 40% of meta-clonotypes with strong HLA restriction were more abundant in repertoires with other HLA types. This point could be further elucidated by comparing results with the control repertoires from the COVID-19 set, from MIRA sets with low evidence of HLA restriction, or combining the sets of low and high evidence of HLA restriction (i.e. HLA agnostic results).

      We’d like to clarify that the results do not imply that 40% of meta-clonotypes were more abundant in participants lacking the restricting HLA allele; rather, these meta-clonotypes did not have a significant association with presence or absence of the HLA genotype. In the discussion we highlight several of the possible explanations for this including that meta-clonotypes were too rarely detected in the population (lines 470-478). The volcano plots in Figure 6A and 7A show that there are very few if any HLA associations of the opposite sign (i.e. meta-clonotypes more abundant in patients without the restricting HLA allele). In fact, at the chosen significance threshold (FDR < 0.01), 0 of 1831 predicted HLA-associated meta-clonotypes were significantly negatively associated with the predicted HLA.

      3) The MIRA55 set is used as an illustrative example throughout the manuscript, which familiarises the reader with this dataset as they are reading the paper. However, the claims made by the paper about MIRA sets / strong HLA evidence MIRA sets could be strengthened by providing an indication of how measured characteristics of the MIRA55 set compare to the other sets being assessed.

      This is a good point and we have tried to provide as much information as possible about MIRA55 and the other MIRA sets to help establish that MIRA55 is a representative set. Characteristics of the other MIRA sets appear in Supporting Table S6, including:

      • Input number of clonotypes (AA exact)
      • Number of non-redundant, public meta-clonotypes
      • Clonotypes spanned by at least one meta-clonotype
      • Span (% of clonotypes conforming to a meta-clonotype definition)
      • Number of public enhanced sequences that match an identified TCRβ

      As well as summary statistics for other meta-clonotype properties:

      • Pgen
      • Radius (TCRdist units)
      • TRBV-CDR length
      • Number of MIRA subjects contributing at least 1 sequence

      Furthermore, Table S7 provides the strength of evidence for HLA-restriction for each meta-clonotype, which is then summarized by MIRA set visually in Figure 4.

      Based on these criteria, we think MIRA55 is a reasonably representative set to focus on.

      4) There is some discussion throughout the manuscript about using the SARS-CoV-2 meta-clonotypes to identify differing clinical outcomes such as disease severity. Perhaps the dataset does not have sufficient power to allow for such sub-analysis, but a method of using meta-clonotypes to differentiate between patients based on the occurrence of meta-clonotypes in their repertoire is not provided (e.g. the number of observed clonotypes, the density distribution around clonotypes etc.).

      That is true. With this manuscript we have tried to focus on establishing the methodology, evaluating the strength of the antigen-specific signal and demonstrating its potential applications; we have tried to make these goals more explicit throughout. Specifically in the revision we note that: “Much like any biomarker study, to establish a TCR-based predictor of a particular outcome, the features must be measured among a sufficiently large cohort of individuals, with a sufficient mix of outcomes.” At this time the publicly available ImmuneRACE data lack negative controls and sufficient clinical details to allow for building a predictor of SARS-CoV-2 infection or disease severity.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper aims to address the question of whether the rotational dynamics in motor cortex may be due to sensory feedback signals rather than to recurrent connections and autonomous dynamics as is typically assumed. This is indeed a question of importance in neural control of movement.

      The authors employ both analyses of motor cortical data and simulation analyses where a neural network is trained to perform a motor task. For the simulations, the authors use a neural network model of a brain performing arm control tasks. Importantly, in addition to the task goals, the brain also receives delayed sensory feedback from the muscle activity and kinematics of the simulated arm. The brain is modeled either using a stack of two recurrent neural networks (RNN) or using two non-recurrent neural network layers to investigate the importance of autonomous recurrent dynamics. The authors use this framework to simulate the brain performing two tasks: 1) posture perturbation task, where the arm is perturbed by external loads and has to return to original posture, and 2) delayed center-out reach task. In both tasks, the authors apply jPCA to units of the trained network, simulated muscle activity, and simulated kinematics and investigate their rotational dynamics. They find that when using an RNN in the brain model, both the RNN layers and kinematics show rotational dynamics but the muscle activity does not. Interestingly, these conclusions for both tasks also hold when networks without recurrent connections are used instead of the RNNs. Also importantly, the rotational dynamics also exist in the sensory feedback signals about the limb state (e.g. joint position, velocity). These results suggest that recurrent dynamics are not necessary for the emergence of rotational dynamics in population activity, rather sensory feedback can also achieve the same.

      The authors perform similar jPCA analyses on monkey motor cortical (MC) or somatosensory cortical activity during the same two tasks and find largely consistent results. As with simulations, neural population activity and kinematics show rotational dynamics but muscle activity, which is explored only in the posture task, does not. Importantly, population activity in both motor and somatosensory cortices shows rotational dynamics. This observation is more consistent with the view that rotational dynamics emerge due to inter-region communications and processing of sensory feedback and planning, rather than autonomous dynamics within the motor cortex.

      The approach of the paper is interesting and valuable and the questions being addressed are very important to the field. To further improve the paper and the analyses, there are several major comments that should be addressed to fully support the conclusions and clarify the results:

      Major:

      1) In the Methods, the authors explain how they model a non-recurrent network as follows: "We also examined networks where we removed the recurrent connections from each layer by effectively setting W_{hh}, W_{oo} to zero for the entire simulation and optimization (NO-REC networks)". However, if this is the only modification, it still leaves recurrent elements in the network. For example, if we set W_{hh} to zero, equation 2 will be:

      h_{t+1} = (1-a) h_t + a tanh(W_{sh} * s_t + b_h)

      where a is a constant scalar (seems to be equal to 0.5). This is indeed still a recurrent neural network since h_{t+1} depends on h_t. If their explanation in the Methods is accurate, then the current approach restricts the recurrent dynamics to be a specific linear dynamic (i.e. "h_{t+1} = (1-a) h_t + …") but does not fully remove them. The second layer is also similar (equation 3) and will still have recurrent linear dynamics even if W_{oo} is set to 0. To be able to describe networks as non-recurrent, the first terms in equations 2 and 3 (that is (1-a) h_t and (1-a) o_t) should also be set to 0. This is critical as an important argument in the paper is that non-recurrent networks can also produce rotational dynamics, so the networks supporting that argument must be fully non-recurrent. Perhaps the authors have already done this but just didn't explain it in the Methods, in which case they should clarify the Methods. However, if the current Method description is accurate, they should rerun their NO-REC simulations by also setting the fixed linear recurrent components (that is (1-a) h_t and (1-a) o_t) to zero as explained above to have a truly non-recurrent model.
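      The point about the residual leak term can be checked with a minimal scalar sketch of the update h_{t+1} = (1-a) h_t + a tanh(W_{sh} s_t + b_h). It is only an illustration of the equation as written in this comment, not the authors' network: the scalar weights, the input pulse and the `keep_leak` flag are assumptions made for the example.

```python
import math


def step(h, s, a=0.5, keep_leak=True, w_sh=1.0, b_h=0.0):
    """One update with W_hh already set to zero.

    keep_leak=True  -> h_next = (1 - a) * h + a * tanh(w_sh * s + b_h)
    keep_leak=False -> h_next = a * tanh(w_sh * s + b_h)   (leak term also removed)
    The scalar weights w_sh and b_h are illustrative assumptions.
    """
    leak = (1.0 - a) * h if keep_leak else 0.0
    return leak + a * math.tanh(w_sh * s + b_h)


def run(inputs, keep_leak):
    """Iterate the update over an input sequence, starting from h = 0."""
    h, trace = 0.0, []
    for s in inputs:
        h = step(h, s, keep_leak=keep_leak)
        trace.append(h)
    return trace


# A single input pulse followed by silence.
pulse = [1.0, 0.0, 0.0, 0.0]

with_leak = run(pulse, keep_leak=True)      # state decays by (1 - a) each step
without_leak = run(pulse, keep_leak=False)  # state returns to zero immediately

print(with_leak)
print(without_leak)
```

      With the leak term kept, the state after the pulse decays geometrically, so the unit still carries memory of past input; with it removed, the state depends only on the current input.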

      We thank the reviewer for raising this important concern. We have re-simulated the NO-REC network while removing the dynamics related to the leaky-integration component. This removal did not impact the network’s ability to perform the tasks and yielded virtually identical neural dynamics (see Figure 8). Throughout the Results we have updated the figures for the NO-REC network to the network without the leaky-integration component.

      2) Assuming my comment in 1 is addressed and the results stay similar, the authors show in simulations that even without recurrent dynamics (referred to as the NO-REC case), rotational dynamics are observed in the simulated brain during both tasks (Figure 8). This result is used to suggest that the sensory feedback is what causes the rotational dynamics in the brain model in this case. However, I think to fully demonstrate the role of feedback, additional simulations are also needed where the sensory feedback is removed from the brain model. In other words, what would happen if recurrent and non-recurrent brain models are trained to perform the tasks but are not provided with the sensory feedback (only receive task goals)? One would expect the recurrent model to still be able to perform the task and autonomously produce similar rotational dynamics (as has been shown in prior work), but the non-recurrent model to fail in doing the task well and in showing rotational dynamics. I think adding such simulations without the feedback signals would really strengthen the paper and help its message.

      We apologize if the network architecture was not clear. For the NO-REC networks, the only way to generate the time-varying signals needed for the tasks is through sensory feedback. A network that lacks both recurrent connections and sensory feedback simply will not work. For the posture task there are no additional inputs since it only receives sensory feedback. For the reaching task the task-goal input is static and the GO cue turns off on a timescale considerably shorter (~20 ms) than the reach duration. Thus, the REC network would always perform better than the NO-REC network when sensory feedback was removed as the NO-REC network cannot generate any dynamics. We have now included in the Results the following statement. "Note, by removing the recurrent connections these networks can only generate time-varying outputs by exploiting the time-varying sensory inputs from the limb." (lines 345-347).

      We have also now included simulations to highlight how REC networks that receive sensory feedback are able to generalize better to scenarios with increased motor noise than REC networks where sensory feedback is either completely removed (reaching task) or only provided at the beginning of the trial (posture task) (Figure S8). Thus, sensory feedback makes REC networks more robust in less predictable scenarios.

      We agree that this could be an interesting manipulation and have now included manipulations of the sensory feedback delays. We considered three separate delays (0 ms, 50 ms and 100 ms) and found that the rotational frequency of the top jPC plane depended on the delay, with greater delays generally reducing the frequency (see now Supplementary Figure 10). Delay had less effect on the fit qualities for the constrained and unconstrained dynamical systems. This has been added to the Results section (lines 423-446).

      We simulated this scenario and found the answer to be rather complex and we have added these results to the supplementary material. The network's behavioural performance in the perturbation posture task is similar to the previous networks with joint-based feedback. However, the dynamics in the output layer are not the same with a clear reduction in how well the dynamics are described as rotational (Figure S11A-B).

      Oddly, rotational dynamics could still be observed in the input layer dynamics (data now shown) and the kinematic signals when they were converted to a Cartesian reference frame (Figure S11D-E). Furthermore, rotational dynamics could emerge in the output layer if we used a different initialization method for the network weights. We initialized weights from a uniform distribution bounded by ±1/√N, where N is the number of units. In contrast, previous studies have initialized network weights using a Gaussian distribution with standard deviation equal to g/√N, where g is a constant larger than 1. This alternative initialization scheme encourages strong intrinsic dynamics often needed for autonomous RNN models (Sussillo et al., 2015). We found networks initialized with this method and trained on the perturbation posture task exhibited stronger rotational dynamics with fits to the constrained and unconstrained dynamical systems of 0.5 and 0.88, respectively (Figure S11C-D). When examining the reaching task, we found similar results (Figure S11F-K). When initialized with a uniform distribution, fit qualities for the constrained and unconstrained dynamical systems were 0.4 and 0.77, respectively (Figure S11F-G), which were smaller than for the joint-based feedback (Figure 7B, constrained R²=0.7, unconstrained R²=0.83). Qualitatively, the dynamics were different when the network was initialized with a Gaussian distribution (Figure S11H); however, fit qualities were comparable between the two initialization methods (Figure S11I). There was also a noticeable reduction in the fit quality for the kinematic signals, particularly for the constrained dynamical system (Figure S11K, constrained R²=0.36, unconstrained R²=0.77). These findings have been added to the Results.
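      The difference in scale between the two initialization schemes can be made concrete with a short sketch. A uniform distribution on ±1/√N has standard deviation 1/√(3N), whereas the Gaussian scheme has standard deviation g/√N, so for any g > 1 the Gaussian weights are more than √3 times larger in spread. The values N = 100 and g = 1.5, the zero mean of the Gaussian, and the function names are assumptions made for illustration; the response does not state them.

```python
import math
import random


def init_uniform(n, rng):
    """n x n weights drawn uniformly from [-1/sqrt(n), 1/sqrt(n)]."""
    bound = 1.0 / math.sqrt(n)
    return [[rng.uniform(-bound, bound) for _ in range(n)] for _ in range(n)]


def init_gaussian(n, g, rng):
    """n x n weights drawn from a zero-mean Gaussian with std g/sqrt(n) (zero mean assumed)."""
    sd = g / math.sqrt(n)
    return [[rng.gauss(0.0, sd) for _ in range(n)] for _ in range(n)]


def weight_std(w):
    """Empirical standard deviation over all entries of a weight matrix."""
    flat = [x for row in w for x in row]
    mean = sum(flat) / len(flat)
    return math.sqrt(sum((x - mean) ** 2 for x in flat) / len(flat))


rng = random.Random(0)   # fixed seed for reproducibility
n_units, g = 100, 1.5    # illustrative values only

w_uniform = init_uniform(n_units, rng)
w_gaussian = init_gaussian(n_units, g, rng)

print("uniform std :", weight_std(w_uniform), "theory:", 1.0 / math.sqrt(3 * n_units))
print("gaussian std:", weight_std(w_gaussian), "theory:", g / math.sqrt(n_units))
```

      Under these assumptions the Gaussian weights have roughly 2.6 times the spread of the uniform ones, which is consistent with the statement that this scheme encourages stronger intrinsic dynamics.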

      3) A measure of how well each trained network is able to perform the task should be provided. For example, is the non-recurrent network able to perform the tasks as accurately as the recurrent models? The authors could use an appropriate measure, for example average displacement in the posture task and time-to-target in the center-out task, to objectively quantify task performance for each network. Another performance measure could be the first term of the loss in equation 5. Also, plots of example trials that show the task performance should be provided for the non-recurrent networks (for example by adding to Figure 8), similar to how they are shown for the recurrent models in Figures 2 and 6.

      We have now presented and quantified the NO-REC network behavioural performance. Kinematics for the NO-REC network are shown in Figure S7A-C and E-G which are comparable to the REC network. Furthermore, quantifying the maximum displacement during the posture task yielded no obvious differences between the NO-REC and REC networks (Figure S7D). For the reaching task, the time-to-target was noticeably more variable and tended to be slower for the NO-REC network (Figure S7H). These observations have been added to the Results.

      4) An important observation is that rotational dynamics also exist in the sensory signals about the limb state. This may imply that the task structure that dictates the limb state and thus the associated sensory feedback may play an important role in the rotations without the recurrent connections. While the present study will be a valuable addition regardless of what the answer is, this is an important point to address: What is the role of the task structure in producing rotational dynamics? In both the posture task and the center-out task, the task instruction instructs subjects to return to the initial movement 'state' by the end of the trial: in the posture task the simulated arm needs to return to the original posture upon disturbance, and in the center out task the arm needs to start from zero velocity and settle at the target with zero velocity. Is this structure what's causing the rotational dynamics? This is an important question both for this paper and for the field and the authors have a great simulation setup to explore it. For example, what happens if the task instructions u* instruct the arm to follow a random trajectory continuously, instead of stopping at some targets? With a simulated tracking task like this, one could eliminate obvious cases of return-to-original-state from the task. Would the network still produce rotational dynamics? Of course, I don't expect the authors to collect experimental monkey data for such new tasks, rather to just change the task instructions in their numerical simulations to explore the dependence of observed rotational dynamics on the task structure. I think this will help the message of the paper and can be very useful for the field.

      We agree that a tracking task would be an interesting manipulation and have simulated this with the REC and NO-REC networks (Figure 9). Here, we trained the network to reach from the starting position and track a target moving radially at a constant velocity for the rest of the trial (1.2 seconds). Thus, the network has to move the limb at a constant velocity. We found there was a consistent reduction in how well the network’s dynamics (constrained R²=0.13, unconstrained R²=0.3) were described as rotational when compared to the previous reaching task (Figure 7, constrained R²=0.7, unconstrained R²=0.83). Also, note that this reduction in rotational dynamics remained even when we initialized the network weights using a Gaussian distribution (see Essential revision 2.3). These simulations have been added to the Results section.

      5) It would be beneficial if the authors could elaborate in the discussion on intuitive explanations of why sensory feedback can produce rotational dynamics even with no internal recurrent dynamics in the brain model. To me, it seems like sensory feedback is providing a path for recurrence to exist in the overall brain-arm system, so the non-recurrent neural networks can learn to exploit that path to effectively implement some recurrent dynamics. Some intuitive explanations like this will be helpful for readers.

      Rotational dynamics emerge in sensory feedback mainly because of the phase offset between joint position and velocity: changes occur first in the velocity and then in the position (see the pendulum example in Pandarinath et al., 2018a; also DeWolf et al., 2016; Susilaradeya et al., 2019). This phase offset is maintained across reach directions and gives rise to the orderly rotational dynamics observed in the kinematic signals (DeWolf et al., 2016; Pandarinath et al., 2018a; Susilaradeya et al., 2019; Vyas et al., 2020). Furthermore, the tracking task disrupted this phase relationship and thus the rotational dynamics were substantively reduced in the network models. This text has been added to the Discussion (lines 519-526).
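      The link between a position/velocity phase offset and rotational structure can be seen in a toy pendulum-like example. The sketch below is not jPCA and not the authors' analysis: it fits an unconstrained linear system dz/dt = M z by least squares to a two-variable state (position and amplitude-normalized velocity, a quarter cycle apart) and inspects whether M is close to skew-symmetric, which is the signature of pure rotation. The 1 Hz frequency, the sampling step and the function name are assumptions for illustration.

```python
import math


def fit_linear_dynamics(states, dt):
    """Least-squares fit of dz/dt = M z for 2-D states, using central differences.

    Solves M = (sum dz z^T) (sum z z^T)^-1 over the interior samples.
    """
    A = [[0.0, 0.0], [0.0, 0.0]]  # accumulates sum of dz z^T
    B = [[0.0, 0.0], [0.0, 0.0]]  # accumulates sum of z z^T
    for k in range(1, len(states) - 1):
        z = states[k]
        dz = [(states[k + 1][i] - states[k - 1][i]) / (2.0 * dt) for i in range(2)]
        for i in range(2):
            for j in range(2):
                A[i][j] += dz[i] * z[j]
                B[i][j] += z[i] * z[j]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    B_inv = [[B[1][1] / det, -B[0][1] / det],
             [-B[1][0] / det, B[0][0] / det]]
    return [[sum(A[i][k] * B_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


omega = 2.0 * math.pi * 1.0  # 1 Hz oscillation (illustrative)
dt = 0.001                   # 1 ms sampling step (illustrative)

# Position and normalized velocity: velocity leads position by a quarter cycle.
states = [(math.sin(omega * k * dt), math.cos(omega * k * dt)) for k in range(1000)]

M = fit_linear_dynamics(states, dt)
symmetric_part = [[(M[i][j] + M[j][i]) / 2.0 for j in range(2)] for i in range(2)]
fitted_frequency_hz = M[0][1] / (2.0 * math.pi)

print("M =", M)
print("symmetric part =", symmetric_part)
print("fitted rotation frequency (Hz) =", fitted_frequency_hz)
```

      For this idealized signal the fitted matrix is essentially skew-symmetric and its off-diagonal entry recovers the oscillation frequency; a signal without a consistent quarter-cycle offset between the two variables would not be expected to give such a clean result.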

      6) One main result in data from non-human primates is that there exist rotations also in the somatosensory cortex not just in motor cortex. A more thorough discussion of prior work on rotational dynamics or lack thereof across brain regions and behavioral tasks is important to add here. For example, besides the works cited by the authors, there are other works such as (Kao et al., 2015; Gao et al., 2016; Remington et al., 2018; Stavisky et al., 2019; Aoi et al., 2020; Sani et al., 2021) that discuss or show rotational dynamics in various brain regions and behavioral tasks and should be cited and discussed.

      We have cited the above papers and included in the Discussion the following paragraph (lines 537-549) “Importantly, findings of rotational dynamics in cortical circuits are not trivial. Activity in the supplementary motor area does not exhibit rotational dynamics during reaching (Lara et al., 2018). The hand area of MC also does not exhibit rotational dynamics during grasping-only behaviour (Suresh et al., 2020), though it does exhibit rotational dynamics during reach-to-grasp (Abbaspourazad et al., 2021; Rouse and Schieber, 2018) which may reflect the reaching component of the behaviour. More broadly there is a growing body of work characterizing cortical neural dynamics across different behavioural tasks which have revealed rotational (Abbaspourazad et al., 2021; Aoi et al., 2020; Libby and Buschman, 2021; Remington et al., 2018; Sohn et al., 2019; Stavisky et al., 2019), helical (Russo et al., 2020), stationary (Machens et al., 2010), and ramping dynamics (Finkelstein et al., 2021; Kaufman et al., 2016; Machens et al., 2010) and these dynamics appear to support various classes of computations. Thus, finding rotational dynamics across the fronto-parietal circuit in our study is not trivial."

      7) The authors state that "In contrast, rotational dynamics appear to be absent in… MC activity during grasping driven by sensory inputs (Suresh et al., 2020)." There are other papers that study dynamics during reach-grasps and still find rotational dynamics and modes (Abbaspourazad et al., 2021; Vaidya et al., 2015) and should be cited and discussed. The recent paper on naturalistic reach-grasps (Abbaspourazad et al., 2021) also argues for the involvement of a large-scale network in these movements, which further supports the authors' interpretation that "This interpretation of motor control emphasizes that the objective of the motor system is to attain the behavioural goal and this requires feedback processed by a distributed network." A discussion of this point made in this recent paper in the intro/discussion is important. Finally, there is a recent paper that argues for the input-driven nature of motor cortex (Sauerbrei et al., 2020) and is cited/discussed by the authors but briefly and mainly in the discussion. I think given the relevance of this recent paper to the core message here, it should also be briefly discussed in the introduction to better set up the work.

      We agree with the reviewer that there are discrepancies between the motor cortical dynamics reported by Suresh et al. (2020) and Abbaspourazad et al. (2021) during grasping tasks. These discrepancies may reflect differences in task: in Suresh et al. (2020) the monkeys only grasped objects, whereas in Abbaspourazad et al. (2021) the monkeys had to reach for and grasp objects. Thus, the rotations may reflect the reaching component of the behaviour. We have elaborated on this in the Discussion, which now reads (lines 539-542): “The hand area of MC also does not exhibit rotational dynamics during grasping-only behaviour (Suresh et al., 2020), though it does exhibit rotational dynamics during reach-to-grasp (Abbaspourazad et al., 2021; Rouse and Schieber, 2018; Vaidya et al., 2015) which may reflect the reaching component of the behaviour.”

      We have also briefly mentioned the findings of Sauerbrei et al. (2020) in the Introduction, which now reads (lines 79-81): “Lastly a recent study demonstrates that motor cortical dynamics are driven by inputs coming from motor thalamus (Sauerbrei et al., 2020).”

      Minor:

      1) The Methods are clear and comprehensive, but just to make understanding of the simulation setup easier, it would help to have a diagram of the computation graph for the recurrent and non-recurrent networks that shows their number of units, activations/nonlinearities, RNN cell type, etc., added as supplementary figure.

      We agree that this is useful and have added it to Figure 1.

      2) Again, to help more clearly convey the simulations, it would help to show the task goals (x*) that are inputs to the simulated brain for example trials in each task (for example added to Figures 2 and 6).

      We agree that this is useful and have added it to Figure 1.

      3) Similar to how VAF is shown on top of all plots of jPC planes, it would be helpful to have the rotation frequency for each jPC plane noted next to it. Currently it is difficult to find the jPC frequency associated with each plot from the text.

      We agree and have added it to the appropriate figures.

      4) I am a bit surprised by how different the null distributions are for modeling muscle activity (Figure 3F) and kinematics (Figure 3H). The null distribution is simply the R² for a constrained or unconstrained dynamic model fit to a subsampled version of the neural activity. The only difference between the null distributions in Figure 3F and Figure 3H seems to be the downsampled dimension, which for muscle activity is 6 and for kinematics is 4 (per equation 1). Any insight will be welcome as to why downsampling the population activity to 4 (Figure 3H) results in so much worse R² compared with downsampling it to 6 (Figure 3F)?

      We thank the reviewer for raising this concern. Originally, we had applied PCA to reduce the dimensionality of the kinematic signals from 4 dimensions to 2, and the muscle signals from 6 to 4. We realize now that to be more conservative in our significance testing, we should use the full dimensionality of the kinematic and muscle signals. As such, we have changed the figures throughout to reflect this.

      References:

      Abbaspourazad, H., Choudhury, M., Wong, Y.T., Pesaran, B., Shanechi, M.M., 2021. Multiscale low-dimensional motor cortical state dynamics predict naturalistic reach-and-grasp behavior. Nature Communications 12, 607. https://doi.org/10.1038/s41467-020-20197-x

      Aoi, M.C., Mante, V., Pillow, J.W., 2020. Prefrontal cortex exhibits multidimensional dynamic encoding during decision-making. Nature Neuroscience 1-11. https://doi.org/10.1038/s41593-020-0696-5

      Gao, Y., Archer, E.W., Paninski, L., Cunningham, J.P., 2016. Linear dynamical neural population models through nonlinear embeddings, in: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 29. Curran Associates, Inc., pp. 163-171.

      Kao, J.C., Nuyujukian, P., Ryu, S.I., Churchland, M.M., Cunningham, J.P., Shenoy, K.V., 2015. Single-trial dynamics of motor cortex and their applications to brain-machine interfaces. Nature Communications 6, 7759. https://doi.org/10.1038/ncomms8759

      Remington, E.D., Narain, D., Hosseini, E.A., Jazayeri, M., 2018. Flexible Sensorimotor Computations through Rapid Reconfiguration of Cortical Dynamics. Neuron 98, 1005-1019.e5. https://doi.org/10.1016/j.neuron.2018.05.020

      Sani, O.G., Abbaspourazad, H., Wong, Y.T., Pesaran, B., Shanechi, M.M., 2021. Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification. Nature Neuroscience 24, 140-149. https://doi.org/10.1038/s41593-020-00733-0

      Stavisky, S.D., Willett, F.R., Wilson, G.H., Murphy, B.A., Rezaii, P., Avansino, D.T., Memberg, W.D., Miller, J.P., Kirsch, R.F., Hochberg, L.R., Ajiboye, A.B., Druckmann, S., Shenoy, K.V., Henderson, J.M., 2019. Neural ensemble dynamics in dorsal motor cortex during speech in people with paralysis. eLife 8, e46015. https://doi.org/10.7554/eLife.46015

      Vaidya, M., Kording, K., Saleh, M., Takahashi, K., Hatsopoulos, N.G., 2015. Neural coordination during reach-to-grasp. Journal of Neurophysiology 114, 1827-1836. https://doi.org/10.1152/jn.00349.2015

    2. Reviewer #1 (Public Review):

      This paper aims to address the question of whether the rotational dynamics in motor cortex may be due to sensory feedback signals rather than to recurrent connections and autonomous dynamics as is typically assumed. This is indeed a question of importance in neural control of movement.

      The authors employ both analyses of motor cortical data and simulation analyses where a neural network is trained to perform a motor task. For the simulations, the authors use a neural network model of a brain performing arm control tasks. Importantly, in addition to the task goals, the brain also receives delayed sensory feedback from the muscle activity and kinematics of the simulated arm. The brain is modeled either using a stack of two recurrent neural networks (RNN) or using two non-recurrent neural network layers to investigate the importance of autonomous recurrent dynamics. The authors use this framework to simulate the brain performing two tasks: 1) posture perturbation task, where the arm is perturbed by external loads and has to return to original posture, and 2) delayed center-out reach task. In both tasks, the authors apply jPCA to units of the trained network, simulated muscle activity, and simulated kinematics and investigate their rotational dynamics. They find that when using an RNN in the brain model, both the RNN layers and kinematics show rotational dynamics but the muscle activity does not. Interestingly, these conclusions for both tasks also hold when networks without recurrent connections are used instead of the RNNs. Also importantly, the rotational dynamics also exist in the sensory feedback signals about the limb state (e.g. joint position, velocity). These results suggest that recurrent dynamics are not necessary for the emergence of rotational dynamics in population activity, rather sensory feedback can also achieve the same.

      The authors perform similar jPCA analyses on monkey motor cortical (MC) or somatosensory cortical activity during the same two tasks and find largely consistent results. As with simulations, neural population activity and kinematics show rotational dynamics but muscle activity, which is explored only in the posture task, does not. Importantly, population activity in both motor and somatosensory cortices shows rotational dynamics. This observation is more consistent with the view that rotational dynamics emerge due to inter-region communications and processing of sensory feedback and planning, rather than autonomous dynamics within the motor cortex.

      The approach of the paper is interesting and valuable and the questions being addressed are very important to the field. To further improve the paper and the analyses, there are several major comments that should be addressed to fully support the conclusions and clarify the results:

      Major:

      1) In the Methods, the authors explain how they model a non-recurrent network as follows: "We also examined networks where we removed the recurrent connections from each layer by effectively setting Whh, Woo to zero for the entire simulation and optimization (NO-REC networks)". However, if this is the only modification, it still leaves recurrent elements in the network. For example, if we set W_{hh} to zero, equation 2 will be:

      h_{t+1} = (1-a) * h_t + a * tanh(W_{sh} * s_t + b_h)

      where a is a constant scalar (seems to be equal to 0.5). This is indeed still a recurrent neural network since h_{t+1} depends on h_t. If their explanation in the Methods is accurate, then the current approach restricts the recurrent dynamics to be a specific linear dynamic (i.e. "h_{t+1} = (1-a) * h_t + ...") but does not fully remove them. The second layer is also similar (equation 3) and will still have recurrent linear dynamics even if W_{oo} is set to 0. To be able to describe networks as non-recurrent, the first terms in equations 2 and 3 (that is (1-a)*h_t and (1-a)*o_t) should also be set to 0. This is critical as an important argument in the paper is that non-recurrent networks can also produce rotational dynamics, so the networks supporting that argument must be fully non-recurrent. Perhaps the authors have already done this but just didn't explain it in the Methods, in which case they should clarify the Methods. However, if the current Method description is accurate, they should rerun their NO-REC simulations by also setting the fixed linear recurrent components (that is (1-a)*h_t and (1-a)*o_t) to zero as explained above to have a truly non-recurrent model.

      2) Assuming my comment in 1 is addressed and the results stay similar, the authors show in simulations that even without recurrent dynamics (referred to as the NO-REC case), rotational dynamics are observed in the simulated brain during both tasks (Figure 8). This result is used to suggest that the sensory feedback is what causes the rotational dynamics in the brain model in this case. However, I think to fully demonstrate the role of feedback, additional simulations are also needed where the sensory feedback is removed from the brain model. In other words, what would happen if recurrent and non-recurrent brain models are trained to perform the tasks but are not provided with the sensory feedback (only receive task goals)? One would expect the recurrent model to still be able to perform the task and autonomously produce similar rotational dynamics (as has been shown in prior work), but the non-recurrent model to fail in doing the task well and in showing rotational dynamics. I think adding such simulations without the feedback signals would really strengthen the paper and help its message.

      3) A measure of how well each trained network is able to perform the task should be provided. For example, is the non-recurrent network able to perform the tasks as accurately as the recurrent models? The authors could use an appropriate measure, for example average displacement in the posture task and time-to-target in the center-out task, to objectively quantify task performance for each network. Another performance measure could be the first term of the loss in equation 5. Also, plots of example trials that show the task performance should be provided for the non-recurrent networks (for example by adding to Figure 8), similar to how they are shown for the recurrent models in Figures 2 and 6.

      4) An important observation is that rotational dynamics also exist in the sensory signals about the limb state. This may imply that the task structure that dictates the limb state and thus the associated sensory feedback may play an important role in the rotations without the recurrent connections. While the present study will be a valuable addition regardless of what the answer is, this is an important point to address: What is the role of the task structure in producing rotational dynamics? In both the posture task and the center-out task, the task instruction instructs subjects to return to the initial movement 'state' by the end of the trial: in the posture task the simulated arm needs to return to the original posture upon disturbance, and in the center-out task the arm needs to start from zero velocity and settle at the target with zero velocity. Is this structure what's causing the rotational dynamics? This is an important question both for this paper and for the field and the authors have a great simulation setup to explore it. For example, what happens if the task instructions u* instruct the arm to follow a random trajectory continuously, instead of stopping at some targets? With a simulated tracking task like this, one could eliminate obvious cases of return-to-original-state from the task. Would the network still produce rotational dynamics? Of course, I don't expect the authors to collect experimental monkey data for such new tasks, rather to just change the task instructions in their numerical simulations to explore the dependence of observed rotational dynamics on the task structure. I think this will help the message of the paper and can be very useful for the field.

      5) It would be beneficial if the authors could elaborate in the discussion on intuitive explanations of why sensory feedback can produce rotational dynamics even with no internal recurrent dynamics in the brain model. To me, it seems like sensory feedback is providing a path for recurrence to exist in the overall brain-arm system, so the non-recurrent neural networks can learn to exploit that path to effectively implement some recurrent dynamics. Some intuitive explanations like this will be helpful for readers.

      6) One main result in data from non-human primates is that there exist rotations also in the somatosensory cortex not just in motor cortex. A more thorough discussion of prior work on rotational dynamics or lack thereof across brain regions and behavioral tasks is important to add here. For example, besides the works cited by the authors, there are other works such as (Kao et al., 2015; Gao et al., 2016; Remington et al., 2018; Stavisky et al., 2019; Aoi et al., 2020; Sani et al., 2021) that discuss or show rotational dynamics in various brain regions and behavioral tasks and should be cited and discussed.

      7) The authors state that "In contrast, rotational dynamics appear to be absent in... MC activity during grasping driven by sensory inputs (Suresh et al., 2020)." There are other papers that study dynamics during reach-grasps and still find rotational dynamics and modes (Abbaspourazad et al., 2021; Vaidya et al., 2015) and should be cited and discussed. The recent paper on naturalistic reach-grasps (Abbaspourazad et al., 2021) also argues for the involvement of a large-scale network in these movements, which further supports the authors' interpretation that "This interpretation of motor control emphasizes that the objective of the motor system is to attain the behavioural goal and this requires feedback processed by a distributed network." A discussion of this point made in this recent paper in the intro/discussion is important. Finally, there is a recent paper that argues for the input-driven nature of motor cortex (Sauerbrei et al., 2020) and is cited/discussed by the authors but briefly and mainly in the discussion. I think given the relevance of this recent paper to the core message here, it should also be briefly discussed in the introduction to better set up the work.
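
      Two of the major comments above (1 and 5) can be checked numerically with a toy example. The sketch below is only an illustration and is not the authors' model: the layer sizes, the point-mass plant, the gains k_p and k_d, and the 50-step feedback delay are all hypothetical choices. The first part applies the update from comment 1 (equation 2 with W_{hh} = 0) to two different previous states and shows that the output still depends on h_t unless a = 1. The second part closes a loop between a memoryless controller and a simple plant, and counts how many revolutions the fed-back state makes, in the spirit of the intuition in comment 5.

```python
import numpy as np

# --- Comment 1: with W_hh = 0 the leak term still carries h_t forward ---
def norec_step(h, s, W_sh, b_h, a):
    """h_{t+1} = (1 - a) * h_t + a * tanh(W_sh @ s_t + b_h), i.e. equation 2 with W_hh = 0."""
    return (1.0 - a) * h + a * np.tanh(W_sh @ s + b_h)

rng = np.random.default_rng(0)
W_sh = rng.standard_normal((5, 3))   # hypothetical sizes: 5 units, 3 inputs
b_h = np.zeros(5)
s = rng.standard_normal(3)           # one fixed input sample
h_zero, h_one = np.zeros(5), np.ones(5)

# Same input, different previous states: any gap in the output is memory of h_t.
gap_leaky = np.abs(norec_step(h_zero, s, W_sh, b_h, 0.5)
                   - norec_step(h_one, s, W_sh, b_h, 0.5)).max()
gap_static = np.abs(norec_step(h_zero, s, W_sh, b_h, 1.0)
                    - norec_step(h_one, s, W_sh, b_h, 1.0)).max()
print("output gap with a = 0.5:", gap_leaky)
print("output gap with a = 1.0:", gap_static)

# --- Comment 5: a memoryless controller in closed loop with a plant ---
def static_controller(feedback, k_p=4.0, k_d=0.5):
    """Memoryless map from delayed (position, velocity) feedback to a force."""
    position, velocity = feedback
    return -k_p * position - k_d * velocity

def simulate(n_steps=10000, dt=0.001, delay_steps=50):
    """Euler-integrate a unit point mass driven by the static controller."""
    states = np.zeros((n_steps, 2))
    states[0] = [1.0, 0.0]           # displaced from the goal, at rest
    for t in range(n_steps - 1):
        delayed = states[max(t - delay_steps, 0)]   # delayed sensory feedback
        force = static_controller(delayed)
        position, velocity = states[t]
        states[t + 1] = [position + dt * velocity, velocity + dt * force]
    return states

states = simulate()
# Unwrapped angle of the trajectory in the (position, velocity) plane.
angle = np.unwrap(np.arctan2(states[:, 1], states[:, 0]))
revolutions = abs(angle[-1] - angle[0]) / (2 * np.pi)
print("revolutions of the fed-back state:", revolutions)
```

      In this toy setting the controller has no internal state, so any rotation in its input (and hence in its output) arises from the loop through the plant. Whether the same holds for the networks in the paper depends on details that this sketch does not capture.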

      Minor:

      1) The Methods are clear and comprehensive, but just to make understanding of the simulation setup easier, it would help to have a diagram of the computation graph for the recurrent and non-recurrent networks that shows their number of units, activations/nonlinearities, RNN cell type, etc., added as supplementary figure.

      2) Again, to help more clearly convey the simulations, it would help to show the task goals (x*) that are inputs to the simulated brain for example trials in each task (for example added to Figures 2 and 6).

      3) Similar to how VAF is shown on top of all plots of jPC planes, it would be helpful to have the rotation frequency for each jPC plane noted next to it. Currently it is difficult to find the jPC frequency associated with each plot from the text.

      4) I am a bit surprised by how different the null distributions are for modeling muscle activity (Figure 3F) and kinematics (Figure 3H). The null distribution is simply the R² for a constrained or unconstrained dynamic model fit to a subsampled version of the neural activity. The only difference between the null distributions in Figure 3F and Figure 3H seems to be the downsampled dimension, which for muscle activity is 6 and for kinematics is 4 (per equation 1). Any insight will be welcome as to why downsampling the population activity to 4 (Figure 3H) results in so much worse R² compared with downsampling it to 6 (Figure 3F)?

      References:

      Abbaspourazad, H., Choudhury, M., Wong, Y.T., Pesaran, B., Shanechi, M.M., 2021. Multiscale low-dimensional motor cortical state dynamics predict naturalistic reach-and-grasp behavior. Nature Communications 12, 607. https://doi.org/10.1038/s41467-020-20197-x

      Aoi, M.C., Mante, V., Pillow, J.W., 2020. Prefrontal cortex exhibits multidimensional dynamic encoding during decision-making. Nature Neuroscience 1-11. https://doi.org/10.1038/s41593-020-0696-5

      Gao, Y., Archer, E.W., Paninski, L., Cunningham, J.P., 2016. Linear dynamical neural population models through nonlinear embeddings, in: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 29. Curran Associates, Inc., pp. 163-171.

      Kao, J.C., Nuyujukian, P., Ryu, S.I., Churchland, M.M., Cunningham, J.P., Shenoy, K.V., 2015. Single-trial dynamics of motor cortex and their applications to brain-machine interfaces. Nature Communications 6, 7759. https://doi.org/10.1038/ncomms8759

      Remington, E.D., Narain, D., Hosseini, E.A., Jazayeri, M., 2018. Flexible Sensorimotor Computations through Rapid Reconfiguration of Cortical Dynamics. Neuron 98, 1005-1019.e5. https://doi.org/10.1016/j.neuron.2018.05.020

      Sani, O.G., Abbaspourazad, H., Wong, Y.T., Pesaran, B., Shanechi, M.M., 2021. Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification. Nature Neuroscience 24, 140-149. https://doi.org/10.1038/s41593-020-00733-0

      Stavisky, S.D., Willett, F.R., Wilson, G.H., Murphy, B.A., Rezaii, P., Avansino, D.T., Memberg, W.D., Miller, J.P., Kirsch, R.F., Hochberg, L.R., Ajiboye, A.B., Druckmann, S., Shenoy, K.V., Henderson, J.M., 2019. Neural ensemble dynamics in dorsal motor cortex during speech in people with paralysis. eLife 8, e46015. https://doi.org/10.7554/eLife.46015

      Vaidya, M., Kording, K., Saleh, M., Takahashi, K., Hatsopoulos, N.G., 2015. Neural coordination during reach-to-grasp. Journal of Neurophysiology 114, 1827-1836. https://doi.org/10.1152/jn.00349.2015

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2021-00831

      Corresponding author(s): Lu, Gan

      Reviewer comments are in regular font. Our rebuttal is in bolded font. Experiments that we plan for the full revision are preceded with “FULL:”. In the revision files, the changes are highlighted in yellow.

      1. General Statements

      We thank the reviewers for their detailed feedback. There are two major concerns. First, the manuscript lacks functional analysis of the meiotic triple helices (MTHs). Second, the manuscript makes claims about the properties of synaptonemal complexes (SCs) and MTHs that are inadequately supported. In order to address the first concern, we would need extensive experiments to first identify and then perturb the genes associated with the MTH. Such experiments are beyond the scope of this manuscript and are the focus of future studies. We will address the second concern with mostly text revisions. We will also improve some of the imaging analysis with new experiments that can be done in a few months’ time.

      2. Description of the planned revisions

      We will acquire new cryo-ET data of pachytene-arrested cell cryolamellae using our new K3-GIF camera. These new data have higher signal-to-noise ratios and allow us to generate a higher-resolution subtomogram average of the MTHs. The achievable resolution will depend on the conformational homogeneity of the MTH segments and on the number of cryotomograms we can capture. If we are able to achieve a subnanometer-resolution reconstruction, we will narrow down the possible identities of the subunits. Even if we cannot achieve subnanometer resolutions, the new data will allow us to test if ladder-like densities were missed in our lower-resolution older data, thereby improving our understanding of the SC’s structure. We will also perform subtomogram analysis of purified ribosomes as a control to strengthen our handedness determination.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      The Reviewers’ original comments are reproduced in regular font. Line numbers refer to the preliminary revision.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Ma and coworkers report studies of budding yeast undergoing meiosis by cryo-ET. They fail to detect structures interpretable as synaptonemal complex, and instead detect feather-like bundles of what appears to be a triple helix. These structures do not appear to be related to the synaptonemal complex, as spo11 mutants that do not initiate recombination, red1 mutants that lack axial elements, and zip1 mutants that lack the central element of the SC still make these bundles. These structures are absent from cells treated with latrunculin A, which depolymerizes actin filaments, but expected structures are not visible in light microscopy of cells treated with two different F-actin-staining reagents. However, it should be noted that another study (Takagi et al, 2021, bioRxiv) did detect actin associated with these structures by immunogold labeling. The structures are also reversibly dissolved in 7% hexanediol.

      This part of the paper's findings is well supported by data and is certainly of interest, although interest is somewhat limited by the unknown nature of these structures: what they contain, let alone their function, remains to be determined; in fact, it is not even determined whether or not they are made of protein. However, as an initial report of a previously unknown phenomenon, the paper is of some value.

      Thank you for raising the issue of whether MTHs are composed of protein or not. Aside from proteins, the only other materials capable of forming large bundles of linear polymers are polysaccharides and DNA. Yeast polysaccharides are found in the cell wall, so they are unlikely to be a candidate for the MTHs. In the nucleus, DNA is abundant. While we favor that MTHs are composed of protein, we cannot rule out that the MTHs are non-chromatin DNA-protein complexes. Depending on the resolution of future subtomogram averages, we may get a better idea of the MTH’s composition.

      There is, unfortunately, a second aspect of the paper that cannot be supported. Although it is clear that synaptonemal complex is present in the cells examined (by standard cytological methods), the authors cannot detect structures consistent with SC in their cryo-ET images. Unfortunately, the authors then extrapolate from their inability to detect SC to conclusions about SC, such as that it is not crystalline, and even go so far as to suggest that their failure to detect SC invalidates two models for crossover interference, and that the ladder-like structure reported for SC in many organisms using many different approaches may be a fixation artifact. The authors should remember that the absence of evidence is not evidence for absence; the speculation described above should be removed from both the abstract and discussion.

      We have removed the speculation about crossover interference and limited the scope of our discussion on SCs to budding yeast only.

      The differences between traditional EM and cryo-EM are not trivial. In the introduction, we added more details to explain the differences in both the sample preparation and contrast generation:

      Lines 79-87: “Meiotic nuclei have been studied for decades by traditional EM (Fawcett, 1956; Moses, 1956), but not by cryo-ET. Cryo-ET can reveal 3-D nanoscale structural details of cellular structures in a life-like state because the samples are kept unfixed, unstained, and frozen-hydrated during all stages of sample preparation and imaging (Ng and Gan, 2020). The densities seen in cryo-ET data come from electron scattering of the biological macromolecules. In comparison, the densities seen in traditional EM come from electron scattering of heavy metals such as uranium, tungsten, and osmium, which have adhered to a subset of biological macromolecules that were not extracted in earlier steps.”

      We have also removed the term ‘artifact’ from lines 201-202:

      Original: “The ladder model is based on images of traditional EM samples, which are vulnerable to fixation and staining artifacts.”

      Revised: “The ladder model is based on images of traditional EM samples.”

      Reviewer #1 (Significance (Required)):

      This paper reports a previously unreported structure in the nuclei of yeast cells undergoing meiosis. The composition and function of this structure remain to be determined. This considerably limits the significance of the paper.

      **Referee Cross-commenting**

      I agree with the concerns of the other reviewers. I also agree with reviewer 3 that to raise the significance of the paper would require much work. But I think that the raw observation is of value, albeit in a journal of record. So I would stick with my recommendation of text changes, keeping in mind that there may not be a suitable journal in LSA's portfolio.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This work describes helical filamentous structures observed in budding yeast nuclei that were cryosectioned and imaged using cryo-electron tomography (cryoET).

      The goal of this work seems to have been to conduct an ultrastructural analysis of the synaptonemal complex (SC), a meiosis-specific protein structure that holds chromosomes together during meiosis and is thought to regulate meiotic recombination. In conventional TEM images of fixed, embedded, and stained sections obtained from pachytene nuclei, SCs usually appear as long, thin, transversely striated structures. At pachytene, SCs extend along the full length of a thin (100-nm) gap between paired chromosomes (typically 1-6 µm long in yeast cells). Surprisingly, the authors did not observe SCs, perhaps because these structures do not produce much contrast in cryo-EM images. Instead, they observe abundant triple helical structures in the nucleoplasm, which they designate as "meiotic triple helices" (MTHs). The authors report that these triple helices assemble at the same time as but independently of the SC. They published a preprint in which they indicated that these structures were somehow related to SCs, but this revised version reports that they appear independently of SCs. They further report that treatment of cells with the F-actin depolymerizing drug Latrunculin-A (LatA) resulted in a lack of detectable triple helical structures in the nucleus, suggesting that these structures may be a form of actin or, alternatively, that they may require F-actin for their assembly.

      While the work is technically mostly sound, its significance is unclear because the reported structures have no known function. Many, if not most proteins will form helical structures if their concentration rises above a threshold defined by their binding affinity for themselves (see https://doi.org/10.1186/1741-7007-11-119 and references therein), so this may simply be an example of an abundant nuclear protein that polymerizes to form helical filaments under the conditions that trigger yeast sporulation.

      Thank you for raising the interesting possibility that MTHs form helices because their subunits have exceeded a critical concentration threshold. In the revised text, we discuss the possibility that the MTH is simply a protein that is highly expressed in meiotic cells and polymerizes either because it exceeds a critical concentration or because it has undergone a biochemical change such as a post-translational modification:

      Lines 408-415: “Note it is possible that the MTHs may not be directly involved in meiosis, but are instead a protein that is at a sufficient concentration or has the right biochemical modifications to form helices in pachytene because it is known that many proteins can form a helix under the right conditions (Crane, 1950; Pauling, 1953; Theriot, 2013). These MTHs also have lateral interactions that allow them to pack with crystalline density. Their sensitivity to 1,6-hexanediol suggests that the polymerization both within and between MTHs is based on hydrophobic interactions. Further work will be needed to determine the identity of the MTH’s subunits and their potential function.”

      The authors perform cryosectioning and cryoEM on yeast cells undergoing meiosis to show that the assembly and disassembly of MTHs follow a similar time course as that of the SC. The observation of these triple-helical filaments in meiotic nuclei has also been reported in another study (https://www.biorxiv.org/content/10.1101/778100v2.full.pdf+html), which proposed, based on immuno-EM labeling, that they may be actin cables. This study reports that the structures are not detected using phalloidin or Lifeact-mCherry. However, treatment with LatA did eliminate detection of MTH structures, suggesting that they may be comprised of actin.

      In my view, there are a number of issues that should be addressed before publication. Many of these relate to the presentation of the findings. Detailed comments below:

      1. The presentation of the work is very confusing. The authors clearly expected to observe SC structures, but did not. They conclude that MTHs are not SCs, since they do not depend on molecular components required for SC assembly. They should describe their findings in a more straightforward way rather than veering from introducing the SC to describing the MTHs.

      We have restructured the manuscript to tone down the discussion about the SC. However, we have to start with the SC because it is the most iconic feature of pachytene cells and a major organizer of chromatin in meiosis. Furthermore, its presence, as indicated by Zip1-GFP signals, was key to establishing that our cells were indeed arrested in pachytene. It would have been confusing to then overlook the absence of structural features as conspicuous and prevalent as what was expected of SCs. The other sections did have room for improvement. In the sections below, we describe the changes point by point.

      Similarly, the discussion section on "recombination and chromosome segregation" seems inappropriate and irrelevant, since no data are presented in this study regarding the functions of the MTHs, and there is no reason to think that they contribute to crossover interference, chromosome segregation, or other aspects of meiosis. Additionally, most of the ideas presented in this section seem very muddled. I recommend deleting this section.

      We have deleted the section entitled "Recombination and chromosome segregation".

      Throughout the text, we have also changed the term “meiosis-specific” to “meiosis-related” when describing MTHs. Doing so allows for the possibility that MTHs might just form as a consequence of being expressed to a high enough concentration as discussed above.

      Along the same lines as comment #1: The title should be changed - absence of evidence for "ladders" is not evidence of absence. Prior work using TEM and superresolution fluorescence microscopy has clearly shown that ladder-like SC structures exist in pachytene nuclei of budding yeast and many other organisms, although they apparently cannot be visualized using the methods described here.

      We have changed the title to be less forceful, yet report what we see and don’t see by cryo-ET:

      “Cryo-ET detects bundled triple helices but not ladders in meiotic budding yeast”

      The authors should clarify whether cryo-sectioning was performed through the full volume of pachytene nuclei.

      This comment refers to our attempt at serial cryosectioning, as shown in Fig. S8. We have revised the text in lines 332-334 to reflect the estimated volume covered:

      “We successfully reconstructed six sequential sections from one ndt80Δ cell (Figure S8), which represents approximately one third of a nucleus (assuming a spherical shape).”

      We also changed Fig. S8’s title so that it doesn’t sound like we reconstructed an entire nucleus:

      “MTH bundles are extensive throughout the cell nucleus.” → “MTH bundles are extensive.”
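
      The "approximately one third of a nucleus (assuming a spherical shape)" estimate can be checked with a short calculation. The sketch below computes the fraction of a sphere's volume that lies between two parallel planes. The function name, the nuclear radius, and the section thickness are illustrative assumptions, not values taken from the manuscript, and the stack of sections is assumed to be centred on the nucleus.

```python
import math

def slab_fraction_of_sphere(radius_nm, z_start_nm, z_end_nm):
    """Fraction of a sphere's volume between two parallel planes.

    Plane heights are measured from the sphere's centre along the
    sectioning axis. Hypothetical helper, not from the manuscript.
    """
    lo = max(-radius_nm, min(z_start_nm, z_end_nm))
    hi = min(radius_nm, max(z_start_nm, z_end_nm))
    if hi <= lo:
        return 0.0

    # Antiderivative of the cross-sectional area pi * (R^2 - z^2).
    def area_integral(z):
        return math.pi * (radius_nm ** 2 * z - z ** 3 / 3.0)

    slab_volume = area_integral(hi) - area_integral(lo)
    sphere_volume = 4.0 / 3.0 * math.pi * radius_nm ** 3
    return slab_volume / sphere_volume

# Assumed values for illustration only: a 1000 nm nuclear radius and
# six consecutive 75 nm sections centred on the nucleus.
n_sections = 6
thickness_nm = 75.0
half_stack = n_sections * thickness_nm / 2.0
fraction = slab_fraction_of_sphere(1000.0, -half_stack, half_stack)
print(f"fraction of nucleus covered: {fraction:.2f}")
```

      Under these assumed dimensions the six sections cover roughly one third of the sphere. A stack that is offset from the centre would cover less, because the cross-sections shrink towards the poles.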

      It is not clear from the manuscript which camera/microscope configuration was used to acquire the cryoET data that were used for sub-tomogram averaging. The authors state in the methods that Falcon II and K3-GIF was used for projection images, but it's not clear if this applies to all images. These technical details should be clarified.

      We have added a new column to Table S4 that reports the camera used for all the projection imaging and tomography experiments. In the original MS, all of the subtomogram averaging was done using Falcon II data.

      FULL: In the full revision, we plan to incorporate new subtomogram averages of MTHs in situ, using K3-GIF data of cryolamellae.

      The analysis of the handedness of the helices seems to be questionable as the resolution of the reconstructions for 80S are also quite low. I am uncertain whether this can be used to state with confidence that the MTH are right-handed.

      FULL: In the full revision, we will use purified 70S ribosomes, imaged on the same K3-GIF camera and using the same software workflow as for the new subtomogram averaging of MTHs in situ. We expect higher resolution for both ribosomes and MTHs, which will make the handedness assignment unambiguous.

      The authors claim that treatment with Latrunculin-A (LatA) leads to disappearance of MTHs. However, they support this with projection images of cells treated with LatA. The projection images are of poor quality and the vitrification in these cells (as well as the DMSO treated cells) do not look appropriate. They should present data for LatA-treated cells and DMSO-treated controls obtained using the same approach and ideally imaged in parallel with untreated cells. They should also quantify the number of sections and cells imaged for all conditions.

      Once we realized that the MTH bundles were visible in projections, we chose to report detections of MTH bundles by projection imaging instead of the costly tilt series. The apparent poor quality and questionable vitrification come from the fact that the projection images show the cryosection’s crevasses and knife marks, which reside on opposite cryosection surfaces. These sectioning artifacts are computationally excluded from tomographic slices. The following line was added to the figure caption to explain this:

      “These image features are not devitrification artifacts; they are absent from the tomographic slices in other figures because they can be computationally excluded.”

      The quantification of the MTHs in Lat-A vs control cells are in Table 2. We have now added these numbers to both the text and the figure caption.

      The similarities between the MTHs and SCs - that both are present in meiotic nuclei and sensitive to hexanediol - seem unlikely to be functionally relevant. Again, I think the presentation suffers from being focused on the SC which was not seen, rather than on the MTHs.

      We have toned down the discussion on SCs throughout the manuscript. We have retained the motivation for using 1,6-hexanediol to probe the MTHs’ physicochemical properties, including the fact that previous work on SCs motivated this perturbation experiment. However, we removed the comparison of their relative sensitivities to 1,6-hexanediol (see reply to point #10 below).

      The absence of 100-nm-wide zones containing nucleosomes is again not evidence for lack of SCs. SCs are ribbon-like structures - they are about 100 nm in one dimension but the thickness has not been characterized reliably; even if SCs do exclude nucleosomes (which is not certain) the excluded volume might be much smaller than the authors imagine.

      We did not argue for “lack of SCs”; these structures clearly exist in our cells given the fluorescent linear structures seen in Zip1-GFP expressing cells. We only say that the textbook portrayal of SCs needs revision, though we should have restricted our statement to yeast. In the literature and textbooks, whenever the SC’s central element is drawn, it is depicted as densely packed with SC proteins and without internal nucleosomes.

      Does Lifeact-mCherry enter the nucleus? This information is important in interpreting the failure to detect MTHs using this probe.

      While Lifeact-mCherry is small enough to passively diffuse through the nuclear pore, our data cannot rule out that this molecule is excluded from the nucleus. We added the following sentence as a caveat:

      Lines 244-245 “Note that we cannot rule out that Lifeact-mCherry is excluded from the nucleus.”

      The sensitivity of the MTH structures to 1,6-hexanediol treatment is potentially interesting, but it does not reveal anything about their structure or function, only that their assembly likely depends in part on hydrophobic interactions. Caution should be used in interpreting these findings.

      We have toned down the discussion about the meaning of the MTH bundles’ 1,6-hexanediol sensitivity by removing this line from the original Results:

      “MTH bundles are therefore sensitive to a slightly higher concentration of 1,6-hexanediol than SCs are and reform when 1,6-hexanediol is removed.”

      We have also added the following line to more clearly say what sensitivity to 1,6-hexanediol means:

      Lines 412-414: “Their sensitivity to 1,6-hexanediol suggests that the polymerization both within and between MTHs are based on hydrophobic interactions”

      The figure legends and/or Methods sections should clarify what is represented in each figure, and how the data were acquired. In particular, cryotomographic slices of varying dimensions (6nm, 10 nm or 12 nm or 70 nm) are mentioned in the captions of several figures (2-6, and S1, S2, S3). However, is often unclear whether these represent physical or computational sections.

      “Tomographic slice” refers to a rendering of a computationally extracted slice from a reconstructed tomogram. To make it clearer, we have added the term “computational” to describe the tomographic slices in each figure caption.

      Page 23 has a supplemental figure but no captions. Is this the same as Fig. S8?

      Yes, this is a copy of Fig. S8 that appeared due to MS Word’s jumping-figure bugs. We will manually edit the PDF in the future revision.

      I do not find the model figure (Figure 7) to be helpful. Additionally, the failure to detect SCs and the presence of MTHs do not warrant a "revised model of the meiotic yeast nucleus."

      We now call panels A and B the “Traditional EM” and the “Cryo-ET” models, respectively. The figure therefore reports the large nuclear bodies seen by the two methods and no longer implies correctness.

      We also changed the related sentence in the Introduction:

      Original: “Our work strongly suggests that current models of pachytene nuclear cytology need revision.”

      Revised: “Our analysis shows that MTHs coexist with SCs, which have an unknown cryo-ET structure.”

      The absence of MTHs in haploid cells induced to undergo meiosis should perhaps be studied in more detail. Even SCs are present in haploid meiotic cells, so the absence of these structures may be informative as to their function. Haploid cells should also be stained for SCs and imaged by immunofluorescence to verify that they are in meiosis.

      The haploid strain that was treated with sporulation media cannot enter meiosis. Haploid cells that are capable of entering meiosis need to be disomic for chromosome III, with each copy having a different mating type at the Mat locus. We believe that the construction and studies of such strains would be more meaningful after we identify the MTH’s subunits and determine its function in diploid cells.

      The yeast strain is SK1, not SK-1.

      Thank you. This mistake is corrected.

      What are "self-pressurized-frozen samples" (p.2)?

      Self-pressurized-frozen samples are generated by an alternative to the conventional machine-based high-pressure-freezing method. We have added more details in the new lines 135-141:

      “Self-pressurized freezing is a simpler and lower-cost alternative to conventional high-pressure freezing, which requires a dedicated machine that consumes large amounts of cryogen. In the self-pressurized freezing method, the sample is sealed in a metal tube and rapidly cooled in liquid ethane. The material in direct contact with the metal cools first and expands by forming crystalline ice, which exerts pressure on the material in the center of the tube (Leunissen and Yi, 2009; Yakovlev and Downing, 2011; Han et al., 2012).”

      Reviewer #2 (Significance (Required)):

      The observation of MTHs is novel but (as stated above) of unclear significance, given that their molecular identity and function are unknown.

      This work may be of interest to the meiosis field, with the caveats described above that the functional relevance is currently unclear.

      This review was co-written by referees with expertise in meiosis, chromosome organization, SC structure and function, and cryoEM.

      **Referee Cross-commenting**

      The reviews are strikingly concordant so I don't think much needs to be added.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The authors report the observation of filamentous structures (further termed meiotic triple helices MTH) in the nucleoplasm during meiosis in yeast cells. Those structures are visualized using Cryo-ET. While the authors initially seem to assume an association of those structures with synaptonemal complexes, they discover that those structures rely on filamentous actin and are not affiliated with synaptonemal complexes after all.

      While I think that the observation of those MTH by Cryo-ET is interesting, the overall structure of the paper and presentation of the data are not very well done. As the authors find throughout their experiments that the MTHs are not associated with synaptonemal complexes the strong focus in the first figures on synaptonemal complexes as well as the title of the paper are very mis-leading. The authors try to initially make the point that other labs have observed ladder like structures by transmission electron microscopy and want to make the claim that those observed structures might be an artifact of sample preparation, hence the title: Meiotic budding yeast assemble bundled triple helices but not ladders. However, at the end those structures seem unrelated to synaptonemal complexes.

      In addition, several labs have reported the presence of nuclear actin in meiosis and mitosis and have even succeeded to show those structures by transmission electron microscopy, questioning the "artifact" argument.

      Presumably, the Reviewer means intranuclear actin, as opposed to perinuclear (cytoplasmic) actin. If so, then we have only seen one paper, Takagi et al. 2021 (bioRxiv), that has reported seeing structures associated with nuclear actin. Note, however, that the revised Takagi bioRxiv paper is very careful in saying that the filament bundles contain actin, which is not the same as saying that the filaments are polymers of actin.

      Our “artifact” argument – now removed – referred to SCs, not to nuclear actin.

      In the revised manuscript, we use the term “intranuclear” to make it clear that we are referring to structures inside the nucleus. We also use the term F-actin, where appropriate, to refer to the best-studied actin polymer, which resembles a double helix. Doing so eliminates confusion about other forms of actin: G-actin, which cryo-ET cannot yet directly visualize due to its small size; and non-canonical actin polymers, for which there are no previous experimental X-ray or cryo-EM structures for comparisons.

      The story line of the paper is weak and I think the authors would have been better off reporting their cryo-ET structures and making a better link to actin or determining what else they think might be a component of those structures. Immuno-EM (as actually shown in Reference 41) of actin would have been much more convincing. The authors could also use the power of Cryo-ET and the achievable higher resolution to describe those filaments in much more detail. In my opinion this would have been a much better and more exciting paper.

      We disagree that “making a better link to actin” is the right approach because doing so presupposes that the structures are composed of actin, for which the present evidence is inadequate. We do agree that determining what the MTHs are (or are not) would be valuable.

      FULL: Now that we have better access to a K3-GIF camera and a cryo-FIB-SEM, we will attempt higher-resolution subtomogram averaging analysis. If we are able to achieve subnanometer resolution, we will attempt to narrow down the fold of the MTH subunit. Note that this goal will require that the MTHs are conformationally homogenous and that we can image sufficient copies of the MTHs.

      In summary: While I think the Cryo-ET images of those structures could be very exciting, the paper unfortunately does not do a very good job of presenting this data and is at times misleading in trying to prove or rather disprove a connection to synaptonemal complexes. Based on this I think that the paper cannot be published in the current form and needs major revisions that would require a significant amount of time.

      **Minor comments:**

      Figure 1: The choice of timepoints is confusing and makes it hard to compare. While wild type is shown at 0,1,3,4,5,8h, the mutant is shown at 0,2,3,4,5,6,7h. It would be appropriate to select the same timepoints for both conditions.

      FULL: We will recollect fluorescence images of the mutant cells at the same time points as the wild type.

      Figure 3 and 4 need a quantification of the number of observed MTHs, in particular as only selected regions of the nuclei are shown.

      These images were taken from single cryosections instead of serial cryosections, which would have been too difficult to do for multiple conditions and multiple cells. Therefore, quantification would be confounded by the fact that each cryosection samples only a small fraction of the nuclear volume. We believe that reporting the number of cell cryosections that are MTH-positive (Table 2) is at present the best way to characterize their abundance and ability to polymerize. Once we are able to identify the MTH gene products, we will be able to perform GFP tagging experiments and thereby get a much better estimate of the polymer mass as a function of biochemical perturbations.

      Fig 7. The data certainly does not support a "REVISED" model of the yeast nuclear organization.

      We have changed the Figure title to “A cryo-ET model of the meiotic yeast nucleus.” We now also refer to panels A and B as “Traditional EM model” and “Cryo-ET model”, respectively.

      Reviewer #3 (Significance (Required)):

      Several publications have already shown the presence of actin in meiotic and mitotic nuclei and have even succeeded in observing those structures by transmission electron microscopy. Based on this it is not clear why the authors have not tried to put their work in context to all these observations and used their technology to obtain novel information on the structure, which might be helpful to identify which proteins compose the MTHs. Based on how the data is presented I do not think that this paper contributes anything new to the field.

      Presumably, the reviewer means that F-actin has been imaged in yeast cells, because for actin to exist in nuclei in mitosis/meiosis, the organism would have to undergo closed mitosis/meiosis. Furthermore, for actin to be observable by transmission electron microscopy, it would have to be in the filamentous (F-actin) form. We could not find any publications that report transmission electron microscopy of F-actin in yeast cells. We therefore cannot relate our results to F-actin in the meiotic nucleus.

      My field of expertise is meiosis and mitosis as well as imaging (light and electron microscopy).

      **Referee Cross-commenting**

      All reviewers seem to agree that the general observation of these structures is interesting but that there is a reduced significance as the function and identity of these structures remains unknown.

      4. Description of analyses that authors prefer not to carry out

      The main unanswered question is: what is the function of the MTH bundles? To address this question, we would first need to identify the gene products that are needed for MTH assembly. Next, we would need to do genetic perturbation experiments to actually determine the MTHs’ function. These experiments would constitute a complete study, which is better suited for a separate, future manuscript.

    1. Author Response:

      Reviewer #1:

      Maimon-Mor et al. examined the control of reaching movement of one-handers, who were born with a partial arm, and amputees, who lost their arm in adulthood. The authors hypothesized that since one-handers started using their artificial arm earlier in life than amputees, they are expected to exhibit better motor control, as measured by point-to-point reaching accuracy. Surprisingly, they found the opposite, that the reaching accuracy of one-handers is worse than that of amputees (and control with their non-dominant hand). This deficit in motor control was reflected in an increase in motor noise rather than consistent motor biases.

      Strengths:

      • I found the paper in general very well and clearly written.
      • The authors provide detailed analyses to examine various possible factors underlying deficits in reaching movements in one-handers and amputees, including age at which participants first used an artificial arm, current usage of the arm, performance in hand localization tasks, and statistical methods that control for potential confounding factors.
      • The results that one handers, who start using the artificial arm at early age, show worse motor control than amputees, who typically start using the arm during adulthood, are surprising and interesting. Also intriguing are the results that reaching accuracy is negatively correlated with the time of limbless experience in both groups. These results suggest that there is a plasticity window that is not anchored to a certain age, but rather to some interference (perhaps) from the time without the use of artificial arm. In one-handers these two time intervals are confounded by one another, but the amputees allow to separate them. I think that the results have implications for understanding plasticity aspects of acquiring skills for using artificial limbs.

      Weaknesses:

      • While I found that one of the main conclusion from the paper is that the main factor that is related to increased motor noise is the time spent without the artificial arm, it felt that this was not emphasized as such. These results are not mentioned in the abstract and the correlation for amputees is not shown in a figure.

      We thank the reviewer for their comment. While it is true that motor noise correlated with time of limbless experience in both groups, we were hesitant to highlight the results found in amputees, considering the small number of participants, and lack of converging evidence (e.g., contrary to the congenital group, we did not find a strong main effect). For these reasons, we have chosen to include it in the manuscript but not highlight it or base our main conclusions on it. Following the reviewer’s comment, the correlation of the amputees’ data is now visualised in Figure 3. Moreover, while the behavioural correlation might be similar in both groups, from a neural standpoint, the limbless experience of a toddler with a developing brain is qualitatively different to that of an adult, with a fully developed brain, who has lost a limb. As such, we were hesitant to link these two findings into a single framework, however in the revised manuscript we highlight this tentative link.

      Discussion (4th paragraph):

      “In both the congenital and acquired groups, artificial arm reaching motor noise correlated with the amount of time they spent using only their residual limb. It is therefore tempting to link these two results under a unifying interpretation; however, this requires further research, considering the neural differences between the two groups.”

      Figure 3. Years of limbless experience before first artificial arm use in the acquired group. Relationship between years of limbless experience and (A) artificial arm reaching errors or (B) artificial arm motor noise in the acquired group.

      • The suggested mechanism of a deficit in visuomotor integration is not clear, and whether the results indeed point to this hypothesis. The results of the reaching task show that the one-handers exhibit higher motor noise and initial error direction than amputees. The results of the 2D localization task (the same as the standard reaching task but without visual feedback) show no difference in errors between the groups. First, it is not clear how the findings of the 2D localization task are in line with the results that one-handers show larger initial directional errors.

      We fully take on the reviewer’s comment regarding the vague use of the term visuomotor integration. In the revised manuscript, we have opted instead for a much broader term, suggesting a deficit in visual-based corrective movements, considering we are limited in our ability to infer the specific underlying mechanism from our results. We have also made changes to the abstract based on the reviewer’s comment (see below).

      With regards to discussing how the various results fit together, in the revised manuscript, these are now discussed more at length. In short, in the 2D localisation task (reaching without visual feedback), participants were not instructed to perform fast ballistic movements. Instead, participants were instructed that they could perform movements to correct for their initial aiming error (using proprioception). Together with the similar performance observed for the proprioceptive task, this strengthens our suggestion that the deficit in the congenital group is triggered by visual-driven corrections. These various considerations are now detailed as follows:

      Abstract:

      “Since we found no group differences when reaching without visual feedback, we suggest that the ability to perform efficient visually-based corrective movements, is highly dependent on either biological or artificial arm experience at a very young age.”

      Result (section 7, 1st paragraph):

      “From these results, we infer that early-life experience relates to a suboptimal ability to reduce the system’s inherent noise, and that this is possibly not related to the noise generated by the execution of the initial motor plan. Early life experience might therefore relate to better use of visual feedback in performing corrective movements. The continuous integration of visual and sensory input is at the heart of visually-driven corrective movements. Therefore, one possibility is that limited early life experience, results in suboptimal integration of information within the sensorimotor system.”

      Discussion (2nd paragraph):

      “When performing reaching movements without visual feedback (2D localisation task), the congenital group did not differ from the acquired or control group. This begs the question, if the congenital group has a deficit in motor planning why was it not evident in this task as well? In the 2D localisation task, unlike the main task, participants were allowed to make corrective movements. While they did not receive visual feedback, the proprioceptive and somatosensory feedback from the residual limb appears to be enough to allow them to correct for initial reaching errors and perform at the same level as the acquired and control group. Moreover, we did not find strong evidence for an impaired sense of localisation of either the residual or the artificial arm in the congenital group. As such, by elimination, our evidence suggests that the process of using visual information to perform corrective movements isn’t as efficient in the congenital group.”

      Discussion (2nd paragraph):

      “Lack of concurrent visual and motor experience during development might therefore cause a deficit in the ability to form the computational substrates and thus to efficiently use visual information in performing corrective movements.”

      Discussion (last paragraph):

      “By the process of elimination, we have nominated suboptimal visual feedback-based corrections to be the most likely cause underlying this motor deficit.”

      Second, I think that these results suggest that the deficiency in one-handers is with feedback responses rather than feedforward. This may also be supported by the correlation with age: early age is correlated with less end-point motor noise, rather than initial directional error. Analyses of feedback correction might help shedding more light on the mechanism. The authors mention that the participants were asked to avoid doing corrective movement and imposed a limit of 1 sec per reach to encourage that. But it is not clear whether participants actually followed these instructions. 1 sec could be enough time to allow feedback responses, especially for small amplitude movements (e.g., <10 cm).

      Please see below our response to the feedback correction analysis suggestion. Regarding corrective movements, we had the same concern as the reviewer which led us to use hand velocity data to identify first movement termination. We apologise if the experimental design and pre-processing procedures were not clear.

      In short, a 1 sec trial duration was imposed on all trials to generate a sense of time-pressure and encourage participants to perform fast ballistic movements. As we were worried that participants might still perform secondary corrective movements within this 1 sec window, for each trial, we used the hand velocity profile to identify the end of the first movement. Below, we have plotted the arm velocity from a single trial to illustrate this procedure. For this trial, the timepoint indicated by the circular marker has been identified as the time of the end of the first movement (See Methods for further information). For each trial, endpoint location was defined as the location of the arm at the movement termination timepoint defined by the kinematic data and not the endpoint at the 1 sec timepoint. It is worth noting that performing the same analysis using the endpoints recorded at the 1 sec timepoint did not generate different statistical results.

      This has now been further clarified in the text.

      Results (section 1, 1st paragraph):

      “Reaching performance was evaluated by measuring the mean absolute error participants made across all targets (see Figure 1C). The absolute error refers to the distance from the cursor’s position at the end of the first reach (endpoint) to the centre of the target in each trial. The endpoint of each trial was set as the arm location at the end of the first reaching movement, identified using the trial’s kinematic data (See Methods).”

      Methods (section: Data processing and analysis – main task):

      “Within the 1 sec movement time constraint, in some trials, participants still performed secondary corrective movements. We therefore used the tangential arm velocities to identify the end of the first reach in each trial (i.e., movement termination).”
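
      The velocity-based identification of the first movement's end can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the 10% of peak speed threshold, and the rule that the first reach ends when speed either drops below that threshold or starts rising again are all assumptions made for the example. Real kinematic data would also need low-pass filtering before any such rule is applied.

```python
import math

def tangential_speed(xs, ys, dt):
    """Speed between successive position samples (finite differences)."""
    return [
        math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) / dt
        for i in range(len(xs) - 1)
    ]

def first_movement_end(xs, ys, dt, threshold_fraction=0.1):
    """Index of the position sample at which the first reach ends.

    Assumed rule (hypothetical): after movement onset, follow the speed
    profile over its first peak, then stop at the first interval where
    speed falls below a fraction of the peak or begins to rise again
    (taken here as the start of a corrective sub-movement).
    """
    speed = tangential_speed(xs, ys, dt)
    if not speed or max(speed) == 0:
        return 0
    threshold = threshold_fraction * max(speed)

    # Movement onset: first interval faster than the threshold.
    i = next(k for k, v in enumerate(speed) if v > threshold)

    # Climb to the first speed peak.
    while i + 1 < len(speed) and speed[i + 1] >= speed[i]:
        i += 1

    # Descend until the arm has nearly stopped or speeds up again.
    while (i + 1 < len(speed) and speed[i + 1] < speed[i]
           and speed[i] >= threshold):
        i += 1

    # speed[i] spans positions i and i + 1, so the reach ends at i + 1.
    return i + 1

# Synthetic trial: a main reach followed by a small corrective movement.
xs = [0.0, 1.0, 3.0, 6.0, 8.0, 9.0, 9.5, 10.5, 11.0, 11.0]
ys = [0.0] * len(xs)
end = first_movement_end(xs, ys, dt=1.0)
print("first reach ends at sample", end, "position", (xs[end], ys[end]))
```

      In the synthetic trial the endpoint is taken before the corrective sub-movement begins, rather than at the final recorded sample, which mirrors the distinction drawn above between the kinematically defined endpoint and the position at the 1 sec timepoint.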

      Reviewer #2:

      This is a broad and ambitious study that is fairly unique in scope - the questions it seeks to answer are difficult to answer scientifically, and yet the depth of the questions it seeks to answer and the framework in which it is founded seem out of place in a clinical journal.

      And yet, as a scientist and clinician, I found myself objecting to the claims of the authors, only to have them address my objection in the very next section. The results are surprising, but compelling - the authors have done an excellent job of untangling a very complicated question, and they have tested (for our field) a large number of subjects.

      The main two results of the paper, from my perspective, are as follows:

      1) Persons with an amputation can form better models of new environments, such as manipulandums, than can those with congenital deficiencies. This result is interesting because a) the task did not depend on significant use of the device (they were able to use their intact musculature for the reaching-based task), and b) the results were not influenced by the devices used by the subjects (cosmetic, body-powered, or myoelectric).

      2) Persons with congenital deficiency fit earlier in life had less error than those fit later in life.

      Taken together, these results suggest that during early childhood the brain is better able to develop the foundation necessary to develop internal models and that if this is deprived early in childhood, it cannot be regained later in life - even if subjects have MORE experience. (E.g., those with congenital deficiencies had more experience using their prosthetic arm than those with amputation, and yet scored worse).

      The questions analyzed by the researchers are excellent and the statistical methods are generally appropriate. My only minor concern is that the authors occasionally infer that two groups are the same when a large p-value is reported, whereas large p-values do not convey that the groups are the same; only that they cannot be proven to be different. The authors would need to use a technique such as ICC or analysis of similarities to prove the groups are the same.

      We appreciate the reviewer’s concern about inferring the null from classical frequentist statistics. In this manuscript, we have opted to use Bayesian statistics to test for similarity across groups (See Methods: Statistical analysis), as opposed to the frequentist methods suggested by the reviewer. This approach is equivalent in purpose to the ones proposed by the reviewer and is widely used in our field. A Bayes factor (BF) smaller than 0.33 is regarded as sufficient evidence for the null hypothesis, that is, that there are no differences between the groups.

      This approach is described in detail in the methods and is introduced in the first section of the results as well.

      Results (1st section 2nd paragraph):

      “To further explore the non-significant performance difference between amputees and controls, we used a Bayesian approach (Rouder et al., 2009), that allows for testing of similarities between groups (the null hypothesis). In this analysis, the smaller effect size of the two reported here (1.39) was inputted as the Cauchy prior width. The resulting Bayesian Factor (BF10=0.28) provided moderate support to the null hypothesis (i.e., smaller than 0.33).”

      Methods (Statistical analysis section):

      “In parametric analyses (ANCOVA, ANOVA, Pearson correlations), where the frequentist approach yielded a non-significant p-value, a parallel Bayesian approach was used and Bayes Factors (BF) were reported (Morey & Rouder, 2015; Rouder et al., 2009, 2012, 2016). A BF<0.33 is interpreted as support for the null-hypothesis, BF > 3 is interpreted as support for the alternative hypothesis (Dienes, 2014). In Bayesian ANOVAs and ANCOVA’s, the inclusion Bayes Factor of an effect (BFIncl) is reported, reflecting that the data is X (BF) times more likely under the models that include the effect than under the models without this predictor. When using a Bayesian t-test, a Cauchy prior width of 1.39 was used, this was based on the effect size of the main task, when comparing artificial arm reaches of amputees and one-handers. Therefore, the null hypothesis in these cases would be there is no effect as large as the effect observed in the main task.”

      Following the reviewer’s comment, we have carefully scanned through the manuscript to make sure no equivalence claims are made without the support of a significant BF. We found one such instance and have rectified it.

      Results (3rd section, 2nd paragraph):

      “We compared artificial arm and nondominant arm biases (distance from the centre of the endpoint to the target) across groups, using intact arm biases as a covariate. The ANCOVA resulted in no significant (inconclusive) group differences (F(2,47)=2.40, p=0.1, BFIncl=0.72; see Figure 2A).”
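      The decision rule quoted above (BF < 0.33 supports the null, BF > 3 supports the alternative, anything in between is inconclusive) can be sketched in a few lines of Python. This is only an illustration of the thresholds as stated in the Methods; the function name `interpret_bayes_factor` is hypothetical and is not taken from the authors' analysis code.

```python
def interpret_bayes_factor(bf, null_cutoff=0.33, alt_cutoff=3.0):
    """Classify a Bayes Factor using the cutoffs quoted in the Methods.

    The cutoffs (0.33 and 3) come from the quoted text; the function
    itself is a hypothetical helper written for illustration only.
    """
    if bf <= 0:
        raise ValueError("A Bayes Factor must be positive")
    if bf < null_cutoff:
        return "supports null"
    if bf > alt_cutoff:
        return "supports alternative"
    return "inconclusive"


# The two values reported in the response above.
reported = {"BF10": 0.28, "BFIncl": 0.72}
labels = {name: interpret_bayes_factor(value) for name, value in reported.items()}
```

      Applied to the two values reported in the response, BF10=0.28 falls below the 0.33 cutoff (support for the null), whereas BFIncl=0.72 lies between the cutoffs, which matches the authors' description of that result as inconclusive.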

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Saha and colleagues investigated the functions of the long non-coding RNA (lncRNA) DRAIC in malignant glioma. They find that DRAIC expression decreases cell migration/invasion and tumorsphere/colony formation in vitro, and tumor growth in vivo using established cell lines. Mechanistically, DRAIC is known to inhibit NF-kB signaling and the authors demonstrate that DRAIC activates AMPK leading to repression of mTOR, which decreases protein synthesis and increases autophagy. This is a solid study highlighting a potentially interesting pathway of tumor growth and invasion in brain tumors.

      Answer: We thank Reviewer 1 for the positive feedback on our study.


      Major Comments:


      1) It is unclear whether the presented values (mean +/- SD) in the histograms refer to repeat measurements (in which case n = 1) or independent experiments (n>1). The number of replicate experiments is not stated in the methods or figure legends. This must be included.

      Answer: We want to thank Reviewer 1 for pointing out this omission. We have now included this information in the Materials and Methods section and in the figure legends.


      2) I don't think the immunoblot for p62 in Fig. 5C shows a convincing increase following DRAIC knockout, so the statement on p.8 should be revised.

      Answer: We have revised the statement to say: Consistent with DRAIC decrease being associated with a decrease in autophagic flux, and despite a decrease in p62 mRNA, the level of p62 protein is increased in three of the DRAIC KO prostate cancer cells (Fig. 5C, KO1, KO2, KO4 compared to WT) and unchanged in the other two.

      3) On p.8/Fig.5 the authors make a case that increased DRAIC levels increase lysosomal degradation of autophagosome core proteins LC3 II / p62 (resulting in decreased protein levels of both), while simultaneously increasing gene expression of LC3B and p62 (causing increased mRNA levels). The data for DRAIC overexpression fit this logic fairly well (even though I think more work is needed to fully support this claim), but I am finding it difficult to reconcile the DRAIC knockout data with this scenario - here, loss of DRAIC results in increased protein levels due to decreased autophagy, but also decreased gene expression. To fully support this argument, rescue experiments would be needed using FoxO3a knockout/overexpression.

      Answer: Note that the mRNA level is not always correlated with protein expression. This is particularly true for LC3 and p62, whose protein levels are significantly affected by the extent of fusion of autophagosomes and lysosomes and subsequent degradation in autophagolysosomes. Thus, although the mRNA of these genes is decreased in DRAIC KO cells (Fig. 5E), the proteins are increased (Fig. 5C) because of the decrease in autophagic flux (and the resulting decrease in degradation in the autophagolysosomes).

      The overexpression of FoxO3a in the DRAIC KO cells will not restore mRNA levels of LC3 or p62, because we show in Fig. 4H that FoxO3 phosphorylation by AMPK is suppressed by DRAIC KO. This phosphorylation is important for the induction of LC3 or p62 mRNA by FoxO3.

      FoxO3 knockout or knockdown in DRAIC OE cells should decrease LC3B or p62 mRNA in Fig. 5D, but it is already known from the literature that FoxO3a is necessary for inducing LC3B or p62 mRNA (Cell Metab. 2007 Dec;6(6):458-71. doi: 10.1016/j.cmet.2007.11.001. PMID: 18054315).

      4) Similarly, the data supporting increased autophagy following DRAIC overexpression (Fig. 5F/G) are a bit weak and lack controls (is the LC3B-GFP overlapping with endogenous LC3B and autophagosomes? Was the transfection efficiency comparable? Is there fusion with lysosomes?). In the absence of stronger data, the authors should temper their claims that DRAIC increases autophagy.

      Answer: The LC3B-GFP fusion protein is known to overlap with the p62 puncta (similar to endogenous LC3B). This result is in Fig. 4A of the citation that we have now added (Proc Natl Acad Sci U S A. 2016 Nov 22;113(47):E7490-E7499. doi: 10.1073/pnas.1615455113. Epub 2016 Oct 17).

      To support our hypothesis that DRAIC OE induces more autophagy compared to empty vector, we used Bafilomycin A1 in Figure 5B to inhibit autophagosome-lysosome fusion. We see the accumulation of more LC3B upon treatment with Bafilomycin A1 in the DRAIC OE cells (compared to EV-containing U251 cells), consistent with the idea that autophagosome-lysosome fusion is increased by DRAIC OE.



      5) No information is provided on animal numbers used in this study. How many mice were used per cohort? Were male and female mice used? Authors should follow ARRIVE guidelines in reporting animal experiments. The method for calculating tumor volume needs to be specified.

      Answer: We have included the details about the animal study in the Methods section of our modified manuscript.

      6) Student's t-test is inappropriate for comparisons of more than two groups (i.e. all experiments using DRAIC knockout cells) - for these experiments a Kruskal-Wallis test or ANOVA should be used. Did the authors test for normal distribution of their data? This may affect statistical testing and should be taken into consideration.

      Answer: We have now revised our statistical calculations and described them in the statistical analysis section of our modified manuscript.


      Minor Comments:


      7) Authors mention that DRAIC expression is undetectable in immortalized astrocytes and GBM cancer stem cells (Fig. S1). What is the source of these cells and how were they cultured?

      Answer: The immortalized astrocytes and GBM stem cells and their culture conditions are now described.

      8) The immunoblot in Fig. 3D could be replaced with a slightly lower exposure to make the difference between WT and DRAIC KO more obvious.

      Answer: We have now replaced the immunoblot with a lower exposure.

      9) Some immunoblots in Fig. 3 (panel E, p-S6K and S6K; panel H, actin) are not of the best quality and an effort should be made to replace them.

      Answer: We have now replaced the p-S6K immunoblot, as the reviewer suggested.



      10) Why are different loading controls used in Fig. 3 (a-Tubulin v actin)?

      Answer: We use multiple loading controls to make sure that we are not underestimating or overestimating changes in the experimental protein because of unexpected changes in the loading controls.

      11) Compared to other blot images in the same figure (e.g. Fig. 3E), the bands for p-mTOR and mTOR in Fig. 3F look compressed and should be shown appropriately sized.

      Answer: We have modified the Figure as the reviewer suggested.


      12) The layout of Fig. 4 is somewhat confusing. I would suggest organizing this according to DRAIC overexpression in A172 and U373 cells versus DRAIC knockout in LNCaP cells. Each immunoblot should be clearly labelled with the corresponding cell line, and it should be clearly explained why p-FoxO3a was tested in U251 cells, rather than A172/U373 as in the rest of the figure.

      Answer: We thank the reviewer for the constructive criticism. We have labeled all the cell lines in the Figure as the reviewer suggested. We have now systematically alternated the prostate cancer cells (for KO) and the GBM cells (for OE), as we looked at each relevant marker. We have now included the western blot for p-FoxO3a from another glioblastoma cell line, U373. Please find the modified Figure 4K for p-FoxO3a.

      13) Labelling of immunoblot in Fig. 5B is confusing and should be improved.

      Answer: We have modified Fig. 5B to make the labels clearer.

      14) Changes in GLUT1 expression (Fig. 7A) should be validated on the protein level.

      Answer: We have included the immunoblot for GLUT1 from DRAIC KO cells in Figure 7B. GLUT1 protein is increased upon DRAIC KO.


      Reviewer #1 (Significance (Required)):

      The authors describe a novel link between the lncRNA DRAIC and AMPK activation through inhibition of NF-kB-mediated regulation of GLUT1. This study extends their previous work on DRAIC inhibition of NF-kB in prostate cancer (Saha et al. Cancer Res 2020). There is one study describing DRAIC effects on growth and invasion in glioma cell lines (Li et al. Eur Rev Med Pharmacol Sci 2020), but the work presented by Saha and colleagues contains stronger experimental data and a more detailed and previously undescribed mechanism.

      The current study presents a mechanistic advance that increases the understanding of tumor growth and protein synthesis in cancer cells. The data presented in the study are not supported by in vivo experiments (other than suppression of tumor growth by DRAIC overexpression), validation in human tissue and/or primary patient-derived human glioblastoma cells, or even substantial rescue experiments. This limits the influence of the work on the field. I'm also not sure how transferable findings from DRAIC knockout in prostate cancer cell lines are to glioma, although the results are mostly complementary to the data from glioma cell lines. This is particularly relevant to the proposed mechanism of GLUT1 regulation by NF-kB, as the bulk of experimental data in Figures 6 and 7 was generated in prostate cancer cell lines and is only poorly validated in glioma cells. The study results will be most relevant for researchers investigating cell signaling pathways and autophagy in cancer.

      Answer: We would like to thank the reviewer for the positive comments on our study. The DRAIC KO experiments of Fig. 6 and 7 cannot be done in glioma cells because, as we show in Supp. Fig. S1, there are no glioma cells or GSC that express DRAIC at levels comparable to LNCaP. We have shown that GLUT1 mRNA decreases in the glioma cells when DRAIC is overexpressed (Supp. Fig. S4). We also show in Fig. 7G that AMP levels increase when DRAIC is overexpressed in glioma cells.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, the authors describe DRAIC as a lncRNA downregulated in prostate cancer. They postulate that DRAIC expression suppresses invasion, migration and growth. Mechanistically, the authors show that DRAIC activates AMPK by suppressing NFkB target gene TOR and indirectly impacting translation and autophagy. Collectively the observation is interesting and robust. However, I have several technical requests, particularly regarding the mechanistic part of the paper.

      Answer: We appreciate the positive feedback. We have addressed the reviewer’s concerns in our modified manuscript.


      Major Comments:


      1) The authors should rescue KO phenotypes by overexpressing DRAIC to consider potential off-target effects.

      Answer: DRAIC OE alone is sufficient to have exactly the opposite effect as DRAIC KO in protein translation (Fig. 3C-F), so DRAIC OE will rescue the effect of DRAIC KO. We make a similar argument for all the phenotypes, including mTOR, S6K and ULK1(S757) phosphorylation (Fig. 3G-J), AMPK and FoxO3a phosphorylation (Fig. 4B-C; J-L), autophagic flux (Fig. 5B, C) and effects on LC3B and p62 mRNAs (Fig. 5D, E). The same is true for our published phenotypes of DRAIC KO on invasion, migration and NF-kB activity (Saha, Cancer Research, 2020).


      2) The blots showing TOR and ULK1 phosphorylation need to be repeated. This is an important part of the paper and I feel that these blots are hard to interpret. p-S6K typically runs a bit higher in gels; there may be a technical problem.

      Answer: We are not sure which specific blots the reviewer is referring to, and it is possible that the blots the other reviewers pointed to are the ones under question. We have changed those blots so that the results are clear.


      3) GLUT1-related results are interesting, but the authors should provide genetic evidence that the effects are mediated by GLUT1. How do we know that glucose uptake is indeed upregulated upon knockout?

      Answer: In Fig. 7C-F we show that the effects of DRAIC KO on invasion, protein translation, AMP levels and AMPK activity are reversed by the GLUT1 inhibitor Bay-876. This is a cleaner result than using siRNA to knock down GLUT1. siRNAs can have off-target activity and sometimes cannot decrease a protein sufficiently below the threshold necessary to see reversal of action.


      Minor Comments:

      4) The figures need to be updated. Fonts are all different, there are lots of unaligned graphs, and the quality of the blots is poor.

      Answer: We have updated the Figures and changed fonts as the reviewer mentioned.

      Reviewer #2 (Significance (Required)):

      The observation is interesting, but the mechanism is incompletely understood. This is a nice addition to the literature, even without the mechanism.

      Answer: We want to thank the reviewer for the constructive criticism.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Saha and colleagues present a study demonstrating the tumor suppressive role of DRAIC, a long non-coding RNA transcript, through transmission of the signal from IKK/NF-kB to the AMPK/mTOR pathway via regulation of GLUT1 expression. The inhibition of mTOR by this pathway results in the reduction of protein translation, cellular invasion and activation of autophagy. Several diseases and models as well as multiple genetic and pharmacological manipulations were used to investigate the mechanisms at play. The manuscript is well written and the experiments are well designed. The conclusions are supported by the results. The following major and minor comments should be addressed:

      Answer: We thank the reviewer for the positive comments on our study.


      Major Comments:


      1) In addition to reporting the effect of DRAIC overexpression on tumor volume, the authors should present survival studies with one or more models.

      Answer: We thought of doing the survival study in our glioblastoma model but unfortunately, the tumor growth is very rapid (exceeding the size permitted by our IACUC in 2-3 weeks). The animal ethics welfare committee did not allow us to keep the mice for a longer time to perform the survival study.



      2) Since the authors study metabolic energy sensor pathways, related to glycolysis, it would be important to perform some of the key experiments in physiological level of glucose: e.g., pmTOR, pAMPK, LC3-II expression level in DRAIC overexpressing and deficient cells.


      Answer: The concentration of glucose in plasma is 1 g/L, while that of the RPMI medium is 2 g/L. We do not think we are too far from the physiological levels of glucose.
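      For readers who think in molar units, the two concentrations above can be converted using the molar mass of glucose (about 180.16 g/mol, a standard value that is not stated in the response). The following is a minimal sketch with a hypothetical helper name; it only restates the authors' numbers in different units.

```python
GLUCOSE_MOLAR_MASS_G_PER_MOL = 180.16  # standard value, assumed here


def g_per_l_to_mm(grams_per_litre):
    """Convert a glucose concentration from g/L to mmol/L (mM)."""
    return grams_per_litre / GLUCOSE_MOLAR_MASS_G_PER_MOL * 1000.0


plasma_mm = g_per_l_to_mm(1.0)   # plasma value quoted in the response
medium_mm = g_per_l_to_mm(2.0)   # RPMI value quoted in the response
fold_difference = medium_mm / plasma_mm
```

      Under this conversion, the plasma value corresponds to roughly 5.6 mM and the medium to roughly 11.1 mM, a twofold difference.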


      3) In addition to RT-PCR data, GLUT1 protein levels should be investigated in the different DRAIC expressing cells.

      Answer: We have incorporated the GLUT1 protein expression data from DRAIC KO cells in Figure 7B and from DRAIC overexpressing cells in Supplementary Figure S4G-H. Because blots from the same gels were split into different panels, the loading control GAPDH is the same in Figure 4K and Supplementary Figure S4H.


      4) The effect of DRAIC on GLUT1 expression is also measured in condition of glucose saturation, which does not reflect disease state. The decrease of GLUT1 in response to DRAIC overexpression and the increased GLUT1 level in DRAIC deficient cells should be investigated in physiological levels of glucose.

      Answer: Same as above. We are near physiological levels of glucose.


      Minor Comments:

      5) All the data are generated with established cell lines (e.g., U87) but more clinically relevant models, such as patient-derived primary cells like the ones used in Fig. S1, could be used to replicate some of the key findings.

      Answer: As we showed in Fig. S1, DRAIC is not expressed in glioma stem cells, so knockout experiments are not possible. We believe that the knockout experiments are the most relevant to this paper because they do not run the risk of artefacts from overexpression of an RNA far beyond physiological levels.


      6) Also please provide further details about the patient-derived cells from Fig. S1.

      Answer: We have mentioned the details of the cell lines in our modified manuscript.


      7) The statistical analysis section states that the number of measurements is indicated however I don't see the sample size of the experiments.

      Answer: We have now incorporated the number of experiments in our modified text.


      Reviewer #3 (Significance (Required)): The study reports a new model of regulation of tumor via long non-coding RNA. This article adds to the growing literature. The topic and content of the article are relevant and significant to the field of tumor research, but the significance and impact could be enhanced with the use of more physiologically relevant models and conditions, as pointed out in the major comments.

      Answer: We want to thank the reviewer for the positive feedback on our study.

    1. Not everyone is able to think or express themselves in the ways that we request. This is a barrier to seeming smart that is arbitrary but very real.

      This is something that I've been thinking about a lot lately, particularly as a white academic. Students have things to say, but if we require them to conform to "Standard American English" (I'm in the US, so I'm using SAE as an example, but this would presumably apply elsewhere) and scholarly/disciplinary academic writing expectations/form, then a teacher's fixation on "proper grammar" or "academic voice" or "scholarly form" likely DOES stifle the ideas and knowledges that students may have to share ("How many points off for not using an Oxford comma?" "What if I use 'informal language' or 'slang'?"). As one student explained just before they were about to graduate, they had spent most of their academic career "whitewashing" their ideas and language, often writing multiple drafts, first in their own voice, then progressively more "academic," so they were never actually able to express what they really had to say...until two faculty members said "just write it the way you want. Here are some models to consider, but you don't have to conform to these either."

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] The authors do a great job of listing and evaluating possible explanations, one of which is simply that the strains carry multiple mutations of small effect. All but one of the successfully mapped variants consists of missense and nonsense mutations. I think it's important to note that this represents a particular range of the effect-size distribution of mutations affecting the YFP phenotype. We know from the authors' earlier work that there are lots of mutations that can affect gene expression in cis, and so the absence of trans-acting cis-regulatory variants here is parsimoniously interpreted as due to their small effects. In general, work in other systems (particularly human genetics) has shown that even molecular traits are often hugely polygenic, affected by thousands of variants of tiny but non-zero magnitude. With a forward screen of the sort performed here, it's difficult to know how much of the phenotypic variance is due to unmapped small-effect variants, but two lines of evidence suggest it may be a lot: first, the absence of mappable causal mutations in 36/82 mutants, and second, the differences between EMS mutant strains and their matched single-site mutants. The authors commendably report and discuss these issues but to my mind they neglect them in drawing inferences and generalizations from their findings.

      We thank the reviewer for these encouraging comments and also appreciate the reviewer pointing out these concerns.

      With respect to the overlap of the trans-regulatory mutations we mapped and previously identified eQTL, we agree that the possibility of similar mapping biases in the two BSA-seq studies contributing to the overlap of trans-regulatory mutations and eQTL warrants further exploration. We interpret the reviewer’s comment to suggest that if some regions of the genome systematically showed lower sequencing read coverage (because of poor read mappability, PCR biases or any other reason), the power to detect trans-regulatory mutations and eQTL in these regions would be decreased compared to regions of the genome with higher coverage because the G-tests used to identify significant associations with expression in both studies are based on read counts. Consequently, variation in sequencing read coverage across the genome shared in this study and the prior study identifying eQTL, both of which used BSA-seq, could lead to the enrichment of trans-regulatory mutations in eQTL regions. Indeed, consistent patterns of read coverage across the S. cerevisiae genome have been observed in prior work.

      To determine whether trans-regulatory mutations were enriched in regions of the genome with higher sequence read coverage, we compared read coverage between regions of the genome identified as having trans-regulatory mutations or non-regulatory mutations. The identification of variants classified as non-regulatory is expected to be less dependent on the depth of sequencing read coverage because this designation does not require a statistically significant G-test. We found that the mutations identified as trans-regulatory showed 120x coverage whereas mutations identified as non-regulatory showed only 100x coverage, consistent with greater power to detect associations with expression in regions of the genome with higher sequencing read coverage. However, eliminating this difference in read coverage by excluding non-regulatory mutations with lower sequence read coverage did not eliminate the observed enrichment of trans-regulatory mutations in regions previously shown to contain eQTL. Non-regulatory mutations with higher and lower sequencing read coverage were also equally likely to be found within eQTL regions, suggesting that similar variation in sequence read coverage across the genome between the two studies is unlikely to explain the observed overlap of trans-regulatory mutations and eQTL. These analyses are now included in a new Figure 7-figure supplement 1.

      With respect to better incorporating biases in what we were able to map and considerations for extending findings from this work to other systems, we have tried to better address these issues in the revised discussion.

      Reviewer #2 (Public Review):

      Fabien Duveau et al. tried to characterize mutations in trans-regulation effects on expression of the TDH3 by using EMS mutants with TDH3 reporter in Saccharomyces cerevisiae. This work is an extension of works of Gruber et al. (2012) and Metzger et al. (2016) with specific mutation effect on TDH3 expression. They found that these trans-regulatory mutations that have effects on expression of TDH3 reporter were enriched in coding sequences of transcription factors. They found that the trans regulatory mutations with effect are associated with natural variants of trans within S. cerevisiae. In summary, the data is well described and supports their claims. The method could be used to study how regulatory networks work.

      [...] Although the paper does have strengths in principle, some weaknesses of the paper would cause the quality of data presented. [...]

      We thank the reviewer for taking the time to evaluate this work and have the following responses to the weaknesses noted:

      1) The reviewer is correct that we focused this paper on trans-regulatory mutations because cis-regulatory mutations affecting TDH3 expression were previously characterized. Furthermore, long distance enhancers with cis-acting effects on expression have not been described in S. cerevisiae and the term promoter is commonly used to encompass both the basal (core) promoter (including a TATA box for some genes) as well as other upstream activating sequences (UAS) and upstream repressing sequences (URS). In other words, the cis-acting sequences for S. cerevisiae genes are confined to a particular region much more than in multicellular eukaryotes. In fact, our prior work with TDH3 (Metzger et al. 2015) showed that 97% of cis-acting variation affecting TDH3 expression could be explained by sequence variation in the 678 bp region we define as its promoter. Consequently, all mutations outside of this region were considered to have trans-regulatory effects on TDH3 expression. In the revised version, we extended the discussion to specifically compare the structure of regulatory sequences in S. cerevisiae to other eukaryotic model systems.

      2) In this study, a mutation is defined as trans-regulatory if it affects TDH3 expression and is not located in the TDH3 promoter, regardless of whether or not it also affects growth rate. In fact, mutations in RAP1 and GCR1 affect growth rate (Figure 5), but are clearly trans-regulators of TDH3 with well-established binding sites in the TDH3 promoter. In other words, we do not think that mutations should be discounted as having trans-regulatory effects because they also impact growth rate.

      3) (A) Prior work examining the statistics of BSA-seq has shown that G-tests are most appropriate because they take into account the independent sampling from two bulk populations inherent to bulk-segregant analysis (Magwene et al. 2011 PLOS Computational Biology). (B) We are guessing that the reviewer is asking about multiple testing corrections rather than post-hoc tests, as we used a false discovery rate correction for multiple tests in Figure 2-supplement 5A. Although we did not use a multiple test correction for the BSA-seq data, we used a conservative significance threshold of 0.001 that was expected to result in a 3.5% false positive rate. Perhaps more importantly, we functionally validated the effects of 40 of the 41 associated mutations tested. (C) We may indeed have been overly optimistic about mapping power when choosing mutants to analyze with BSA-seq given that the 36 EMS mutants for which we failed to find a significant association between a mutation and fluorescence tended to have smaller effects on PTDH3-YFP expression than the EMS mutants for which we observed one or more associated sites (Figure 3-figure supplement 3). The reviewer’s comment also made us realize that our original sentence referring to mapping power had reported the effect size for estimated RNA levels rather than fluorescence. To avoid confusion, and because our anticipated mapping power does not affect the results of the study, we deleted this statement from the revised manuscript. Regardless of our anticipated mapping power, we were ultimately able to map mutations that affected fluorescence by as little as 1.6% relative to the wildtype strain.
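      To make the read-count-based test mentioned in (A) and (B) concrete, the following is a minimal sketch of a G-test of independence on a 2x2 table of allele read counts from two bulks, compared against the 0.001 threshold quoted above. The table layout (rows as bulks, columns as mutant versus reference reads), the function name and the example counts are all assumptions made for illustration; they are not taken from the authors' pipeline.

```python
import math


def g_test_2x2(table):
    """G-test of independence for a 2x2 table of read counts.

    table: [[mut_low, ref_low], [mut_high, ref_high]] (hypothetical layout:
    one row per bulk, one column per allele).
    Returns (G statistic, p-value from a chi-square with 1 degree of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # a zero cell contributes nothing to the sum
                g += observed * math.log(observed / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding errors

    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Hypothetical counts: the mutant allele is over-represented in one bulk.
g_stat, p_val = g_test_2x2([[80, 20], [50, 50]])
is_associated = p_val < 0.001  # threshold quoted in the response
```

      With equal allele frequencies in both bulks the statistic is zero and the p-value is 1; the more the allele frequencies diverge between bulks, and the higher the read coverage, the smaller the p-value becomes.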

      4) The GO enrichment analysis was performed with widely used tools on www.pantherdb.org. The statistical significance of enrichment for each GO term was computed using Fisher’s exact tests that compared 1) the proportion of genes with non-regulatory mutations and 2) the proportion of genes with trans-regulatory mutations that corresponded to the tested GO term. Because the total number of genes identified in our study with trans-regulatory mutations (42 genes) was much lower than the total number of genes with non-regulatory mutations (1043 genes), it was possible to obtain strong and statistically significant enrichment (P < 0.05 in Fisher’s exact test) even if only a small number of genes corresponded to the GO term in both categories. Although we found a large number of enriched GO terms, these GO terms were not always independent from each other. For instance, GO:0009168 (purine ribonucleoside monophosphate biosynthetic process) and GO:0009167 (purine ribonucleoside monophosphate metabolic process) refer to the same biological process and contain the same genes. For this reason, even though we reported all enriched GO terms in Supplementary File 8, we only showed GO terms that were at the tips of different branches in the GO hierarchy in Figure 6 and we grouped GO terms into four main categories that together encompassed most genes with trans-regulatory mutations.
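      The point that a handful of genes can yield a significant enrichment when one gene set is small can be illustrated with a one-sided Fisher's exact test written directly from the hypergeometric distribution. The group sizes (42 and 1043 genes) come from the response above; the per-term counts, the function name and the choice of a one-sided test are assumptions for illustration, since the response does not state them.

```python
from math import comb


def fisher_enrichment_p(trans_in_term, trans_total, nonreg_in_term, nonreg_total):
    """One-sided Fisher's exact test for enrichment of a GO term.

    Returns the probability of seeing at least `trans_in_term` annotated
    genes among the trans-regulatory set if annotated genes were spread at
    random over both sets (hypergeometric upper tail).
    """
    population = trans_total + nonreg_total
    annotated = trans_in_term + nonreg_in_term
    denominator = comb(population, trans_total)
    upper = min(annotated, trans_total)
    tail = sum(
        comb(annotated, x) * comb(population - annotated, trans_total - x)
        for x in range(trans_in_term, upper + 1)
    )
    return tail / denominator


# Hypothetical GO term: 4 of 42 trans-regulatory genes versus 5 of 1043
# non-regulatory genes carry the annotation.
p_enrichment = fisher_enrichment_p(4, 42, 5, 1043)
```

      In this made-up example only nine genes carry the annotation in total, yet the test falls below P = 0.05 because four of them sit in the much smaller trans-regulatory set.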

      5) We agree with the reviewer that trans-regulatory mutations can affect either the function of a gene product (including the ability of a transcription factor to bind to DNA) or the abundance of that gene product, but we do not think this is a weakness of the study. In fact, we think one of the strengths of the study is that we have empirical data testing the relative frequency of these two types of possible changes, finding that mutations in coding regions (presumably more likely to affect the function of the gene product than its expression) are the primary source of changes in TDH3 expression greater than 1%.

      6) The goal of the study was to characterize the effects of individual trans-regulatory mutations, thus we did not look at the combined effects of mutations in proteins that might work in a complex. We do, however, mention transcription factors working in a complex: “Transcription factors encoded by the TYE7 and GCR2 genes found to harbor trans-regulatory mutations affecting expression of PTDH3-YFP are known to regulate the expression of glycolytic genes (including TDH3) by forming a complex with transcription factors encoded by the RAP1 and GCR1 genes” (line 461). We think that looking at the combined effects of mutations that all impact the same complex of regulatory proteins is an interesting direction for future work.

      Finally, we’d like to point out that the reviewer’s statement in their opening summary about mutations being enriched in the coding sequence of transcription factors is not quite correct: the mutants we mapped were enriched in coding sequences, and we found more mutations in transcription factors previously shown to regulate (directly or indirectly) expression of TDH3 than expected by chance, but trans-regulatory mutations were not significantly enriched in genes encoding transcription factors relative to non-regulatory mutations (as described in the manuscript).

      Reviewer #3 (Public Review):

      [...] The mutagenesis approach in yeast the authors used is very powerful, but it naturally has drawbacks. The regulatory landscape in yeast is arguably simpler than in, e.g., metazoa or plants, in that the cis-regulatory regions are predominantly closely linked to target genes, most genes do not have introns and post-transcriptional regulation of mRNA through, e.g., splicing is rare. These features distinguish the systems, as in animals and plants introns are a very prominent source of regulatory elements (close to half of all enhancers are intronic in many animals), and alternative splicing of, e.g., transcription factors is known to play a major role in transcriptional regulation. Further, chromatin is a very important layer in metazoan and plant gene regulation. To benefit the general readership, it would be informative to further elaborate on the significance of the findings for researchers studying other organisms. In addition, it would help to clarify what aspects of the differences in the regulatory landscape the authors think are important to distinguish.

      We thank the reviewer for their kind words and recognition of the novelty of this work. We have modified the introduction to try to clarify the relationship of this work to eQTL studies, which we hope addresses the reviewer’s first concern. Specifically, we’ve tried to clarify that the complex, polygenic nature of trans-regulatory variation segregating within a species is well established by prior eQTL studies. We also sought to clarify that our work (which maps single mutations from mutagenized strains rather than natural variation) provides complementary insight into the distribution of regulatory mutations within the genome and within a gene’s regulatory network. Revisions have also been made to try to clarify that the single mutations we mapped were from EMS-induced mutants containing only ~24 mutations per genome, which is more than 1000-fold less than the number of single nucleotide polymorphisms between two strains of S. cerevisiae. That is, this study was designed to identify single trans-regulatory mutations rather than to characterize the genetic architecture of naturally occurring trans-regulatory variation. Although we intentionally focused on characterizing properties of single mutations here, we agree with the reviewer that testing for epistatic interactions among trans-regulatory mutations will be an interesting avenue for future work, and have added this point to the revised discussion. We have also added text to the discussion describing some similarities and differences in gene regulation among eukaryotes that should be considered when trying to generalize from this work.

    2. Reviewer #1 (Public Review):

      This paper aims to characterize mutations that act in trans to affect a single gene's expression in yeast. Trans-acting mutations potentially play an important role in variation and disease within species and in phenotypic evolution. The authors have previously described the mutational architecture and natural variation in the gene's cis-regulatory activity, creating a powerful experimental model for the causes of phenotypic variation. Trans-acting variation is much more challenging, because the mutational target space is the whole genome. The authors use a forward-genetic screen and bulked segregant analysis to identify 52 point mutations that affect their focal transgene's activity, and they identify an additional 17 by directly searching within a handful of candidate genes. The paper includes elegant validation using genome engineering to confirm the mapped variants are causal.

      With this collection of trans-acting mutations in hand, the authors can compare their characteristics to a set of mutations that do not have detectable effects on the transgene. Overall, they conclude that trans-acting mutations are enriched for genes that are known to sit upstream of the focal gene in transcriptional cascades, but the majority of mutations are in other kinds of genes, outside the network of transcription factors.

      This work is a valuable contribution to the authors' important experimental assault on the genetics of regulatory variation, a useful complement to their previous work on cis-regulatory mutations and polymorphisms. They provide evidence that experimentally defined regulatory networks have predictive value for the location of trans-acting mutations, and they reinforce the result (well established and widely accepted, but important to show in this kind of rigorous way) that trans-acting variation is distributed across a wide range of cellular and molecular functions. There are also some useful fine-grained results, such as the absence of mutations in a known regulator, RAP1, probably due to pleiotropic constraints, and an excess of mutations in iron homeostasis.

      Because the dataset of trans-acting mutations is relatively modest in size (necessarily so: it's a heroic effort to identify this many), many of the enrichments are also modest. In particular, the finding that mutations are enriched in eQTL regions holds for only two of three previous eQTL studies, and involves a slight elevation over the baseline that 66% of the genome is in eQTL regions. Because both the eQTL and the mutations were discovered by bulked-segregant analysis, biases in mappability will affect both similarly, and so I do not find the enrichment for overlapping hits to be completely persuasive.

      This work is important in substantial measure because of its contribution to the larger yeast TDH3 model trait project, which is a landmark research program for understanding phenotypic variation and evolution. On its own, the results in this manuscript would be difficult to generalize to regulatory variation more broadly. There are narrow reasons for this (yeast has a distinctive compact CDS-dense genome; the focal transcript is YFP and so has no endogenous post-transcriptional regulation; only one class of mutations assayed), but the bigger reason is that the researchers are only able to discover mutations with effects above a particular size. Even among the 82 mutant strains they start with, some 36 strains have altered YFP levels but no successfully mapped causal variants. The authors do a great job of listing and evaluating possible explanations, one of which is simply that the strains carry multiple mutations of small effect. All but one of the successfully mapped variants consist of missense and nonsense mutations. I think it's important to note that this represents a particular range of the effect-size distribution of mutations affecting the YFP phenotype. We know from the authors' earlier work that there are lots of mutations that can affect gene expression in cis, and so the absence of trans-acting cis-regulatory variants here is parsimoniously interpreted as due to their small effects. In general, work in other systems (particularly human genetics) has shown that even molecular traits are often hugely polygenic, affected by thousands of variants of tiny but non-zero magnitude. With a forward screen of the sort performed here, it's difficult to know how much of the phenotypic variance is due to unmapped small-effect variants, but two lines of evidence suggest it may be a lot: first, the absence of mappable causal mutations in 36/82 mutants, and second, the differences between EMS mutant strains and their matched single-site mutants. The authors commendably report and discuss these issues but to my mind they neglect them in drawing inferences and generalizations from their findings.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      Summary

      Most laboratory research using T. b. brucei has made use of two strains, the monomorphic Lister 427 strain and the pleomorphic EATRO1125 strain. These strains have been passaged in vitro for decades in separate laboratories, thus selecting for populations that differ from both the original isolate and between laboratories.

      In this study, Mulindwa et al. describe the isolation of two new T. b. brucei strains from cattle in Uganda, MAK65 and MAK98, which show differences in virulence and propensity for stumpy form differentiation in rodents. To assess the effect of culture adaptation on trypanosomes, the researchers compared the gene copy number of the MAK65 and MAK98 isolates before and after culture adaptation. The study found that isolates cultured for as little as a week already demonstrated a broader gene copy number distribution than those that were not culture adapted. Broader gene copy number distributions, compared to the non-culture-adapted isolates, were also observed for a number of routinely-passaged Lister 427 and EATRO1125 cultures. The researchers observed reproducible increases in copy number for certain genes, such as those encoding histones, HSP70 and PFR proteins. The study postulated that changes in gene copy number observed across the genome increased bias towards rapid proliferation and stress tolerance upon culture adaptation.

      Major comments

      I have some concerns about the method that was used for the copy-number calculations. Firstly, I can imagine that a non-uniform distribution of the reads across the genome for any of the datasets could influence the results. Was this checked? Secondly, in the methods section lines 394/395 it is said that the modal RPKM for each dataset was 'adjusted slightly' to get a symmetrical distribution. Upon checking the modal RPKMs and the adjusted values used for the calculations, the adjusted values appear to have been adjusted to different extents between the different datasets to fit the authors' assumptions. Do the authors believe that this could perhaps account for some of the subtle gene copy number changes observed (as is discussed in lines 287-290)? Next, why were the reads aligned 20 times? I think in general the method needs to be explained far more clearly so that the audience can understand what you did.
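
      To make the concern about the modal RPKM concrete, the sketch below shows one plausible reading of such a calculation, in which each gene's copy number is its RPKM divided by the modal RPKM of the dataset. This is an assumption for illustration and not necessarily the manuscript's procedure; the gene names, RPKM values and bin width are hypothetical.

```python
from collections import Counter


def modal_rpkm(rpkms, bin_width=1.0):
    """Estimate the mode as the centre of the most populated RPKM bin."""
    bins = Counter(int(value // bin_width) for value in rpkms)
    # On ties, prefer the lower bin so the result is deterministic.
    fullest = max(bins, key=lambda b: (bins[b], -b))
    return (fullest + 0.5) * bin_width


def copy_numbers(rpkm_by_gene, mode):
    """Scale every RPKM by the mode, assumed here to represent one copy."""
    return {gene: rpkm / mode for gene, rpkm in rpkm_by_gene.items()}


# Hypothetical dataset: three single-copy genes and two amplified genes.
rpkm_by_gene = {"geneA": 10.2, "geneB": 10.7, "geneC": 10.4,
                "geneD": 21.0, "geneE": 52.5}

mode = modal_rpkm(rpkm_by_gene.values())
estimated = copy_numbers(rpkm_by_gene, mode)

# Shifting the mode by hand rescales every estimate by the same factor,
# so dataset-specific adjustments can mimic subtle copy-number changes.
adjusted = copy_numbers(rpkm_by_gene, mode * 1.05)

for gene in rpkm_by_gene:
    print(f"{gene}: {estimated[gene]:.2f} (adjusted mode: {adjusted[gene]:.2f})")
```

      Under this reading, a 5% adjustment of the mode changes every estimated copy number by about 5%, which is why the size of the adjustment applied to each dataset matters.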

      Minor comments

      Additional experiments:

      Would it be possible to generate a phylogenetic tree comparing these new isolates with the Lister 427, TREU927 and EATRO1125 isolates in circulation to get an idea of how these strains may have evolved? I think this could add to the strength of the manuscript.

      Title: Since no other unicellular eukaryotes are mentioned throughout the text, I think a more appropriate title for this study might be 'Adaptation of Trypanosoma brucei brucei bloodstream forms to in vitro culture results in gene copy-number changes.'

      Line 59: There have been more recent studies than the referenced paper (Cross et al., 2014) that quote far higher numbers of alternative VSG genes or pseudogenes in the Lister 427 strain. In Müller et al (2018, doi: 10.1038/s41586-018-0619-8) over 2500 VSGs are quoted and in Cosentino et al (2021, doi: 10.1101/2021.04.13.439624) 2872 VSGs are identified.

      Line 109/110: The dates of isolation written here, Feb 1st and July 30th 2016, are different to what is written under the Date of Isolation column in Supplementary text 1, table 1- 25/5/2016 and 30/7/2017. Do these two dates represent different things?

      Line 233: Should Figure 5 A, B, C and E (rather than A, B, D and E) be referenced here to show all the initial, not cultured results?

      Line 270/271: PIP39 does not promote differentiation to stumpy forms, but instead contributes towards the efficient differentiation of stumpies to procyclic forms (Szoor et al, 2010, doi: 10.1101/gad.570310).

      Discussion:

      It should be discussed in the text that changes in gene copy number have also been observed upon Leishmania culture adaptation (see, for e.g., Gerald Späth's work). This will strengthen the authors' conclusions. Furthermore, similar observations of aneuploidy and triploidy in some T. brucei Lister 427 strains have recently been reported in Cosentino et al (2021, doi: 10.1101/2021.04.13.439624). This could also be referenced somewhere in the text.

      Line 364: I think the Nijuru et al (2005, doi: 10.1007/s00436-004-1267-5) paper would be the correct reference here since it describes the use of the ITS-1 PCR. Reference 13 actually refers to another paper, and I cannot find Nijuru et al in the references list.

      Line 379: PAD staining should be referenced to be more informative and allow reproducibility.

      Figure 1, Line 580: How many cells were counted for determining the % PAD positivity?

      Figure 1, Line 581: Scale bar should be included in D.

      Supplement Text 1:

      I found Supplement Text 1 a little confusing for two reasons. Firstly, I was never sure if the tables or references being referenced were from the main text or from the supplement text 1, perhaps this could be made a little clearer to aid the reader. Secondly, the names of the isolates switched between, for e.g., MAK 65 and Tb065. For simplicity it could help to try and stick to one naming system.

      It might be worth adding a sentence about why Tb236B was not followed up. Was this because it could not easily be distinguished from MAK65 by microsatellite analysis?

      S3 Figure: Legend describes red bars but there are none in the figure.

      Table 2: In the legend for Sheet 2, the cut-off for the increase is missing.

      Significance

      Though the T. b. brucei Lister 427 and EATRO1125 strains are used most commonly for laboratory-based research, they have been extensively passaged in vitro without characterisation of the changes that have occurred between them and the original isolate, or indeed, between laboratories.

      Mulindwa et al. demonstrated that changes to gene copy number occur rapidly upon adaptation to culture of field isolates, and that different cultures of the same isolate can furthermore have different ploidies. This is an important advance and raises awareness a) that trypanosomes undergo changes upon laboratory adaptation, b) of the nature of some of these changes and c) that the changes can occur rapidly. Changes to gene copy number have also been shown to affect Leishmania donovani upon culture adaptation (Prieto Barja et al, 2017, doi: 10.1038/s41559-017-0361-x).

      This study will be of interest to the trypanosome community in general, but particularly those who work on biology that we already know is impacted by prolonged in vitro passage: differentiation, virulence and antigenic variation.

      Reviewer expertise: BSF differentiation, antigenic variation

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Authors' rigorous experimental design (based on bacterial genetics and structural biology), solid biochemical assays (including photo-crosslinking, cysteine crosslinking, and Western blotting), and carefully drawn interpretation and conclusions are impressive. Finally, authors delineate the mechanisms of BepA activation and LptD biogenesis, which are supported by the current and previous studies by the authors and other research groups.

      Thank you for the nice summarization and the positive evaluation of our study.

      While this is overall a wonderful piece of work, this manuscript would be further improved by clarifying the following points:

      1. Authors examined how mutations (Pro and Cys scanning) on the edge-strand of BepA affected degradation and maturation of LptD.

      It was assumed that these mutations impact the structure of BepA only locally. However, a mutational effect can be propagated in an unexpected way affecting the structural integrity of other regions. Although authors tested that A106P retains proteolytic activity as shown by self-cleavage, a similar test (for example, in vitro experiments using a structureless substrate) may need to be extended to other mutations to support the conclusions.

      Thank you for the suggestion. Unfortunately, we have not succeeded in reproducibly detecting the proteolytic activity of purified BepA, even when an unstructured substrate (α-casein) was used and the assay was conducted at an elevated temperature, possibly because the protease activity of isolated BepA is tightly repressed by a mechanism that includes His-246-mediated regulation, as described in our paper (please see Introduction). Although BepA mutants with a mutation of His-246 or a deletion of the H9 loop (these mutations release the His-246-mediated repression) significantly degrade α-casein, combining these mutations with the edge-strand mutations would complicate the interpretation of the results. We thus think that the suggested experiments cannot be conducted soon.

      Instead, we described the following points in the revised manuscript. Although we mentioned the self-cleavage activity of only the A106P mutant in the original manuscript, our results showed that the other edge-strand Pro mutants (other than F107P) exhibited significant self-cleavage activities as well (Figure 1-figure supplement 2B). In addition, the Pro mutants other than the A106P mutant degraded mis- or un-folded BamA at a detectable level (Figure 1-figure supplement 2A). Furthermore, all the Pro mutants accumulated at a level comparable to that of wild-type BepA. These observations together indicate that most of the Pro mutations specifically affected the edge-strand structure, but did not drastically alter the active site or the protein's overall structure. We described the above points in the revised text (line 174 in p7 to line 181 in p8).

      2. In the Results (Line 159), the authors report chaperone-like activity of BepA. Here, the term "chaperone-like" is rather obscure regarding whether this activity facilitates LptD maturation without proteolysis (i.e., via holdase activity), or involves proteolysis as a part of quality control mechanisms. In another experiment, the authors show that the chaperone-like activity may not necessarily involve proteolysis. It would be good to describe a possible molecular principle of how the edge-strand binding to the substrate can lead to chaperone activity.

      We suppose that the interaction of BepA (via the edge-strand) with an assembly intermediate of LptD on the BAM complex stabilizes the partially unfolded assembly intermediate of LptD on the BAM complex to help the association of LptE with LptD. This was explained in Discussion (lines 388–392 in p16) and the legend to Figure 5.

      Reviewer #2 (Public Review):

      The authors found that a conserved β-strand (edge-strand β2 of BepA) directly contacts the N-terminal half of the β-barrel-forming domain of an immature LptD; the C-terminal region of the β-barrel-forming domain of the BepA-bound LptD intermediate interacts with a "seam" strand of BamA in the BAM complex. By combining crosslinking and mutational studies, they showed that the edge-strand of BepA may have both the proteolytic and the chaperone-like functions. Based on the authors' previous studies of BepA, they proposed a model that the edge-strand and His switch of BepA regulate BepA in LptD assembly and degradation.

      Thank you for the nice summarization of our study.

      Reviewer #3 (Public Review):

      [...] By performing an impressive systematic cross linking analysis, combined with previous known findings, the authors are able to dissect the general architecture of how BepA interacts with beta-barrel substrates as they are being assembled by the Bam complex. The experiments presented are nicely executed and the data are of high quality. I am convinced that the edge strand of BepA interacts with LptD, likely as it is assembling on the Bam complex. It is also clear that this interaction is functionally important because mutations in this region that disrupt the BepA-LptD interaction interfere with LptD maturation and degradation. This suggests that the substrate binding to the protease domain of BepA is important for both its chaperone and proteolytic activity. The work is well executed and will be of value to others interested in the regulation of membrane protein folding and, more broadly, in the biogenesis of the bacterial cell envelope.

      Thank you for the nice summarization and the positive evaluation of our study.

      While the authors conclusively establish a link between this region of BepA and its function, the data do not explain the underlying mechanism of how BepA discriminates between substrates targeted for integration into the membrane and those targeted for destruction. The model proposed at the end incorporates the presence of the edge strand of BepA, but its role in the process remains vague. As mentioned in the discussion, the mechanism that controls the switch from chaperone to protease function in BepA is likely governed by the loops that gate access to the catalytic residues proximal to the edge strand. It is possible that the edge strand may just be reporting on substrate binding to the protease domain active site. While this may be important for substrate recognition, it does not mean that the edge strand-substrate interaction plays a deterministic role in subsequent protein triage during LptD assembly.

      Our data demonstrated that the edge-strand of BepA directly binds a substrate. As pointed out by the reviewer, the involvement of the edge-strand in substrate binding has been known for other proteases. However, it was not known whether the substrate binding at the edge-strand contributes to the chaperone-like function; it was possible that the binding sites of a substrate on BepA during its proteolysis and its maturation are totally different, as the chaperone-like activity of BepA is independent of its protease activity (it was conceivable, for example, that substrate binding during its maturation occurs on the surface of the C-terminal TPR domain that has been shown to interact with LptD). Our results showed that the defective binding of a substrate (LptD) at the edge-strand impairs not only its proteolysis but also its normal maturation (assembly). Because the edge-strand-bound substrate would be directly presented to the proteolytic active site for its degradation, this binding step should be important for the determination of the fate of the bound substrate. Our results strongly suggest that the substrate binding by the edge-strand is a crucial common step required for the subsequent protein triage during the LptD assembly.

    1. At the same time, as a team we wanted to try and resist being reductive about the very complex social environments that we’re depicting.

      I think this is a super key consideration, and I think we all have to acknowledge the possibility that our findings for this project are only a part of the story, and may not be representative of the entire campus community. I'm really looking forward to seeing the final project and learning about your interpretation of the results.

  2. Jul 2021
    1. Author Response:

      Reviewer #1:

      Insulin-secreting beta-cells are electrically excitable, and action potential firing in these cells leads to an increase in the cytoplasmic calcium concentration that in turn stimulates insulin release. Beta-cells are electrically coupled to their neighbours and electrical activity and calcium waves are synchronised across the pancreatic islets. How these oscillations are initiated is not known. In this study, the authors identify a subset of 'first responders' beta-cells that are the first to respond to glucose and that initiate a propagating Ca2+ wave across the islet. These cells may be particularly responsive because of their intrinsic electrophysiological properties. Somewhat unexpectedly, the electrical coupling of first responder cells appears weaker than that in the other islet cells but this paradox is well explained by the authors. Finally, the authors provide evidence of a hierarchy of beta-cells within the islets and that if the first responder cells are destroyed, other islet cells are ready to take over.

      The strengths of the paper are the advanced calcium imaging, the photoablation experiments and the longitudinal measurements (up to 48h).

      Whilst I find the evidence for the existence of first responders and hierarchy convincing, the link between the first responders in isolated individual islets and first phase insulin secretion seen in vivo (which becomes impaired in type-2 diabetes) seems somewhat overstated. It is difficult to see how first responders in an islet can synchronise secretion from 1000s (rodents) to millions of islets (man) and it might be wise to down-tone this particular aspect.

      We thank the reviewer for highlighting this point. We acknowledge that we did not measure insulin from individual islets post first responder cell ablation, where we observed diminished first phase Ca2+. We do note that studies have linked the first phase Ca2+ response to first phase insulin release [Henquin et al, Diabetes (2006) and Head et al, Diabetes (2012)], albeit with additional amplification signals for higher glucose elevations. Thus a diminished first phase Ca2+ would imply a diminished first phase insulin (although given the amplifying signals the converse would not necessarily be the case).

      Nevertheless there are also important caveats to our experiment. Within islets we ablated a single first responder cell. In small islets this ablation diminished Ca2+ in the plane that we imaged. In larger islets this ablation did not, pointing to the presence of multiple first responder cells. Furthermore we only observed the plane of the islet containing the ablated first responder. It is possible that elsewhere in the islet [Ca2+] was not significantly disrupted. Thus even within a small islet redundancy is possible, where multiple first responder cells are present and together drive first phase [Ca2+] across the islet. Loss of a single first responder cell only disrupts Ca2+ locally. That we see a relationship between the timing of the [Ca2+] response and distance from the first responder would support this notion. Results from the islet model also support this notion, where >10% of cells had to be ablated to significantly disrupt first-phase Ca2+.

      While we already discuss the issue of redundancy in large islets and in 3D, we now briefly mention the importance of measuring insulin release.

      Reviewer #2:

      Kravets et al. further explored the functional heterogeneity in insulin-secreting beta cells in isolated mouse islets. They used slow cytosolic calcium [Ca2+] oscillations with a cycle period of 2 to several minutes in both phases of glucose-dependent beta cell activity that got triggered by a switch from unphysiologically low (2 mM) to unphysiologically high (11 mM) glucose concentration. Based on the presented evidence, they described a distinct population of beta cells responsible for driving the first phase [Ca2+] elevation and characterised it to be different from some other previously described functional subpopulations.

      Strengths:

      The study uses advanced experimental approaches to address a specific role a subpopulation of beta cells plays during the first phase of an islet response to 11 mM glucose or strong secretagogues like glibenclamide. It finds elements of a broadscale complex network on the events of the slow time scale [Ca2+] oscillations. For this, they appropriately discuss the presence of most connected cells (network hubs) also in slower [Ca2+] oscillations.

      Weakness:

      The critical weakness of the paper is the evaluation of linear regressions that should support the impact of relative proximity (Fig. 1E), of the response consistency (Fig. 2C), and of increased excitability of the first responder cells (Fig. 3B). None of the datasets provided in the submission satisfies the criterion of normality of the distribution of regression residuals. In addition, the interpretation that the majority of first responder cells retain their early response time could as well be interpreted that the majority does not.

      We thank the reviewers for their input, as it really opened multiple opportunities for us to improve our analysis and strengthen our arguments of the existence and consistency of the first responder cells. We present more detailed analysis for these respective figures below and describe how these are included in the manuscript.

      As it is described below, we performed additional in-depth analysis and statistical evaluation of the data presented in figures 1E, 2C, and 3B. We now report that two of the datasets (Fig.1 E, Fig.2 C) satisfy the criterion of normality of the distribution of regression residuals. The third dataset (Fig.3 B) does not satisfy this criterion, and we update our interpretation of this data in the text.

      Figure 1E Statistics, Scatter: We now show the slope and p-value indicating deviation of the slope from 0, and r^2 values in Fig.1 E. While the scatter is large (r^2=0.1549 in Fig.1E) for cells located at all distances from the first responder cell, we found that scatter substantially diminishes when we consider cells located closer to the first responder (r^2=0.3219 in Fig.S1 F): the response time for cells at distances up to 60 μm from the first responder cells now is shown in Fig.S1 F. The choice of 60 μm comes from it being the maximum first-to-last responder distance in our data set (see red box in Fig.1D).

      Additionally, we noticed that within larger islets there may be multiple domains with their own first responder in the center (now in Fig.S1 E) and below. Linear distance/time dependence is preserved within each domain.

      Figure 1E Normality of residuals: We appreciate the reviewer’s suggestion and now see that the original “distance vs time” dependence in Fig.1 E did not pass the test for normality of residuals. When plotted as distance (μm)/response time (percentile), the cumulative distribution still did not pass the Shapiro-Wilk test for normality of residuals (see QQ plot “All distances” below). However, for cells located within 60 μm of the first responder, the residuals pass the Shapiro-Wilk normality test. The QQ-plots for “up to 60 μm distances” are included in Fig.S1 G.
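
      For readers less familiar with residual diagnostics, the following is a minimal sketch of how a least-squares fit, its r^2 and the coordinates of a QQ plot of the residuals can be computed. It only illustrates the visual QQ check; the Shapiro-Wilk test itself is not implemented here, and the distance and response-time values are hypothetical rather than taken from the study.

```python
from statistics import NormalDist, mean, pstdev


def fit_line(x, y):
    """Ordinary least-squares fit; returns slope, intercept, r^2, residuals."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot, residuals


def qq_points(residuals):
    """Pair standard-normal quantiles with sorted, standardized residuals."""
    n = len(residuals)
    sd = pstdev(residuals)
    ordered = sorted(r / sd for r in residuals)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical cells: distance from the first responder (um) and
# response time (percentile).
distance_um = [5, 12, 20, 28, 35, 44, 52, 60]
response_pct = [4, 15, 22, 30, 41, 47, 66, 70]

slope, intercept, r_squared, residuals = fit_line(distance_um, response_pct)
points = qq_points(residuals)

print(f"slope = {slope:.3f}, r^2 = {r_squared:.3f}")
for theoretical, observed in points:
    print(f"{theoretical:+.2f}  {observed:+.2f}")
```

      If the residuals are approximately normal, the printed pairs fall close to the line where the two columns are equal; systematic curvature suggests a departure from normality.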

      Figure 2C Statistic and Scatter: After consulting a biostatistician (Dr. Laura Pyle), we realized that since the response time during initial vs repeated glucose elevation was measured in the same islet, these were repeated measurements on the same statistical units (i.e. a longitudinal study). Therefore, the data required a mixed model analysis, as opposed to the simple linear regression which we used initially. We have now applied a linear mixed effects model (LMEM) to the LN-transformed data (original data + 0.0001). The 0.0001 value was added to avoid the problem of LN(0).
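
      The offset step can be sketched as follows. This shows only the transformation applied before model fitting, not the LMEM itself; the response-time values are hypothetical, and 0.0001 is the constant quoted above.

```python
import math

OFFSET = 0.0001  # small constant added before taking the natural log


def ln_transform(values, offset=OFFSET):
    """Natural-log transform with an additive offset so zeros stay finite."""
    return [math.log(value + offset) for value in values]


# Hypothetical response times; the first responder has a value of exactly 0.
response_times = [0.0, 0.25, 1.0]

try:
    math.log(response_times[0])
    zero_is_defined = True
except ValueError:
    # The natural log of 0 is undefined, which is why the offset is needed.
    zero_is_defined = False

transformed = ln_transform(response_times)
print(zero_is_defined, [round(value, 3) for value in transformed])
```

      Note that the transformed value for an original 0 depends entirely on the chosen offset, so the offset is itself a modelling choice that is worth reporting.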

      We now show the LMEM-derived slope and p-value indicating deviation of the slope from 0 in Fig.2 C. Further, we sorted the data presented in Fig.2 C by distance to each of the first responders (now added to Fig.2D). An example of the sorted vs non-sorted time of response in the large islet with multiple first responders is added to the Source Data – Figure 1. We found a substantial reduction of the scatter in the distance-sorted data, compared to the non-sorted, which indicates that consistency of the glucose response of a cell correlates with its proximity to the first responder. We also discuss this in the first sub-section of the Discussion.

      Figure 2C Normality of residuals: The residuals pass the Shapiro-Wilk normality test for the LMEM of the LN-transformed data. We added a very small number (0.0001) to all 0 values in our data set, presented in Fig.2C, D, and Fig.S4 A, to perform the natural-log transformation. Details on the LMEM and its output are added to the Source data – Statistical analysis file.

      Figure 3B Statistic and Scatter: We now show the LMEM-derived slope and the p-value indicating deviation of the slope from 0 in Fig.3 B (below). The LMEM-derived slope has a p-value of 0.1925, indicating that the slope is not significantly different from 0. This result changes our original interpretation, and we have now edited the associated results and discussion.

      Figure 3B Normality of residuals: This data set does not pass the Shapiro-Wilk test.

      A major issue of the work is also that it is unnecessarily complicated. In the Results section, the authors introduce a number of beta cell subpopulations: first responder cell, last responder cell, wave origin cell, wave end cell, hub-like phase 1, hub-like phase 2, and random cells, which are all defined in exclusively relative terms, regarding the time within which the cells responded, phase lags of their oscillations, or mutual distances within the islet. These cell types also partially overlap.

      To address this comment, we added Table 1 to describe the properties of these different populations.

      Their choice to use the diameter percentile as a metrics for distances between the cells is not well substantiated since they do not demonstrate in what way would the islet size variability influence the conclusion. All presented islets are of rather a comparable size within the diffusion limits.

      We replaced normalized distances in Fig.1 D with absolute distance from first responder in μm.

      The functional hierarchy of cells defining the first response should be reflected in the consistency of their relative response time. The authors claim that the spatial organisation is consistent over a time of up to 24 hours. In the first place, it is not clear why this prolonged consistency would be an advantage in comparison to the absence of such consistency. The linear regression analysis between the initial and repeated relative activation times does suggest a significant correlation, but the distribution of regression residuals of the provided data is again not normal and non-conclusive, despite the low p-value. 50% of the cells defined as first responders in the initial stimulation were part of that subpopulation also during the second stimulation, which is rather random.

      We began to describe our analysis of the response time to initial and repeated glucose stimulation earlier in this reply. Further evidence of the distance-dependence of the consistency of the response time is now presented in Fig.S4 A: the response time consistency for cells within 60 μm, 50 μm, and 40 μm of the first responder. The closer a cell is located to the first responder, the higher the consistency of its response time (the lower the scatter), below.

      If we analyze this data with a linear regression model, where the r^2 allows us to quantitatively demonstrate the decrease of the scatter, we observe r^2 of 0.3013, 0.3228, and 0.3674 for cells within 60 μm, 50 μm, and 40 μm of the first responder, respectively (below). This data is not included in the manuscript because the residuals do not pass the Shapiro-Wilk normality test for this model (while they do for the LMEM).
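
      As a rough illustration of how r^2 tracks scatter in this kind of comparison, the sketch below fits an ordinary least-squares line and reports the coefficient of determination. The function name `r_squared` and all numbers are made up for the example; they are not the authors' data and do not reproduce the values quoted above.

```python
def r_squared(x, y):
    """Coefficient of determination for an ordinary least-squares line y ~ a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1.0 - ss_res / syy


# Hypothetical response times: initial stimulation vs two repeated stimulations.
initial = [0.1, 0.2, 0.3, 0.4, 0.5]
repeat_consistent = [0.12, 0.19, 0.33, 0.38, 0.52]  # low scatter
repeat_scattered = [0.30, 0.10, 0.50, 0.20, 0.45]   # high scatter

r2_consistent = r_squared(initial, repeat_consistent)
r2_scattered = r_squared(initial, repeat_scattered)
```

      A higher r^2 for the first pair than for the second reflects the tighter clustering around the regression line, which is the sense in which r^2 is used above as a measure of scatter.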

      One of the most surprising features of this study is the total lack of fast [Ca2+] oscillations, which are in mouse islets, stimulated with 11 mM glucose typically several seconds long and should be easily detected with the measurement speed used.

      Our data used in this manuscript contains Ca2+ dynamics from islets with a) slow oscillations only, b) fast oscillations superimposed on the slow oscillations, c) no obvious oscillations (likely continual spiking). Representative curves are below. Because we focused our study on the slow oscillations, we used dynamics of type (a) in our figures, which formed an impression that no fast oscillations were present. In our analysis of dynamics of type (b) we used Fourier transformation to separate slow oscillations from the fast (described in Methods). Dynamics of type (c) were excluded from the analysis of the oscillatory phase, and instead only used for the first-phase analysis. We indicate this exclusion in the methods.
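
      One simple way to separate slow from fast oscillations with a Fourier transform is a frequency cutoff, sketched below. This is an assumption-laden illustration rather than the procedure from the Methods: the cutoff, the sampling interval, the synthetic trace, and the names `split_slow_fast`, `dft`, and `idft` are all hypothetical, and a naive O(n^2) transform is used only to keep the example free of dependencies.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (fine for short traces)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def split_slow_fast(signal, dt, cutoff_hz):
    """Split a trace into components below and above cutoff_hz."""
    n = len(signal)
    slow_spectrum = []
    for k, coeff in enumerate(dft(signal)):
        freq = min(k, n - k) / (n * dt)  # absolute frequency of bin k
        slow_spectrum.append(coeff if freq <= cutoff_hz else 0j)
    slow = idft(slow_spectrum)
    fast = [s - l for s, l in zip(signal, slow)]
    return slow, fast


# Synthetic trace sampled once per second: a 100 s oscillation plus a 10 s one.
n_samples = 200
slow_true = [math.sin(2 * math.pi * t / 100) for t in range(n_samples)]
fast_true = [0.3 * math.sin(2 * math.pi * t / 10) for t in range(n_samples)]
trace = [s + f for s, f in zip(slow_true, fast_true)]

slow, fast = split_slow_fast(trace, dt=1.0, cutoff_hz=0.05)
```

      Zeroing positive and negative frequency bins symmetrically keeps the reconstructed slow component real-valued; the fast component is simply the remainder.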

      And lastly, we should also not perpetuate imprecise information about the disease if we know better. The first sentence of the Introduction section, stating that "Diabetes is a disease characterised by high blood glucose, …" is not precise. Diabetes only describes polyuria. Regarding the role of high glucose, a quote from a textbook by K. Frayn, R. Evans: Human metabolism - a regulatory perspective, 4th ed., 2019: „The changes in glucose metabolism are usually regarded as the "hallmark" of diabetes mellitus, and treatment is always monitored by the level of glucose in the blood. However, it has been said that if it were as easy to measure fatty acids in the blood as it is to measure glucose, we would think of diabetes mellitus mainly as a disorder of fat metabolism."

      We acknowledge that Diabetes alone refers to polyuria, and instead state Diabetes Mellitus to be more precise about the disease we refer to. We stated “Diabetes is a disease characterized by high blood glucose, ... “ as this is in line with internationally accepted diagnoses and classification criteria, such as position statements from the American Diabetes Association [“Diagnosis and Classification of Diabetes Mellitus” AMERICAN DIABETES ASSOCIATION, DIABETES CARE, 36, (2013)]. We certainly acknowledge that the glucose-centric approach to characterizing and diagnosing Diabetes Mellitus is largely born of the ease with which glucose can be measured. Thus if blood lipids could be easily measured we may be characterizing diabetes as a disease of hyperlipidemia (depending on how lipidemia links with complications of diabetes).

    2. Reviewer #3 (Public Review): 

      Pancreatic beta cells in each islet of Langerhans act cooperatively to produce a coordinated response to blood glucose, which takes the form of roughly synchronous electrical oscillations that result in coherent pulses of insulin secretion. However, it has become more appreciated recently that the cells are heterogeneous and that there may be subgroups of cells within the islet that contribute in different ways or control different aspects of the collective behavior.

      This paper addresses a subset of cells, termed first responders, that are the earliest to transition into activity when glucose is stepped up from a sub-stimulatory level. It is shown that the first responders determine the first transient phase of electrical activity, and implicitly secretion, that precedes the start of steady-state oscillations of the second phase. The first phase is of interest for the pathogenesis of diabetes because it is (or claimed to be) one of the first indications of the disease. The clinical data on this are actually ambiguous, but nonetheless the first phase is an important aspect of how secretion from islets is organized. This is clear from the existence of a subset of readily releasable insulin vesicles, but the electrical activity correlates that synergize with vesicle availability are less well understood. As such, the paper is an important contribution to both islet biology and potentially diabetology. 

      The approach is to use a genetically encoded calcium indicator, GCaMP, and confocal imaging to identify cells that show the earliest rise in calcium and then to verify that this property remains consistent when the glucose stimulus is repeated about an hour later. Further testing over 48 h shows the first responder slowly fading. This thus appears to be a matter of continuous variation of cell properties within the islet and moreover one that is persistent but decaying over long-enough periods of time, rather than a discrete sub-type of beta cell. This is confirmed by simulations using an islet model in which first responders emerge from random variation of properties. The text is a bit ambiguous about whether we should think of the glass as half full or half empty and could be clarified in this regard.

      The properties that matter are the density of KATP channels and gap junctional coupling which are both lower. This is also confirmed by simulations. The intriguing suggestion is made that there may be a reciprocal negative feedback relationship between these quantities that regulates their variation over time. This would be rather different from a persistent genetic difference and would be a good subject for future investigation. 

      Interestingly and perhaps surprisingly, increased sensitivity to glucose is not a feature of the first responders. In contrast, other putative "leader" or "hub" cells that have been identified for the second phase of electrical activity are proposed to have increased glucose sensitivity. This and other features lead to the conclusion that the two types of leader cells are probably distinct. The much-discussed topic of hub cells provides interesting context and relevance to the paper, but its results stand by themselves and can be judged independent of hubs.

    1. Author Response:

      Reviewer #1 (Public Review):

      Sokoloski et al. propose a new statistical model class for descriptive modeling of stimulus encoding in the spiking activity of neural populations. The main goals are to provide a model family that (G1) captures key activity statistics, such as spike count (noise) correlations, and their stimulus dependence, in potentially large neural populations, (G2) is relatively easy to fit, and (G3) when used as a forward encoder model for Bayesian decoders leads to efficient and accurate decoding. There are also three additional goals or claims: (C1) that this descriptive model family can serve to quantitatively test computational theories of probabilistic population coding against data, (C2) that the model can offer interpretable representations of information-limiting noise correlations, (C3) that the model can be extended to the case of temporal coding with dynamic stimuli and history dependence.

      The starting point of their model is a finite mixture of independent Poisson distributions, which is then generalized and extended in two ways. Due to the "mixture", the model can account for correlations between neurons (see G1). As with any mixture model, the model can be viewed in the language of latent variables, which (in this case) are discrete categorical variables corresponding to different mixture components. The two extensions of the model are based on realizing that the joint distribution (of the observed spike counts and the latent variables) is in the exponential family (EF), which opens the door to powerful classical results to be applied (e.g. towards G2-G3), and allows for the two extensions by: (E1) generalizing Poisson distributions in mixture components to Conway-Maxwell-Poisson distributions, and (E2) introducing stimulus dependence by allowing the natural parameters of the EF to depend on stimulus conditions. They call the resulting model a Conditional Poisson Mixture or CPM (although the "Poisson" in CPM really means Conway-Maxwell-Poisson). E1 is key for capturing under-dispersion, i.e. Fano Factors below 1. For the case of a discrete set of stimulus conditions, they propose minimal and maximal versions of E2, depending on which natural parameters are stimulus dependent. In the case of a continuum of stimuli (they only consider a 1D continuum of stimulus orientations, e.g. in V1 encoding) they also consider a model-based parametric version of the minimal E2 which gives rise to Von Mises orientation tuning curves.
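
      The two statistical points in this summary (a mixture of independent Poissons produces correlations but only over-dispersion, while a Conway-Maxwell-Poisson count can be under-dispersed) can be checked numerically. The sketch below is illustrative only: the function names and parameter values are hypothetical, the CoM-Poisson is written in its standard form P(n) ∝ λ^n / (n!)^ν, which may differ from the parameterisation used in the manuscript, and the infinite sum is truncated at an assumed `n_max`.

```python
import math


def mixture_moments(weights, rates):
    """Mean vector and covariance matrix of a finite mixture of independent Poissons.

    rates[k][i] is the Poisson rate of neuron i in mixture component k.
    """
    n = len(rates[0])
    mean = [sum(w * r[i] for w, r in zip(weights, rates)) for i in range(n)]
    cov = [[sum(w * r[i] * r[j] for w, r in zip(weights, rates)) - mean[i] * mean[j]
            + (mean[i] if i == j else 0.0)  # Poisson variance within each component
            for j in range(n)] for i in range(n)]
    return mean, cov


def com_poisson_fano(lam, nu, n_max=200):
    """Fano factor of a Conway-Maxwell-Poisson count, via a truncated sum."""
    log_w = [n * math.log(lam) - nu * math.lgamma(n + 1) for n in range(n_max + 1)]
    top = max(log_w)
    w = [math.exp(l - top) for l in log_w]  # rescaled to avoid overflow
    z = sum(w)
    mean = sum(n * wi for n, wi in enumerate(w)) / z
    second = sum(n * n * wi for n, wi in enumerate(w)) / z
    return (second - mean ** 2) / mean


# Two neurons sharing a low-rate and a high-rate state (made-up numbers).
mean, cov = mixture_moments([0.5, 0.5], [[2.0, 2.0], [8.0, 8.0]])
mixture_fano = cov[0][0] / mean[0]      # above 1: over-dispersed
noise_covariance = cov[0][1]            # positive: shared state correlates the neurons

poisson_fano = com_poisson_fano(5.0, 1.0)   # nu = 1 recovers the Poisson case
under_fano = com_poisson_fano(5.0, 2.0)     # nu > 1 gives under-dispersion
```

      Because the variance of each neuron in a Poisson mixture equals its mean plus the (non-negative) variance of the rate across components, the Fano factor of such a mixture cannot fall below 1, which is why an extension like E1 is needed to reach the sub-Poissonian regime.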

      Strengths:

      -Proposing a new descriptive encoding model of spike responses that can account for sub-poissonian and correlated noise structure, and yet can be tractably fit and accurately decoded.

      -Their experiments with simulated and real (macaque V1) data presented in Figs. 2-5 and Tables 1-2 provide good evidence that the model supports G1-3.

      -Working out a concrete Expectation Maximization algorithm that allows efficient fits of the model to data.

      -Exploiting the EF framework to provide a closed form expression for the model's Fisher Information for the minimal model class, a measure that plays a key role in theoretical studies of probabilistic population coding.

      As such, the paper makes a valuable contribution to the arsenal of descriptive models used to describe stimulus encoding in neural populations, including the structure and stimulus dependence of their higher-order statistics.

      Thank you very much for your thorough, exact, and positive evaluation of our manuscript!

      Weaknesses:

      1) I found the title and abstract too vague, and not informative enough as to the concrete contributions of this paper. These parts should more concretely and clearly describe the proposed/developed model family and the particular contributions listed above.

      We found your summary of the paper and model to be highly accurate, and we rewrote the abstract to summarize the key strengths as you’ve listed them. We found it difficult to develop a more exact title which wasn’t overlong, so we left it as is.

      2) I was not convinced about claims C1 and C2 (which also contribute to the vagueness of abstract), but I think even without establishing these claims the more solid contributions of the paper are valuable. And while I can see how the model can be extended towards C3, there are no results pertaining to this in the current paper, nor even a concrete discussion of how the model may be extended in this direction.

      2.1) Regarding C1, the claim is supposed to follow from the fact that the model's joint distribution is in the exponential family (EF), and that they have reasonably shown G1-G3 (in particular, that it captures noise correlations and its Bayesian inversion provides an accurate decoder). While I agree with the latter part, what puzzles me is that in the probabilistic population coding (PPC) theoretical models that they claim can be quantitatively tested using their descriptive model, the encoder itself is, as far as I remember/understand, in EF. By contrast, here the encoder is a mixture of EF's and as such is not itself in EF. Perhaps this distinction is not key to the claim - but if so, this has to be clearly explained, and more generally the exact connection between the descriptive encoder model here and the models used in the PPC literature should be better elaborated.

      This claim was indeed poorly explained in our manuscript, and not self-evident. There is a deeper connection between our conditional models and PPCs, which we now make explicit in a new section of the manuscript (Constrained conditional mixtures support linear probabilistic population coding, line 364), which includes an equation (Equation 4) that shows their exact relationship.

      2.2) Regarding C2, I do not see how their results in Fig 5 (and corresponding section) provide any evidence for this claim. As a theoretical neuroscientist, I take "interpretable" to mean with a mechanistic or computational (theoretical) interpretation. But, if anything, I think the example studied in Fig 5 provides a great example of the general point: that even when successful descriptive models accurately capture the statistics of data, they may nevertheless not reveal (or even hide or mis-identify) the mechanisms underlying the data. In this example's ground-truth model, the stimulus (orientation) is first corrupted by input noise and then an independent population of neurons with homogeneous tuning curves (and orientation-independent average population rate) responds to this corrupted version of the stimulus. That is a very simple AND mechanistic interpretation (which of course is not manifest to someone only observing the raw stimulus and spiking data). The fit CPM, on the other hand, does not reveal the continuous input noise mechanism (and homogeneous population response) directly, but instead captures the resulting noise correlation structure by inferring a large (~20) number of mixture components, in each of which the population response prefers a certain orientation. For a given stimulus orientation, the fluctuations between (3-4 relevant) mixture components then approximate the effect of input noise. This captures the generated data well, but misses the true mechanism and its simpler interpretation. Let me be clear that I don't take this as a fault of their descriptive model. This is a general phenomenon, despite which their descriptive model, like any expressive and tractable descriptive model, still can be a powerful tool for neural data analysis. I'm just not convinced about the claim.

      This is a very fair point, and we’ve reformulated a few passages to emphasize that the model is primarily descriptive, at least in our applications in the paper (see the new section title at line 393 and the first corresponding paragraph).

      2.3) Regarding C3, I think the authors can at least add a discussion of how the model can be extended in this direction (and as I'm sure they are aware, this can be done by generalizing the Von Mises version of the model, whereby the model I believe can be more generally thought of as a finite mixture of GLMs).

      In Appendix 4 we detail the relationship between CPMs and GLMs. We also note here that, at least as far as we understand, CPMs are formally distinct from finite mixtures of GLMs — the easiest way to see this distinction is to note that the index probabilities of a CPM depend on the stimulus, whereas the equivalent index probabilities in a finite mixture of GLMs would not. We have also explained this in Appendix 4.

      Reviewer #2 (Public Review):

      Sokoloski, Aschner, and Coen-Cagli present a modeling approach for the joint activity of groups of neurons using a family of exponential models. The Conway-Maxwell (CoM) Poisson models extend the "standard" Poisson models, by incorporating dependencies between neurons.

      They show the CoM models and their ability to capture mixtures of Poisson distributions. Applied to V1 data from awake and anesthetized monkeys, they show it captures the Fano Factor values better than simple Poisson models, and compare spike count variability and co-variability. Log-likelihood ratios in Table 1 show on-par or better performance of different variants of the CoM models, and the optimal number of parameters to use for maximizing the likelihood [balancing accuracy and overfitting], and show that the models are useful for decoding. Finally, they show how the latent variables of the model can help interpret the structure of population codes using simple simulated Poisson models over 200 neurons.

      In summary, this new family of models offers a more accurate approach to the modeling and study of large populations, and so reflects the limited value of simple Poisson based models. Under some conditions it gives higher likelihood than Poisson models and uses fewer parameters than an ANN model.

      However, the approach, presentation, and conclusions fall short on several issues that prevent a clear evaluation of the accuracy or benefits of this family of models. Key among them is the missing comparison to other statistical models.

      1) Critically, the model is not evaluated against other commonly used models of the joint spiking patterns of large populations of neurons. For example: GLMs (e.g. Pillow et al Nature 2008), latent Gaussian models (e.g. Macke et al Neural Comp 2009), Restricted Boltzmann Machines (e.g. Gardella et al PNAS 2018), Ising models for large groups of neurons (e.g. Tkacik et al PNAS 2015, Meshulam et al Neuron 2017), and extensions to higher order terms (Tkacik et al J Stat Mech 2013), coarse grained versions (Meshulam et al Phys Rev Lett 2019), or Random Projections models (Maoz et al biorxiv 2018).

      Most of these models have been used to model comparable or even larger populations than the ones studied here, often with very high accuracy, measured by different statistics of the populations and detailed spiking patterns (see more below). Much of the benefit or usefulness of the new family of models hinges on its performance compared to these other models.

      We agree very much with this point, and have done our best to address it by thoroughly comparing our model with a factor analysis encoding model in Appendices 1 and 2, and summarizing these results at appropriate points in the manuscript (lines 196–199 and 325–328). In particular, we visualized and compared the performance of factor analysis with our mixture models, and found that (i) factor analysis is better at capturing the first and second order statistics of the data, but (ii) when evaluated on held-out data, the performance gap more-or-less vanishes. Moreover, we found that an encoding model based on FA performs poorly as a Bayesian decoder, and we provided preliminary evidence that this is because our mixture models can capture higher-order statistics that FA cannot. We believe that these results have been very valuable to conveying the strengths and weaknesses of the mixture model approach.

      We have also extended the introduction to explain the differences between other model families suggested by the reviewer and our approach, to explain how the different assumptions about the form of data make it difficult to compare them quantitatively (see lines 42–63). To wit, GLMs and latent Gaussian models are both models that critically depend on modelling spike trains, and not spike counts. On the other hand, Restricted Boltzmann machines, Ising models, and random projection models all assume binary, rather than counting, spiking data. As such, any comparison would depend on coming up with methods for either (i) reshaping our datasets and comparing spike-train/binary spike-count likelihoods to trial-to-trial likelihoods, or (ii) extending our conditional mixture approach to temporal/binary data, both of which are beyond the scope of our paper. We instead used factor analysis because it has been applied widely to modelling trial-to-trial spike counts, and thus avoids further transformations that might reduce the validity of our comparisons.

      2) As some of these models are exponential models, their relations to the family of the models suggested by the authors is relevant also in terms of the learned latent variables. Moreover, the number of parameters that are needed for these different models should be compared to the CoM and its variants.

      In our comparisons with factor analysis we also compared number of latent states/dimensions required to achieve maximum performance. Overall FA was consistently the most efficient, at least when evaluated on the ability to capture second-order statistics, although our mixture models also performed quite well with modest numbers of parameters.

      3) The analysis focuses on simple statistics of neural activity, like Fano Factors (Fig. 2) and visual comparisons rather than clear quantitative ones. More direct assessments of performance in terms of other spiking statistics for single neurons and small groups (e.g., correlations of different orders) and direct comparison to individual spiking patterns (which would be practical for groups of up to 20 neurons) would be valuable.

      In Appendix 2 we evaluated the ability of our mixtures to capture the empirical skewness and kurtosis of recorded neurons, and found that the CoM-based mixture performs quite well (r^2 for the CoM-based mixture was between 0.6 and 0.9). Because FA cannot capture these higher-order moments, we speculate that modelling these higher-order moments is critical for maximizing decoding performance. This adds another perspective on the strengths of our approach, and we appreciate the suggestion.
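
      For reference, the empirical higher-order moments mentioned here can be computed from a neuron's spike counts as sketched below. The function name and the example counts are hypothetical, and the sketch assumes the moment-based (population) definitions with excess kurtosis; the appendix may use a different estimator.

```python
def skewness_and_excess_kurtosis(counts):
    """Moment-based skewness and excess kurtosis of a list of spike counts."""
    n = len(counts)
    mean = sum(counts) / n
    m2 = sum((c - mean) ** 2 for c in counts) / n  # second central moment
    m3 = sum((c - mean) ** 3 for c in counts) / n  # third central moment
    m4 = sum((c - mean) ** 4 for c in counts) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Made-up spike counts: one symmetric set and one with a long right tail.
symmetric_skew, symmetric_kurt = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
tailed_skew, tailed_kurt = skewness_and_excess_kurtosis([0, 0, 0, 1, 5])
```

      A Gaussian has zero skewness and zero excess kurtosis, so a Gaussian-based model such as factor analysis has no parameters with which to match non-zero values of these statistics.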

      Reviewer #3 (Public Review):

      The authors use multivariate mixtures of Poisson or Conway-Maxwell-Poisson distributions to model neural population activity. They derive an EM algorithm, a formula for Fisher information, and a Bayesian decoder for such models, and show it is competitive with other methods such as ANNs. The paper is clear and didactically written, and I learned a lot from reading it. Other than a few typos the math and analyses appear to be correct.

      Thank you for the positive evaluation!

      Nevertheless there are some ways the study could be further improved.

      Most important, code for performing these analyses needs to be publicly released. The EM algorithm is complicated, involving a gradient optimization on each iteration - it is very unlikely people will rewrite this themselves, so unless the authors release well-packaged and well-documented code, their impact will be limited.

      We very much agree, and we have done this. We provide a link to our gitlab page, where all relevant code can be downloaded, and installation instructions are provided (we indicate this in the manuscript at lines 799–803).

      Second, it would be nice to extend the model to continuous latent factors. It seems likely that one or two latent factors could do the work of many mixture components, as well as increasing the interpretability of the models.

      We certainly agree that in some cases continuous latent variables could be much more parsimonious. However, to the best of our knowledge most of the expressions that we rely on would no longer be closed-form, and so the machinery of the model would require suitable approximations. Nevertheless, it’s an interesting possibility that we now address in the Discussion (lines 482–491).

      Third, it would be interesting to see the models applied to more diverse types of population data (for example hippocampal place field recordings).

      We certainly agree with the importance of applying our model to other datasets, and indeed the purpose of our manuscript is to offer a method that can be applied broadly, and our goal in making the code available publicly is to facilitate that. However, we have decided to maintain the focus of this manuscript on the method itself, and limit the application to one kind of data (V1), for which we also now provide more extensive analysis and quantification of the response statistics (Figure 2 C-D, Figure 3 G-H, Appendix 2), a study of the sample sizes required to fit the model (Appendix 3), and model-comparison (Appendix 1–2). Overall we feel that the paper is already quite long and dense even when limited to a single kind of data. We believe applications to multiple kinds of data would perhaps be better suited for a different study, focusing on the comparisons between them. In that regard, we are certainly open to future collaborations on large-scale recordings from various stimulus-driven brain areas.

      Fourth, how does a user choose how many mixture components to add?

      To clarify this, we’ve added a section in the methods (Strategies for choosing the CM form and latent structure), and in particular the number of mixture components.

    1. Author Response:

      Reviewer #2 (Public Review):

      [...] 1) A weakness of the paper is the disruption of the complex during cryoEM grid preparation resulting in about half of the observed particles missing the membrane arm and likely also contributing to the disorder and biased orientation seen in the intact complexes. This leads to poor density in the membrane arm for all of the intact complex I structures presented and large variations in the local resolution of the membrane arm focused refinement.

      Purified E. coli complex I has always been known to be labile, particularly at the junction of the peripheral and membrane arms (https://pubmed.ncbi.nlm.nih.gov/12637579/).

      The air-water interface likely plays a role in disrupting the complex in addition to other possible causes. Indeed, the dissociated arms, preferred particle orientation, and low protein concentration (~0.1 mg/ml) used to produce grids with high particle density all indicate that reconstituted complex I does interact with the air-water interface. While disruption and denaturation of protein complexes at the air-water interface has been well documented (https://pubmed.ncbi.nlm.nih.gov/3043536/, https://pubmed.ncbi.nlm.nih.gov/30932812/), we are not aware of examples where air-water interfaces caused higher mobility of a complex or induced a stable conformation different from the one in bulk solution. Therefore, we think that the air-water interface is neither the cause of the observed high mobility of the arms nor of their relative rotation.

      Preferential orientation was observed in the cryo-EM studies of most complex I homologs (Gutiérrez-Fernández et al., 2020; Parey et al., 2019; Zhu et al., 2016) as well as of other proteins, suggesting that adsorption of complex I on the air-water interface is a common phenomenon. In this case it is not clear why the relative movement of the arms observed in all the structurally characterized complex I homologs would not be due to the air-water interface, while in the case of the E. coli complex it would be.

      To provide additional support for our interpretation of the structural data we purified complex I in the detergent LMNG, showed that it catalyzes redox reactions, and solved its structure to a resolution of 6.7 Å (Figure 6 and corresponding figure supplements). Because cryo-EM grids had to be prepared at a protein concentration of 2-3 mg/ml and the particles displayed a nearly homogeneous distribution of orientations, we conclude that the interaction with the air-water interface was reduced. Still, the complex assumes a very similar, or even somewhat more uncoupled, conformation and the relative mobility of the arms remained comparable to that in the nanodisc-reconstituted complex reconstructions. These data allow us to rule out the air-water interface and reconstitution of the protein into lipid nanodiscs as the possible causes of the high mobility and the unusual relative position of the arms.

      The corresponding modifications were added to the manuscript on lines 372-382:

      “To better understand the reasons for the observed uncoupled conformation and the missing density for HTMH1, we purified E. coli complex I in detergent LMNG, showed that it can catalyze redox reactions (Figure 6 - figure supplement 1) and solved its structure to resolution of 6.7 Å (Figure 6 - figure supplement 2). The detergent-solubilized complex also displays high relative mobility of the arms (Figure 6 - figure supplement 3) and has uncoupled conformation (Figure 6). Its peripheral arm is rotated even further away from the expected coupled state position than in the nanodisc-reconstituted structures. Both the cryo-EM sample preparation conditions and more homogeneous distribution of particle orientations indicate that interaction of the complex with air-water interface was significantly reduced when compared with the complex in nanodiscs. This allows us to conclude that neither air-water interface nor reconstitution into nanodiscs cause the uncoupled conformations.”

      It is not very clear what the referee means by “poor” when referring to the focused density of the membrane arm. The density corresponds well to the reported resolution of 3.7 Å. Indeed, it is in stark contrast with the quality of the density obtained for the peripheral arm at 2.1 Å resolution. Given the high mobility of the membrane arm, it had to be refined essentially independently of the peripheral arm, which still remains challenging for a ~200 kDa membrane protein without water-soluble domains in lipid nanodiscs. The density is heterogeneous, as clearly stated at the beginning of the section “Structure of membrane arm” from line 264:

      “The model of complete membrane arm, including the previously missing subunit NuoH (Efremov and Sazanov, 2011), was built into the density map with local resolution better than 3.5 Å at the arm center and approximately 4.0 Å at its periphery (Figure 1A, Figure 1 - figure supplement 4).”

      Finally, for most complex I homologs the resolution was gradually improved over several years, as reflected in multiple publications of essentially the same structures. In contrast, no high-resolution structure information was available for the intact E. coli complex I until now. Therefore, it would be unreasonable to expect the complete structure to be solved at resolution of 2 Å at once.

      The resolution of the membrane domain in reconstructions of the complete complex I is indeed lower. This is due to the high flexibility of the complex and to the fact that refinement naturally focuses on the more stable peripheral arm, which is not surrounded by a heterogeneous nanodisc and which contains Fe-S clusters that enhance particle alignment. Still, these conformations clearly resolve the interface between subunits, albeit at lower resolution.

      This fact was also clearly stated at the beginning of the Results section (lines 102-106):

      “Three conformations of the entire complex were reconstructed to average resolutions between 3.3 and 3.7 Å (Figure 1 - figure supplement 4) resolving the interface between the arms; however, due to high-residual mobility of the arms, the antiporter-like subunits were resolved at below 8 Å (Figure 1 - figure supplement 4).”

      2) A weakness of the paper is the disorder of important functional regions of the complex, namely the NuoH TMH1, whose disorder is unique to these nanodisc E. coli structures, and the NuoA TMH1-TMH2 loop. As the NuoH TMH1 forms part of the entry to the quinone tunnel of the complex, its absence in the structure leads to concerns regarding the function of the nanodisc preparation. Its absence is curious as this suggests flexibility of the helix, as pointed out by the authors, but the authors also state that there is not enough room in the nanodisc to accommodate this helix (given the visible density for the lipid and membrane scaffold protein). These observations suggest denaturation or unfolding in this region of the complex as opposed to simple flexibility.

      According to the usual definition of complex I activity, our preparation in nanodiscs is active. We complemented our data with additional measurements, including NADH:DQ assays (see next point), which also indicate that our preparation is active. The additional 3D reconstruction of E. coli complex I that we obtained for protein solubilized in LMNG does resolve HTMH1, and its environment appears to be more similar to other detergent-solubilized structures of complex I homologues. At the same time, the helices around HTMH1 appear to be more tightly packed and more curved than in the nanodiscs, which may reflect suppressed dynamics and a distorted protein conformation. Most importantly, the overall conformation of the complex remains nearly the same and still corresponds to what we call the uncoupled conformation. That of course does not allow us to say where HTMH1 is positioned within the nanodisc, but it does enable us to conclude that the local changes in the vicinity of HTMH1 do not influence the global conformation of the complex.

      The additional structure is now described on lines 383-388:

      “The HTMH1 helix is resolved in the detergent-solubilized complex (Figure 6A). Its density is weaker than that of the surrounding helices and it is strongly bent (Figure 6B). Simultaneously, HAH1 takes the conformation resembling other complex I homologs while ATMH1 bends towards the arm core. The arrangement of helices in detergent-solubilized reconstruction appears to be more compact and more bent than in the lipid environment which may restrain the otherwise more flexible HTMH1.”

      In the revised discussion the environment of HTMH1 is described more clearly on lines 426-433:

      “The absence of HTMH1 density in nanodiscs, but not in detergent, is another unique feature of E. coli complex I. HTMH1 is exposed to the lipid environment and the width of the nanodisc next to HTMH1 is similar to other regions around the membrane arm (Movie 1). Moreover, homology modelled HHTM1 fits the empty space without steric clashes suggesting that HHTM1 is dynamic rather than displaced or unfolded. By comparing the detergent-solubilized and reconstituted complexes we can conclude that position and dynamics of this helix is neither the cause of the uncoupled conformation nor of the high relative mobility of the arms.”

      Disorder of the ATMH1-TMH2 loop is not unique to E. coli complex I; it is also observed in some conformations of ovine complex I (PDB 6zkd, 6zke, 6zkf).

      3) Unfortunately, the NADH:Q1 functional data do not fully address these concerns as Q1 is far more soluble than the native Q8 substrate of the complex. Although the Q1 activity is sensitive to the inhibitor Piericidin A, which clearly demonstrates that the Q1 reduction is occurring in the native quinone binding site as Piericidin A binds specifically at that site, this does not preclude the possibility of Q1 accessing this binding site via a different path. In fact, the structures indicate that given the flexibility in the connection between peripheral and membrane arms of the complex, the quinone binding site is likely open to the cytoplasm. This leads the authors themselves to conclude that the structures presented are likely disrupted/uncoupled states in which the energy converting mechanism of the complex is not likely possible.

      To address this concern, we have measured the activity of complex I in nanodiscs with the less soluble decylubiquinone (DQ), as well as its inhibition. A small amount of LMNG was used to increase the DQ solubility. Our results confirm that E. coli complex I in nanodiscs is active and that the NADH:DQ activity is sensitive to piericidin A (see the modified Figure 1-figure supplement 2). We have also remeasured the Q1 activity and its inhibition, which showed lower values than previously because of a flaw in the activity measurements reported in the original submission (qualitatively, the results remained unchanged). Moreover, we have observed similar activity results, with somewhat higher values, for E. coli complex I in LMNG (Figure 6-figure supplement 1). These data demonstrate that in the reconstituted complex I quinone analogues can enter the Q-site through the membrane. It is worth noting that, owing to the extremely low solubility of longer quinones, including native ones, they are not used for activity measurements in purified preparations.

      Regarding the complex I conformation, we do think our reconstruction represents an uncoupled state that is not able to pump protons (as stated in the title). We have improved the clarity of this point throughout the manuscript, including in the discussion starting from line 412.

      “The high mobility of the interfacial regions and the relative rotation of the arms disrupts conserved interfacial interactions and exposes Q-cavity to the solvent (Figure 5A). This differentiates E. coli complex I from its structurally characterized homologs in which the Q-cavity is sealed from the solvent. Thus, we interpret the observed conformation as an uncoupled state.”

      And from line 469: “We also observed the relative rotation of the membrane and peripheral arms disrupting the conserved interface and trapping the complex in an uncoupled conformation. Whether this conformation is biologically relevant or is a result of protein purification is to be clarified by further research.”

      4) A weakness of the paper is the building of atomic models into regions of the map which do not contain sufficient detail to warrant atomic models. This is particularly the case for the intact models of complex I as well as the membrane arm focused maps and results in low map-model correlations (0.58-0.71). The models were clearly highly restrained during refinement, resulting in good geometry, as is necessary for low resolution regions. But being able to restrain the geometry is not sufficient for placing atoms into regions where the density is weak or absent. If additional information was used in building/constraining the model, such as the X-ray structure, the regions of the model that are biased towards the X-ray structure model need to be made clearer. Also, in several places in the membrane arm map residues bulge out of the density (side chain and main chain) leading to possible frame shifts with respect to the match between subsequent residues in the model and the map (see NuoM Ile168 for example).

      A large part of the membrane domain has previously been solved by X-ray crystallography to a resolution of 3.0 Å, and that structure was used as the starting model for model building; we therefore don’t think there are register shifts in our model. We used standard settings for model refinement in phenix_refine. Our building and refinement procedure was described in fine detail in the original submission, see from line 674:

      “For the membrane domain, the previously obtained E. coli model (PDB ID: 3RKO) was real-space-refined in PHENIX. The missing NuoH subunit was homology-modelled using the T. thermophilus structure (PDB ID: 4HEA) in Coot 0.9. The final model was obtained after several rounds of manual rebuilding and real-space refinement using standard parameters with Ramachandran restrains, secondary-structure restrains applied to the NuoL TMH9-13, without ADP restrains, and with the optimized nonbonded_weight parameter. To generate the model of the complete complex I, the separate peripheral and membrane arm structures were combined and the missing parts at the interface (Table 2) were built manually. As the density of NuoL and NuoM was very poor in all the resolved full conformations, these subunits were subjected to rigid-body refinement in PHENIX, whereas the others were subjected to real-space refinement with minimization_global, local_grid_search, morphing, and ADP refinement. Ramachandran, ADP, and secondary-structure restrains were used. After manual rebuilding in Coot, real-space refinement of the full complex was performed with standard parameters and restrains.”

      To improve clarity, we added the following sentences to the Results section from line 116:

      “Using the resulting maps, atomic models of the peripheral and membrane arms have been built. The entire E. coli complex I was modelled by fitting models of the arms and extending additionally resolved loops and termini. Due to limited resolution, the antiporter-like subunits were refined as rigid bodies.”

      The model has been improved and side chains with absent density were truncated to the Cβ position.

      The density from focused refinement of the membrane fragment is relatively weak, but of sufficient quality to allow building side chains for most of the map. It even visualizes lipid densities (not described in the manuscript). Such weaker densities are common for small membrane proteins. While fully usable for model building, they naturally result in a lower model-map FSC and, consequently, in a lower real-space correlation. In addition, real-space correlation is lower when the map is heterogeneous, and it strongly depends on the way the heterogeneous map has been filtered. Therefore, lower cross correlations do not necessarily mean that the model fit is poor. In our case they reflect the weaker signal-to-noise ratio of the density. Model-map FSCs (Figure 1 - figure supplement 4) are more informative than a single number and show that model-map cross correlations remain above 0.5 over the complete resolution range for all models.
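
      The dependence of a real-space correlation on map noise can be illustrated with a toy calculation. The sketch below is not the validation procedure used for the deposited models; it is a minimal numerical illustration under assumed conditions: a synthetic "model" density drawn from a unit-variance Gaussian and an "observed" density equal to the model plus independent Gaussian noise. The function names and parameters are hypothetical. Under these assumptions the expected correlation is 1/sqrt(1 + sigma^2), so a perfectly placed model still gives a low coefficient when the noise is strong.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_at_noise(noise_sigma, n_voxels=20000, seed=0):
    """Correlate a synthetic model density with a noisy copy of itself.

    The model is, by construction, a perfect fit to the underlying signal;
    only the noise level of the 'observed' map changes.
    """
    rng = random.Random(seed)
    # Hypothetical model density: unit-variance Gaussian voxel values.
    model = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
    # Hypothetical observed map: same signal plus independent noise.
    observed = [m + rng.gauss(0.0, noise_sigma) for m in model]
    return pearson(model, observed)


if __name__ == "__main__":
    for sigma in (0.0, 0.5, 1.0, 1.5):
        cc = correlation_at_noise(sigma)
        print(f"noise sigma = {sigma:.1f}   correlation = {cc:.2f}")
```

      In this toy setting the coefficient falls as the noise grows although the model itself never changes, which is the reason a single correlation value is a weaker indicator of fit quality than a resolution-dependent curve such as the model-map FSC.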

      5) A weakness of the paper is that several specific claims are made about the positions of side chains but, when investigated, the density for those side chains is poorly resolved. An example of this is NuoH Lys274, which is in a low-resolution region of the map and although is fit as well as possible must be considered low confidence given the local resolution (nearby residues Phe277 and Phe282 have almost no side chain density for example).

      At lower resolution, the presence of density for a residue depends strongly on its mobility. Well-ordered residues may have well-defined densities while others, even in close proximity, may have poor density. In the case of Lys274, there is clear density for the side chain, its position makes chemical sense, and it is hydrogen-bonded to the backbone oxygen of Gly258. In fact, if examined closely, this is also the only meaningful position for the Lys274 side chain. At the same time, the conformations of Phe277 and Phe282 are not restrained by interactions with other residues in their vicinity, which is likely why their densities are weaker.

      6) A weakness of the paper is that the conformational changes seen between the membrane and peripheral arm of the complex in the different 3D classes are difficult to interpret. It is unclear if they are mechanistically significant or, perhaps more likely given the amount of broken complex observed, due to partial disruption of the complex before it completely breaks apart.

      As we discussed above, the multiple conformations observed are not due to disruption of the complex. It is not very clear what the reviewer means by ‘difficult to interpret’. Many conformations of the peripheral and membrane arms observed for the complex I homologues are likely not mechanistically meaningful per se, but rather reflect the overall flexibility of such a large complex. Here, our goal was to describe our structural data as accurately as possible, which resulted in several resolved conformations.

      We do think they all represent the uncoupled complex I; in this respect, they do not have different mechanistic meanings. However, they do permit us to understand how the arms move relative to each other and what degree of freedom exists between them.

      7) A strength of the paper is the interesting and original mechanistic proposal put forward by the authors. But a weakness is that it is unclear how this proposal stems from the structural data presented. Also, the arguments presented are difficult to follow in their current form and warrant a more detailed discussion with the requisite thermodynamic treatment. This may warrant a more complete discussion in an appendix or unless the authors can more convincingly show how the data presented in the paper suggests their proposed mechanism perhaps a separate review article. Furthermore, the proposed mechanism, as presented would make a simple prediction that in the absence of NuoM and NuoL (or equivalent subunits in other species) complex I would not pump any net protons. Experiments that are relevant to this prediction have been done in E. coli (NuoL deletion) and Y. lipolytica (nb8m deletion that results in loss of both NuoM and NuoL subunits). See https://pubmed.ncbi.nlm.nih.gov/21417432/ and https://pubmed.ncbi.nlm.nih.gov/21886480/. In both cases the complex is still able to pump protons. The behavior of the NuoL deletion in E. coli is reconcilable with their proposed mechanism as NuoM is still present, however, the case of the nb8m deletion in Y. lipolytica is more difficult to reconcile with their proposed mechanism. The authors would need to address these experiments in order to include their proposed mechanism.

      The description of the mechanism has been modified. It is now outlined very briefly in the main text along with Figure 7, and a more detailed description, including thermodynamic considerations, has been moved to the supplementary text. We have also explained more clearly how the model stems from the experimental data on line 435:

      “The absence of a continuous proton-translocation pathway between the Q-site and subunit NuoN, as well as high flexibility of the peripheral arm interface are not consistent with the recently proposed coupling mechanisms relying on specific movements of the interfacial loops (Cabrera-Orefice et al., 2018; Kampjut and Sazanov, 2020). This led us to ask whether a coupling mechanism consistent with known complex I properties, but without the movements of interfacial loops is conceivable.”

      Furthermore, we state that at this point this is a hypothetical mechanism.

      The supplementary data describing the mechanism in more detail now also include a discussion of both papers mentioned by the reviewer, from line 1368.

      “Experiments with engineering E. coli complex I lacking subunit NuoL and Y. lipolytica complex I lacking homologs of subunits NuoM and NuoL (Dröse et al., 2011; Steimle et al., 2011) (correspond to n=2 and 1, respectively) both suggested that the engineered complexes were active and for both constructs stoichiometry was estimated as 2H+/2e-. While NuoL deletion experiments support our model, the NuoL/M deletion clearly contradicts it. Both experiments should be interpreted cautiously, however. Results of NuoL deletion for E. coli complex I were not reproducible (Verkhovskaya and Bloch, 2012). In the case of Y. lipolytica, the homologs of NuoL/M dissociated from the complex along with another 11 subunits upon deletion of supernumerary subunit NB8M located at the tip of NuoL (Zickermann et al., 2015). Since the proton-translocating modules were not deleted per se, the presence of contaminating amounts of assembled complex I in the preparations that generated observed proton pumping cannot be completely excluded. It is important to note that mutation of the conserved ionizable residues on the interface between NuoN and NuoM, i.e. ME144 (Torres-Bacete et al., 2007) or its counter ion NK395 (Amarneh and Vik, 2003), result in a completely inactive complex I suggesting that dissociation of subunits NuoL/M also should render complex I inactive (Verkhovskaya and Bloch, 2012).”

      The main problem with these experiments is that they have never been reproduced by other laboratories and are not completely consistent with the mutagenesis data. Deletion of subunits may also result in distinct pumping behavior of the remaining subcomplex. For example, it was shown that bovine complex I can translocate Na+ ions in the deactive state (https://pubmed.ncbi.nlm.nih.gov/22854968/).

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      8) Overall, despite the many strengths of this paper detailed above it is unclear whether the authors achieved their goal of a structure of functional E. coli respiratory complex I reconstituted in lipid nano-discs. It appears that under the current grid preparation conditions the complex is under excessive stress resulting in partial denaturation and partial-to-complete dissociation. Given the clear biophysical data presented on the intactness of the complex in solution, this disruption likely occurs during grid preparation and further optimization of grid conditions may resolve this issue. With the current maps more work needs to be done to improve the map-to-model correlation and to clearly indicate the regions in the models where this correlation is low.

      The additional reconstruction of complex I solubilized in LMNG helps us to exclude the interaction of the complex with the air-water interface and its reconstitution into lipid nanodiscs as causes of the relative subunit rotation and the high flexibility between the arms. At this moment, whether the structure represents an artifact of purification or a biologically relevant state remains an open question. Answering it goes beyond the current study and will require additional research. This is now explained in the Discussion section.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] The major limitation of the manuscript lies in the framing and interpretation of the results, and therefore the evaluation of novelty. Authors claim for an important and unique role of beliefs-of-other-pain in altruistic behavior and empathy for pain. The problem is that these experiments mainly show that behaviors sometimes associated with empathy-for-pain can be cognitively modulated by changing prior beliefs. To support the notion that effects are indeed relating to pain processing generally or empathy for pain specifically, a similar manipulation, done for instance on beliefs about the happiness of others, before recording behavioural estimation of other people's happiness, should have been performed. If such a belief-about-something-else-than-pain would have led to similar results, in terms of behavioural outcome and in terms of TPJ and MFG recapitulating the pattern of behavioral responses, we would know that the results reflect changes of beliefs more generally. Only if the results are specific to a pain-empathy task, would there be evidence to associate the results to pain specifically. But even then, it would remain unclear whether the effects truly relate to empathy for pain, or whether they may reflect other routes of processing pain.

      We thank Reviewer #1 for these comments and suggestions regarding the specificity of belief effects on brain activity involved in empathy for pain. Our paper reported 6 behavioral/EEG/fMRI experiments that tested effects of beliefs of others’ pain on empathy and monetary donation (an empathy-related altruistic behavior). We showed not only behavioral but also neuroimaging results that consistently support the hypothesis of a functional role of beliefs of others' pain in modulating empathy (based on both subjective and objective measures, as clarified in the revision) and altruistic behavior. We agree with Reviewer #1 that it is important to address whether the belief effect is specific to the neural underpinnings of empathy for pain or generalizes to neural responses to other facial expressions, such as happy expressions. To address this issue, we conducted an additional EEG experiment (which could be done in a limited time in the current situation), as suggested by Reviewer #1. This new EEG experiment tested (1) whether beliefs about the authenticity of others’ happiness influence brain responses to perceived happy expressions; and (2) whether beliefs about happiness modulate neural responses to happy expressions in the P2 time window, in which the effects of beliefs of pain on ERPs were observed.

      Our behavioral results in this experiment (reported in the revision as Supplementary Experiment 1) showed that participants reported weaker feelings of happiness when viewing actors who simulated others' smiling than when viewing awardees who smiled because they had won awards (see the figure below). Our ERP results further showed that a lack of belief in the authenticity of others’ happiness (e.g., actors simulating others' happy expressions vs. awardees smiling and showing happy expressions due to winning an award) reduced the amplitude of a long-latency positive component (i.e., P570) over the frontal region in response to happy expressions. These findings suggest that (1) there are possibly general belief effects on subjective feelings and brain activity in response to facial expressions; (2) beliefs of others' pain or happiness affect neural responses to facial expressions in different time windows after face onset; and (3) modulations of the P2 amplitude by beliefs of pain may not generalize to belief effects on neural responses to other emotional states. We reported the results of this new ERP experiment in the revision as Supplementary Experiment 1 and discussed the specificity of modulations of empathic neural responses by beliefs of others' pain in the revised Discussion (pages 49-50).

      *Supplementary Experiment Figure 1. EEG results of Supplementary Experiment 1. (a) Mean rating scores of happy intensity related to happy and neutral expressions of faces with awardee or actor/actress identities. (b) ERPs to faces with awardee or actor/actress identities at the frontal electrodes. The voltage topography shows the scalp distribution of the P570 amplitude with the maximum over the central/parietal region. (c) Mean differential P570 amplitudes to happy versus neutral expressions of faces with awardee or actor/actress identities. The voltage topographies illustrate the scalp distribution of the P570 difference waves to happy (vs. neutral) expressions of faces with awardee or actor/actress identities, respectively. Shown are group means (large dots), standard deviation (bars), measures of each individual participant (small dots), and distribution (violin shape) in (a) and (c).*

      In the revised Introduction we cited additional literature to explain the concept of empathy, behavioral and neuroimaging measures of empathy, and how, similar to previous research, we studied empathy for others' pain using subjective (self-report) and objective (brain response) estimates of empathy (pages 6-7). In particular, we mentioned that subjective estimation of empathy for pain depends on collecting self-reports of others' pain and one's own painful feelings when viewing others' suffering. Objective estimation of empathy for pain relies on recording brain activity (using fMRI, EEG, etc.) that responds differentially to painful and non-painful stimuli applied to others. fMRI studies revealed greater activations in the ACC, AI, and sensorimotor cortices in response to painful (vs. non-painful) stimuli applied to others. EEG studies showed that event-related potentials (ERPs) to perceived stimulation of others' body parts differentiated between painful and neutral stimuli over the frontal region as early as 140 ms after stimulus onset (Fan and Han, 2008; see Coll, 2018 for review). Moreover, the mean ERP amplitudes at 140–180 ms predicted subjective reports of others' pain and one's own unpleasantness. Particularly related to the current study, previous research showed that pain compared to neutral expressions increased the amplitude of the frontal P2 component at 128–188 ms after stimulus onset (Sheng and Han, 2012; Sheng et al., 2013; 2016; Han et al., 2016; Li and Han, 2019), and that the P2 amplitudes in response to others' pain expressions positively predicted subjective feelings of one's own unpleasantness induced by others' pain and self-reports of one's own empathy traits (e.g., Sheng and Han, 2012).

      These brain imaging findings indicate that brain responses to others' pain can (1) differentiate others' painful and non-painful emotional states to support understanding of others' pain and (2) predict subjective feelings of others' pain and one's own unpleasantness induced by others' pain to support sharing of others' painful feelings. These findings provide effective subjective and objective measures of empathy that were used in the current study to investigate the neural mechanisms underlying modulation of empathy and altruism by beliefs of others’ pain.

      In addition, we followed Reviewer #1’s suggestion to conduct VPS analyses, which examined specifically how neural activities in the empathy-related regions identified in previous research (Krishnan et al., 2016, eLife) were modulated by beliefs of others’ pain. The results (page 40) provide further evidence for our hypothesis. We also reported new results of RSA analyses (page 39) showing that activity in the brain regions supporting affective sharing (e.g., insula), sensorimotor resonance (e.g., post-central gyrus), and emotion regulation (e.g., lateral frontal cortex) provides intermediate mechanisms underlying the modulation of subjective feelings of others' pain intensity by a lack of BOP. We believe that, taken together, these results provide consistent evidence that empathy and altruistic behavior are modulated by BOP.

      Reviewer #2 (Public Review):

      [...] 1. In laying out their hypotheses, the authors write, "The current work tested the hypothesis that BOP provides a fundamental cognitive basis of empathy and altruistic behavior by modulating brain activity in response to others' pain. Specifically, we tested predictions that weakening BOP inhibits altruistic behavior by decreasing empathy and its underlying brain activity whereas enhancing BOP may produce opposite effects on empathy and altruistic behavior." While I'm a little dubious regarding the enhancement effects (see below), a supporting assumption here seems to be that at baseline, we expect that painful expressions reflect real pain experience. To that end, it might be helpful to ground some of the introduction in what we know about the perception of painful expressions (e.g., how rapidly/automatically is pain detected, do we preferentially attend to pain vs. other emotions, etc.).

      Thanks for this suggestion! We included additional details about previous findings related to the processing of painful expressions in the revised Introduction (pages 7-8). Specifically, we introduced fMRI and ERP studies of pain expressions that revealed the brain structures and the time course of neural responses to others' pain (vs. neutral) expressions. Moreover, neural responses to others' pain (vs. neutral) expressions were associated with self-reports of others' feelings, indicating functional roles of pain-expression-induced brain activities in empathy for pain.

      1. For me, the key takeaway from this manuscript was that our assessment of and response to painful expressions is contextually-sensitive - specifically, to information reflecting whether or not targets are actually in pain. As the authors state it, "Our behavioral and neuroimaging results revealed critical functional roles of BOP in modulations of the perception-emotion-behavior reactivity by showing how BOP predicted and affected empathy/empathic brain activity and monetary donations. Our findings provide evidence that BOP constitutes a fundamental cognitive basis for empathy and altruistic behavior in humans." In other words, pain might be an incredibly socially salient signal, but it's still easily overridden from the top down provided relevant contextual information - you won't empathize with something that isn't there. While I think this hypothesis is well-supported by the data, it's also backed by a pretty healthy literature on contextual influences on pain judgments (including in clinical contexts) that I think the authors might want to consider referencing (here are just a few that come to mind: Craig et al., 2010; Twigg et al., 2015; Nicolardi et al., 2020; Martel et al., 2008; Riva et al., 2015; Hampton et al., 2018; Prkachin & Rocha, 2010; Cui et al., 2016).

      Thanks for this great suggestion! Accordingly, we included an additional paragraph in the revised Discussion regarding how social contexts influence empathy and cited the studies mentioned here (page 46-47).

      1. I had a few questions regarding the stimuli the authors used across these experiments. First, just to confirm, these targets were posing (e.g., not experiencing) pain, correct? Second, the authors refer to counterbalancing assignment of these stimuli to condition within the various experiments. Was target gender balanced across groups in this counterbalancing scheme? (e.g., in Experiment 1, if 8 targets were revealed to be actors/actresses in Round 2, were 4 female and 4 male?) Third, were these stimuli selected at random from a larger set, or based on specific criteria (e.g., normed ratings of intensity, believability, specificity of expression, etc.?) If so, it would be helpful to provide these details for each experiment.

      We are happy to clarify these points. First, photos of faces with pain or neutral expressions were adopted from previous work (Sheng and Han, 2012). The photos were taken of models who were posing but not experiencing pain. These photos were taken and selected based on explicit criteria for painful expressions (i.e., brow lowering, orbit tightening, and raising of the upper lip; Prkachin, 1992). In addition, the models' facial expressions were validated in independent samples of participants (see Sheng and Han, 2012). Second, target gender was also balanced across groups in this counterbalancing scheme. We also analyzed empathy rating scores and monetary donations related to male and female target faces and did not find any significant gender effect (see our response to Point 5 below). Third, because the face stimuli were adopted from previous work and the models' facial expressions were validated in independent samples of participants regarding specificity of expression, pain intensity, etc. (Sheng and Han, 2012), we did not repeat this validation in our participants. Most importantly, we counterbalanced the stimuli so that the stimuli in different conditions (e.g., patient vs. actor/actress conditions) were the same across the participants in each experiment. This design excludes any potential confound arising from the stimuli themselves.

      1. The nature of the charitable donation (particularly in Experiment 1) could be clarified. I couldn't tell if the same charity was being referenced in Rounds 1 and 2, and if there were multiple charities in Round 2 (one for the patients and one for the actors).

      Thanks for this comment! In both Rounds 1 and 2, the participants were informed that the amount of one of their decisions would be selected randomly and donated to one of the patients through the same charity organization (we clarified this in the revised Method section, pages 55-56). We made clear in the revision that, after we finished all the experiments of this study, the total amount of the participants' donations was given to a charity organization to help patients who suffer from the same disease.

      1. I'm also having a hard time understanding the authors' prediction that targets revealed to truly be patients in the 2nd round will be associated with enhanced BOP/altruism/etc. (as they state it: "By contrast, reconfirming patient identities enhanced the coupling between perceived pain expressions of faces and the painful emotional states of face owners and thus increased BOP.") They aren't in any additional pain than they were before, and at the outset of the task, there was no reason to believe that they weren't suffering from this painful condition - therefore I don't see why a second mention of their pain status should increase empathy/giving/etc. It seems likely that this is a contrast effect driven by the actor/actress targets. See the Recommendations for the Authors for specific suggestions regarding potential control experiments. (I'll note that the enhancement effect in Experiment 2 seems more sensible - here, the participant learns that treatment was ineffective, which may be painful in and of itself.)

      Thanks for the comments on this important point! Indeed, our results showed that reconfirming patient identities in Experiment 1, or noting the failure of medical treatment related to target faces in Experiment 2, increased rating scores of others' pain and own unpleasantness and prompted more monetary donations to target faces. The increased empathy rating scores and monetary donations might arise because repeatedly confirming patient identity, or learning of the failure of medical treatment, increased belief in the authenticity of the targets' pain and thus enhanced empathy. However, repeatedly confirming patient identity or learning of the failure of medical treatment might also activate other emotional responses to target faces, such as pity or helplessness, which could in turn influence altruistic decisions. We agree with Reviewer #2 that, although our subjective estimation of empathy in Exp. 1 and 2 suggested enhanced empathy in the 2nd_round test, there are alternative interpretations of the results and these should be clarified in future work. We clarified these points in the revised Discussion (pages 41-42).

      1. I noted that in the Methods for Experiment 3, the authors stated "We recruited only male participants to exclude potential effects of gender difference in empathic neural responses." This approach continues through the rest of the studies. This raises a few questions. Are there gender differences in the first two studies (which recruited both male and female participants)? Moreover, are the authors not concerned about target gender effects? (Since, as far as I can tell, all studies use both male and female targets, which would mean that in Experiments 3 and on, half the targets are same-gender as the participants and the other half are other-gender.) Other work suggests that there are indeed effects of target gender on the recognition of painful expressions (Riva et al., 2011).

      Thanks for raising this interesting question! We reanalyzed the data in Exp. 1 by including participants' gender or face gender as an independent variable. The three-way ANOVAs of pain intensity scores and amounts of monetary donations with Face Gender (female vs. male targets) × Test Phase (1st vs. 2nd_round) × Belief Change (patient-identity change vs. patient-identity repetition) did not show any significant three-way interaction (F(1,59) = 0.432 and 0.436, p = 0.514 and 0.512, ηp² = 0.007 and 0.007, 90% CI = (0, 0.079) and (0, 0.079)), indicating that face gender did not influence the results (see the figure below). Similarly, the three-way ANOVAs with Participant Gender (female vs. male participants) × Test Phase × Belief Change did not show any significant three-way interaction (F(1,58) = 0.121 and 1.586, p = 0.729 and 0.213, ηp² = 0.002 and 0.027, 90% CI = (0, 0.055) and (0, 0.124)), indicating no reliable difference in empathy and donation between men and women. It seems that the measures of empathy and altruistic behavior in our study were not sensitive to the gender of either the empathy targets or the participants.

      Figure legend: (a) Scores of pain intensity and amount of monetary donations are reported separately for male and female target faces. (b) Scores of pain intensity and amount of monetary donations are reported separately for male and female participants.
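
As a quick consistency check on the effect sizes reported above, partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the four F values quoted in the response; the function name and the dictionary labels are illustrative and do not come from the authors' analysis code.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as quoted in the response; the labels are ours.
reported = {
    "face gender, pain intensity": (0.432, 1, 59),
    "face gender, donations": (0.436, 1, 59),
    "participant gender, pain intensity": (0.121, 1, 58),
    "participant gender, donations": (1.586, 1, 58),
}

for label, (f_value, df_effect, df_error) in reported.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared = {eta:.3f}")
```

Rounded to three decimals, the four results agree with the reported values of 0.007, 0.007, 0.002 and 0.027.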

      1. I was a little unclear on the motivation for Experiment 4. The authors state "If BOP rather than other processes was necessary for the modulation of empathic neural responses in Experiment 3, the same manipulation procedure to assign different face identities that do not change BOP should change the P2 amplitudes in response to pain expressions." What "other processes" are they referring to? As far as I could tell, the upshot of this study was just to demonstrate that differences in empathy for pain were not a mere consequence of assignment to social groups (e.g., the groups must have some relevance for pain experience). While the data are clear and as predicted, I'm not sure this was an alternate hypothesis that I would have suggested or that needs disconfirming.

      Thanks for this comment! We apologize for not making the research question of Exp. 4 clear. In the revised Results section (pages 27-28) we clarified that the learning and EEG recording procedures in Experiment 3 consisted of multiple processes, including learning, memory, identity recognition, assignment to social groups, etc. The results of Experiment 3 left open the question of whether these processes, even without the BOP changes induced through them, would be sufficient to modulate the P2 amplitude in response to pain (vs. neutral) expressions of faces with different identities. In Experiment 4 we addressed this issue using the same learning and identity recognition procedures as in Experiment 3, except that the participants in Experiment 4 had to learn and recognize the identities of faces from two baseball teams, and there was no prior difference in BOP associated with the faces of the two teams. If the processes involved in the learning and recognition procedures, rather than the difference in BOP, were sufficient for modulation of the P2 amplitude in response to pain (vs. neutral) expressions of faces, we would expect similar P2 modulations in Experiments 4 and 3. If, instead, the difference in BOP produced during the learning procedure was necessary for the modulation of empathic neural responses, we would not expect modulations of the P2 amplitude in response to pain (vs. neutral) expressions in Experiment 4. We believe that the goal and rationale of Exp. 4 are clear now.

    2. Reviewer #2 (Public Review):

      The authors performed six experiments examining the influence of beliefs regarding pain experience on behavioral and neural indices of empathy for pain and altruistic behavior. They demonstrate that manipulations that reduce beliefs that individuals making painful expressions are actually in pain (e.g., revealing them to be actors, indicating that their treatment has been successful, etc.) attenuate subjective judgments of pain intensity, real monetary donations to these targets, and P2 amplitudes, and further, that regions involved in perspective-taking and emotion regulation are sensitive to representations of pain beliefs. While I think that the authors have done an admirable job in laying out the evidence for their argument across six well-devised experiments, I do think that the manuscript has some room for improvement. In particular, I hope that the authors can offer stronger grounding in the background literature and clarify some task and stimulus details.

      1. In laying out their hypotheses, the authors write, "The current work tested the hypothesis that BOP provides a fundamental cognitive basis of empathy and altruistic behavior by modulating brain activity in response to others' pain. Specifically, we tested predictions that weakening BOP inhibits altruistic behavior by decreasing empathy and its underlying brain activity whereas enhancing BOP may produce opposite effects on empathy and altruistic behavior." While I'm a little dubious regarding the enhancement effects (see below), a supporting assumption here seems to be that at baseline, we expect that painful expressions reflect real pain experience. To that end, it might be helpful to ground some of the introduction in what we know about the perception of painful expressions (e.g., how rapidly/automatically is pain detected, do we preferentially attend to pain vs. other emotions, etc.).

      2. For me, the key takeaway from this manuscript was that our assessment of and response to painful expressions is contextually-sensitive - specifically, to information reflecting whether or not targets are actually in pain. As the authors state it, "Our behavioral and neuroimaging results revealed critical functional roles of BOP in modulations of the perception-emotion-behavior reactivity by showing how BOP predicted and affected empathy/empathic brain activity and monetary donations. Our findings provide evidence that BOP constitutes a fundamental cognitive basis for empathy and altruistic behavior in humans." In other words, pain might be an incredibly socially salient signal, but it's still easily overridden from the top down provided relevant contextual information - you won't empathize with something that isn't there. While I think this hypothesis is well-supported by the data, it's also backed by a pretty healthy literature on contextual influences on pain judgments (including in clinical contexts) that I think the authors might want to consider referencing (here are just a few that come to mind: Craig et al., 2010; Twigg et al., 2015; Nicolardi et al., 2020; Martel et al., 2008; Riva et al., 2015; Hampton et al., 2018; Prkachin & Rocha, 2010; Cui et al., 2016).

      3. I had a few questions regarding the stimuli the authors used across these experiments. First, just to confirm, these targets were posing (e.g., not experiencing) pain, correct? Second, the authors refer to counterbalancing assignment of these stimuli to condition within the various experiments. Was target gender balanced across groups in this counterbalancing scheme? (e.g., in Experiment 1, if 8 targets were revealed to be actors/actresses in Round 2, were 4 female and 4 male?) Third, were these stimuli selected at random from a larger set, or based on specific criteria (e.g., normed ratings of intensity, believability, specificity of expression, etc.?) If so, it would be helpful to provide these details for each experiment.

      4. The nature of the charitable donation (particularly in Experiment 1) could be clarified. I couldn't tell if the same charity was being referenced in Rounds 1 and 2, and if there were multiple charities in Round 2 (one for the patients and one for the actors).

      5. I'm also having a hard time understanding the authors' prediction that targets revealed to truly be patients in the 2nd round will be associated with enhanced BOP/altruism/etc. (as they state it: "By contrast, reconfirming patient identities enhanced the coupling between perceived pain expressions of faces and the painful emotional states of face owners and thus increased BOP.") They aren't in any additional pain than they were before, and at the outset of the task, there was no reason to believe that they weren't suffering from this painful condition - therefore I don't see why a second mention of their pain status should *increase* empathy/giving/etc. It seems likely that this is a contrast effect driven by the actor/actress targets. See the Recommendations for the Authors for specific suggestions regarding potential control experiments. (I'll note that the enhancement effect in Experiment 2 seems more sensible - here, the participant learns that treatment was ineffective, which may be painful in and of itself.)

      6. I noted that in the Methods for Experiment 3, the authors stated "We recruited only male participants to exclude potential effects of gender difference in empathic neural responses." This approach continues through the rest of the studies. This raises a few questions. Are there gender differences in the first two studies (which recruited both male and female participants)? Moreover, are the authors not concerned about *target* gender effects? (Since, as far as I can tell, all studies use both male and female targets, which would mean that in Experiments 3 and on, half the targets are same-gender as the participants and the other half are other-gender.) Other work suggests that there are indeed effects of target gender on the recognition of painful expressions (Riva et al., 2011).

      7. I was a little unclear on the motivation for Experiment 4. The authors state "If BOP rather than other processes was necessary for the modulation of empathic neural responses in Experiment 3, the same manipulation procedure to assign different face identities that do not change BOP should change the P2 amplitudes in response to pain expressions." What "other processes" are they referring to? As far as I could tell, the upshot of this study was just to demonstrate that differences in empathy for pain were not a mere consequence of assignment to social groups (e.g., the groups must have some relevance for pain experience). While the data are clear and as predicted, I'm not sure this was an alternate hypothesis that I would have suggested or that needs disconfirming.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer 1

      This paper proposes a noise-aware approach, SCRaPL, for modelling the associations in single-cell multi-omic data. For gene expression, it uses a Poisson-lognormal model. For DNAm data, it uses a Binomial noise model which explicitly takes into account the average within the region. The Bayesian hierarchical framework employed by SCRaPL could achieve higher sensitivity and better robustness in identifying correlations, and also offer a template for the application of more complex analysis techniques to multi-omics data. The symbols of this paper are a little bit confusing, and I suggest that the authors check them carefully.

      We thank the reviewer for his/her appreciation, and apologise for the confusion arising from the dense notation, which we will thoroughly revise.
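
To illustrate why a noise-aware model matters here, the following is a minimal simulation sketch, not the SCRaPL implementation. It draws a correlated bivariate Gaussian latent layer, passes one coordinate through a Poisson-lognormal observation model (expression) and the other through a Binomial one (methylation), and compares the latent correlation with the Pearson correlation of the observed data. The logistic link for the methylation rate, the coverage of 5 CpGs, the latent correlation of 0.8 and all variable names are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells = 5000
latent_rho = 0.8   # assumed latent correlation (illustrative value)
n_cpg = 5          # assumed number of CpG reads per region (illustrative value)

# Latent layer: one correlated Gaussian pair per cell.
cov = np.array([[1.0, latent_rho], [latent_rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_cells)

# Expression: Poisson counts whose log-rate is the first latent coordinate.
counts = rng.poisson(np.exp(z[:, 0]))

# Methylation: Binomial reads; the logistic link is an assumption of this sketch.
rate = 1.0 / (1.0 + np.exp(-z[:, 1]))
methylated = rng.binomial(n_cpg, rate)

# Correlation in the latent layer vs. Pearson correlation on the noisy observations.
latent_corr = np.corrcoef(z[:, 0], z[:, 1])[0, 1]
naive_corr = np.corrcoef(np.log1p(counts), methylated / n_cpg)[0, 1]

print(f"latent correlation:         {latent_corr:.2f}")
print(f"Pearson on noisy read-outs: {naive_corr:.2f}")
```

In this toy setting the Pearson estimate on the observed counts is pulled towards zero relative to the latent correlation, which is the attenuation that modelling the count and coverage noise explicitly is meant to correct.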

      1. The symbols used in this paper are messy. For example, "1" and "2" are subscripts in Eq. (2) but become superscripts in Figure 5. Besides, many symbols are not explained, such as $m_j$, $H_j$, $\Psi_0$, etc. Also, I don't know whether $x_{j,i}^{(1)}$, $x_{j,i}^{(2)}$ in Figure 5 are the same as $x_{ij1}$ and $x_{ij2}$ in Eq. (3). There are many mismatches; the authors should check carefully.
      2. Why are the equations in Fig. 5 totally different from those in Section 4.2? For example, $p_j \sim \mathrm{Beta}(\alpha_j, \beta_j)$ in Fig. 5 but $\rho_j \sim \mathrm{Beta}(1,1)$ in Eq. (8).

      We apologise for the notational confusion; this will be fully revised.

      The paper involves many hyperparameters (for example, c1, c2, d1, d2) but does not explain how they were selected.

      This is a good point. We will include a sensitivity analysis on the hyperparameters, justifying the choices on both simulated and real data.

      In Section 4.8, I am confused about $ρ_j$ in experiments 2, 5, 8 and 11. Why does $ρ_j$ represent both the ZI rate and the correlation?

      We apologise for the notational oversight, which will be rectified.

      In Section 4.5, it is difficult to understand the sentence "for me threshold u". Besides, what does $r$ represent in Section 4.5?

      We apologise for the confusing sentence. $r$ is the Pearson correlation coefficient, as explained at the start of Section 4.5.

      Why is there "(6a)Agreement between SCRaPL and Pearson" in Fig. 4?

      This simply means that the panel shows a methylation/expression scatterplot for a gene where SCRaPL and Pearson both return a significant association. We will expand the caption to explain further.

      For Fig.1, I cannot see the text in the rectangle.

      Apologies, we will improve the readability of the figures.

      I would like to see the efficiency analysis for SCRaPL.

      As part of the re-implementation in a more accessible programming language, we have performed a preliminary efficiency analysis for MCMC, demonstrating linear scalability. Results will appear in the revised manuscript.

      Reviewer 2

      The authors present a Bayesian model to determine noise-corrected correlation coefficients for gene expression (RNA) and DNA-methylation data at single-cell resolution. The authors present a series of simulation data and an example of matched multi-omics data, and compare their results with Pearson correlation. Noise modelling allows the model to determine gene-methylation correlation patterns more accurately. While the authors demonstrate a neat application on accurate quantification of correlation coefficients, I see a limited use of the model for the broader single-cell community. The authors may therefore improve their manuscript on several aspects.

      We thank the reviewer for the encouraging words and for the critical observations, which we have taken to heart, considerably broadening the scope of our paper to make it more attractive to a larger community.

      - Abstract: please specify the omics layers that you are analyzing (RNA + DNA methylation) in the abstract

      We acknowledge that, while SCRaPL is potentially general, in the first submission we focused only on RNA and DNA methylation. We have now decided to expand our analyses to include 10X data of simultaneous chromatin accessibility (ATAC-seq) and RNA.

      - What is the benefit of using a Bayesian model formulation in this setting?

      The benefit is twofold: a principled treatment of noise, and a quantification of the resulting uncertainty which allows for a meaningful way to compute Bayesian significance levels. We will expand the discussion of the relative merits of a Bayesian vs frequentist approach.

      - Does it also apply to unmatched data?

      In principle, given measurements with the same number of cells in all modalities, it is possible to apply SCRaPL. However, unless there is a natural pairing between different cells, the scaling of this approach will be quadratic in the number of cells, hence potentially expensive (although largely parallelizable). We will discuss this now, particularly in the light of applying SCRaPL in conjunction with other suites such as Seurat.

      - Would SCRaPL allow for differential correlation testing?

      At the moment, SCRaPL does not allow for differential correlation testing. Of course, one may run SCRaPL separately on two groups of cells and compare the resulting estimates, which would be informative. Nevertheless, extending SCRaPL to perform differential correlation testing (e.g. using Bayesian model selection) would be a non-trivial effort. We will add a comment on this issue to the discussion section.

      - Figure 1: The graphical description of the model is rudimentary. I believe that the model description could profit from a graphical model representation of SCRaPL (as presented in figure 5).

      We will redraw Fig. 1 and incorporate the graphical model from Fig 5.

      - Simulated data: all experiments seem to have rather low cell numbers (max. 200) and genes (max. 300). Given that 10X Genomics is the most widely-used sequencing platform with approx. 10,000 cells and 3,000 (highly variable) genes per experiment, and given that the authors show a use-case with 9480 genes in 487 cells, it seems appropriate to extend the simulations and runtime estimates of the presented model to several thousands of cells and genes, respectively.

      Thank you for this comment. The original simulation settings were designed with scMT data in mind, where indeed only a few hundred cells can be assayed at most. Partly because of this feedback, and also because of the request to implement SCRaPL in a different language, we are working on a more scalable TensorFlow implementation which will be able to handle thousands of cells and genes in a matter of tens of minutes. The new simulated data will therefore extend into this regime with larger data sets.

      - Figure 4: Please revise the figure legend as I did not understand the plotted results based on the description.

      We will do so.

      - Results section 2.5: Please formulate your whole argument about epigenetic regulators. I do not think that "For further information please refer to supplementary figure XYZ." is an appropriate closing statement for a paragraph, nor does it motivate the reader to look at the supplementary figures (I did look at them and I do not see how they support the point made in the paragraph). Please elaborate and consider a "take home message" for the paragraph such that the reader is able to understand the benefit of SCRaPL without revisiting the original data publication.

      Thank you for this pointer, we will take it on board in the full revision.

      - Conclusion: The authors mention that SCRaPL would further offer a "template for the application of more complex analysis techniques (such as clustering, dimensionality reduction and network inference)". If that was the case, the authors should consider a comparison to other tools, which offer exactly that (e.g. Seurat's CCA or non-negative matrix factorization in LIGER). Further, the authors should set their work into context with tools like bindSC.

      Thank you for the suggestion. As far as we can tell, all of these methods are designed for unmatched data, rather than multi-omics assays performed in the same cells. Having said that, it is in principle possible to “preprocess” data with SCRaPL and then feed the latent means computed by SCRaPL to Seurat or other tools. We will include an example of how this may be done in the revision.

      - Implementation: Matlab is used in about 6% of the single-cell RNAseq tools (according to scrna-tools.org). To reach a larger scientific community, do the authors plan to provide an R or Python implementation of their model?

      We are now implementing SCRaPL in Python using TensorFlow Probability, hoping to achieve substantial speedups (see response to previous point).

      Additional minor points about formatting by Reviewer 2 will all be addressed.

      Reviewer 3

      Maniatis et al. propose a sound strategy to analyse single-cell multi-omic data sets. A key advance is the use of bespoke error models for each of the omics data types. These are integrated into a multivariate Gaussian model. This method is a novel and, in my opinion, valuable addition to the analyses of the growing multi-omics single-cell data sets.

      We thank this reviewer for his/her appreciation of our work.

      - The authors make a convincing argument for the importance of principled methods, and in particular for using noise models that are tailored to the data at hand. To further support this, can the authors elaborate on how results would differ from those of commonly applied methods, e.g. those embedded in the Seurat, OSCA, and scanpy 'suites'? The authors compare to Pearson correlation-based methods, but it is not clear whether that is the true state of the art for those methods.

      As far as we know, volcano plots of p-value versus Pearson correlation are the most commonly employed approaches to assess correlations amongst different molecular modalities in single-cell multi-omics (see e.g. Argelaguet et al, Nature 2020). Seurat and other methods normally do not deal with single-cell multi-omics (i.e., multiple omics measured in the same cell), rather with multiple single-cell omics (different molecular modalities assayed in different cells). Nevertheless, it is possible to pre-apply SCRaPL to non-matched data and then use another suite; as an illustration, we will perform an analysis on scMT data using SCRaPL followed by Seurat.

      - In the case study on mouse embryonic stem cells, the authors excluded the chromatin accessibility data. Why not use it to show the value of the method more clearly?

      We did also use SCRaPL on chromatin accessibility; however, the signal was weaker and we did not include it in the manuscript. We will now present these results as supplementary material.

      - It would also be great if the authors would use a different single-cell multi-omic data set, with other data modalities, e.g. CITE-Seq data. If this is not possible, they should at least elaborate on which omics SCRaPL can handle, what the noise models would be for different data types, etc.

      We have started analysing a joint scATAC-scRNA-seq data set generated using the new 10X commercial platform, and will add the results of this analysis to the revised manuscript. We will also expand the description of the suitability for different data types.

      - As the authors acknowledge, the computational burden is high, which presumably limits scalability. Are the authors able to explore this further (scalability on in silico data)? Or how complex is adopting the variational inference method suggested? I appreciate that the variational inference implementation might be out of the scope of this paper, though.

      - It is a pity that the method is in Matlab. Nearly no one in single-cell omics uses Matlab. Our own lab is largely invested in this topic and we do not even have Matlab licenses. I strongly encourage the authors to implement their method in e.g. R or Python, ideally compatible with the broadly used 'suites' (Seurat, OSCA, and scanpy, ...).

      We are addressing these two comments jointly by re-implementing SCRaPL in TensorFlow Probability (Python based), which allows us to leverage powerful libraries for variational inference. We hope that this will lead to a substantial increase in scalability, making it possible to run on thousands of cells / genes in under one hour (results will appear in ).

    1. For openers, don’t say “fastly”, because there is no such word in English. Also, learn to check your typing so you don’t write “Bur” when you intended “But”. In my opinion you would make a terrible mistake by trying to defend your low skill in English. It is simply an inadequacy you have, and presumably are interested in overcoming. I think it will serve you better to memorize the following speech, and practice saying it until it flows out quickly and easily, with no hesitation or errors. Say this just as the interview begins: “I am very pleased to meet you, Mr. _______. Thank you for granting me this interview. “Before we begin, please let me apologize for my inadequate English skill. I may use some incorrect words, or pronounce some words improperly. I may not be able to answer some questions suitably, because I might lack the right words. “I hope to show you that I have the technical knowledge needed for this position, and that I have the skills and work ethic needed to do the job well. “I am currently working very hard to correct my deficiencies in English, and I believe I can accomplish that soon. I have had great success in learning other languages rapidly, but I have not yet devoted enough attention to developing fluency in English. Please understand that achieving skill in English is my highest priority.” This, I believe, will gain you a very sympathetic ear, and will lead to a very productive interview.
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      First of all, we would like to thank the editor and all reviewers for the effort to evaluate our paper in this difficult era of COVID-19.

      Reviewer #1

      (Significance): Overall, this manuscript is very clear and easy to follow. The manuscript could be improved by making the following changes:

      We thank the reviewer for the favorable comment and will revise the manuscript according to the suggestions.

      Reviewer #2

      (Evidence, reproducibility and clarity): The use of genetics is particularly impressive but the lack of major discoveries dampens the enthusiasm. Additional efforts to mechanistically define wave initiation and wave propagation would significantly improve the impact of the manuscript. Moreover, some of the conclusions are not fully supported by the data and require further experimentation and/or analysis.

      We admit that the marked redundancy of function among the EGFR ligands and their essential roles in cell growth prevent us from obtaining very clear results. Considering the importance of EGFR ligands in biology, we believe our observations will offer valuable suggestions to those who wish to clarify the roles played by EGFR-family proteins in other biological contexts.

      (Significance): While it is known that ADAM17 is critical to process EGFR ligands, the specific or redundant roles of different ligands remains an open question. The authors find that all ADAM17 ligands contribute to ERK signaling waves but may have specific contributions to other phenotypes. This work would be of interest to the signaling dynamics, epithelial and developmental biology communities.

      We thank the reviewer for the favorable comment.

      Reviewer #3

      (Evidence, reproducibility and clarity): Overall, this study is carried out with a high degree of rigor and technical excellence, with clear reporting of experimental details and replication. The writing and figures are very clear, and there are no obvious technical problems. However, there are some areas in which the strength and clarity of the conclusions could be strengthened by relatively simple experiments.

      We thank the reviewer for the favorable comment. We have already performed some of the experiments suggested by the reviewer. As the reviewer might have anticipated, co-culture with the wild type MDCK cells helps mutant cells to survive. We believe we could propose a clearer model in the revised paper.

      (Significance): This study definitively establishes the role of 4 EGFR ligands in the generation of ERK activity waves in MDCK cells. While other studies, including some from the senior author's lab, have strongly indicated that EGFR autocrine signaling is important for these waves, this study goes further in comparing the roles of these ligands using knockouts to unambiguously establish the autocrine factors involved. Others who use this common experimental system (MDCK) to study epithelial dynamics will find this study of great interest. A wider audience of those who work on EGFR-mediated signaling will also find the data quite fascinating as an example of the complex relationship between ERK activation and its downstream effects. The technical excellence of the paper will make it a must-read for those in these fields. However, there are some factors that limit the scope of the significance. MDCK cells are an important experimental model system but differ in substantial ways from other epithelial cells, particularly in the expression of EGFR ligands. Because different ligands such as amphiregulin dominate in other systems (as noted by the authors, and PMID 27405981), the ability to extrapolate from these findings to other cell types is somewhat limited. Also, the paper avoids addressing the major question of how ERK waves relate to collective migration rate. From the data presented it is clear that this relationship is complex; for example, bath application of the ligands restores a high migration rate but not ERK waves. Given this lack of a clear relationship it is an understandable decision to leave this question for future work; however this does limit the conclusions that can be drawn from the study.

      We completely agree with the reviewer’s view. It is uncertain to what extent the observations in MDCK cells can be generalized to other cell types. We also admit that the conclusion is not simple, because EGFR signaling is required for various cellular functions including cell survival and migration. Even though gene editing has become easy, knocking out many genes in a single cell line and characterizing the resulting cells extensively is still labor-intensive work. We believe the data shown in our work will provide a basis for the understanding of EGFR ligands.

      Reviewer #1

      For Fig 1F, 3 individual experiments should be conducted to confirm results.

      We will follow the reviewer’s suggestion and repeat the experiment.

      For Fig 1G, could the authors please show the original western blot data in full rather than just the densitometry graphs?

      We omitted the blots only for the sake of brevity. We are happy to include the images as supplementary data.

      The authors should explain the origin/phenotype of MDCK cells for those who are not familiar with the cell line.

      We will modify the text according to the reviewer’s suggestion.

      The authors should give a future outlook/direction for future experimentation to further confirm redundancy in EGF ligands in the propagation of ERK activation waves.

      We will discuss the redundancy in other cell types based on available NGS data.

      Some mention of the use of biosensors in the abstract and introduction is recommended as this is a major part of the experimental work.

      We will refer to the biosensors in the abstract and introduction.

      Reviewer #2

      There are conflicts with some of the conclusions made about ligands. dEGFR cells have basal ERK activity as high as WT which argues against EGF being responsible for basal ERK activity. Further, basal ERK activity was not rescued by restoration of EGF in the 4KO-EGF cells. The authors should address this discrepancy.

      We agree that some new questions have arisen from our observations. The discrepancy between the phenotypes of dEGFR cells and dEGF cells is one example. We are currently establishing dEGF cell lines in which different genomic sequences of the EGF gene are targeted. We have already started to develop these cell lines and will obtain them within a month. The results will provide some clues to answer these questions. However, even if we cannot resolve the question, we believe it is worth reporting observations that cannot be easily understood, because such questions often lead to further discoveries.

      Besides the ones genetically disrupted in this work, other EGFR ligands seem to play functional roles given that dEGFR cells show less migration and fewer ERK waves than 4KO cells. The authors could test if other ligands are upregulated in 4KO cells to compensate. On a similar note determining whether ADAM17 deficient cells are more similar to 4KO cells or dEGFR cells could provide some insight.

      Following the reviewer’s suggestion, we will conduct qPCR of growth factors in the mutant cell lines to see whether the expression levels of the seven EGFR ligands have changed significantly. At the same time, as the reviewer suggested, we will establish ADAM17 knockout cell lines and compare their phenotype with those of cell lines deficient in EGFR ligand genes.

      The authors propose that Nrg1 is responsible for ERK waves in QKO, 4KO, dEGFR, and 4KO-EGF cells but are limited in testing this due to Nrg1 being essential in 4KO cells. First, Nrg1 should have been deleted in TKO cells to confirm that it is only essential in the absence of the four EGFR ligands. Additionally, Nrg1 could be knocked out in 4KO-EGF cells to demonstrate the claim that EGF-induced ADAM17 cleavage of Nrg1 is responsible for ERK waves.

      We do not think the deletion of Nrg1 in the TKO cells will abolish the ERK activation waves because EREG in TKO cells could transmit the waves. To overcome the problem of cell growth, we will try to obtain 5KO cells by Cre-induced deletion of NRG1 in 5KO-loxP-NRG1 cells, wherein EGF is supplied exogenously. We already had preliminary data suggesting that co-culture with wild type MDCK cells helps 5KO cells grow.

      The authors state that ERK activation waves are important for collective migration and seek to understand the roles of each EGFR ligand, but despite measuring migration and properties of ERK activity, there is very little analysis or commentary on the relationship between the two. The ability of HB-EGF to restore migration without ERK waves suggests that waves are not required per se. It is interesting to note that with restoration of ligands, migration is higher than WT but ERK activity is lower.

      We refrained from devoting much space to the essential role of ERK activation waves in collective cell migration, because several papers have already described this issue.

      We should probably have devoted more space to emphasizing that collective cell migration comprises at least two different phenomena: the migration of leader cells and the migration of follower cells. The ERK activation waves are essential for the follower cells but not the leader cells. In 4KO cells, both leader cell and follower cell migration are impaired. We showed that GF expression restores leader cell migration, but not follower cell migration. We will revise the text to include this issue.

      It is suggested that the total amount of EGFR ligands may be the primary determinant of migration, but deletion of TGFα alone causes a significant decrease in migration comparable to the DKO cells. TGFα has the lowest expression of the four ligands studied but is the only ligand to have a significant impact on migration in the single knockout context, which disagrees with that conclusion.

      Each EGFR ligand has a different affinity for EGFR, which makes it difficult to link mRNA levels directly to the effect of each EGFR ligand. We will modify the discussion to include this argument.

      Other:

      Fig. S3B needs clarification that the WT (black) and 4KO (green) did not receive a stimulus.

      We will follow the reviewer’s advice.

      Reviewer #3:

      The experiments in Fig. 5 are undertaken with the purpose of assessing whether NRG acts as an additional ligand that mediates the residual ERK waves in 4KO/QKO cells. However, this question is never addressed in the NRG/4KO cells. While it might be challenging due to the proliferative defect, it seems important to attempt this experiment in some way; measuring the ERK waves for these cells would establish whether all of the critical autocrine factors have been identified. Can the proliferation be rescued by application of high amounts of growth factors?

      This question is similar to a question raised by reviewer #2.

      To overcome the problem of cell growth, we will try to obtain 5KO cells by Cre-induced deletion of NRG1 in 5KO-loxP-NRG1 cells, wherein EGF is supplied exogenously.

      The bath exposure to EGFR ligands shown in Fig. S3A is an important experiment, but it is surprising that ERK signaling is not maintained under these conditions. Is this due to depletion of the added ligands, perhaps locally? Or is the intermittent nature of paracrine signaling needed to maintain ERK activity? These possibilities could be distinguished by checking whether the added EGF or the other ligands are depleted after several hours, or by restimulating with a new bolus of ligand after several hours.

      We thank the reviewer for this invaluable suggestion. We will conduct the experiments suggested by the reviewer.

      The connection between ERK activity and migration is somewhat confusing. It would be helpful to show the dose sensitivity of migration to a MEK or ERK inhibitor. Are other pathways downstream of EGFR such as PI3K involved in the autocrine-mediated migration? This could also be established with the appropriate inhibitors.

      We should have devoted more space to emphasizing that collective cell migration comprises at least two different phenomena: the migration of leader cells and the migration of follower cells. The ERK activation waves are essential for the follower cells but not the leader cells. In 4KO cells, both leader cell and follower cell migration are impaired. We showed that GF expression restores leader cell migration, but not follower cell migration. We will emphasize this issue in the revised manuscript.

      Reviewer #1:

      Line 47 in Abstract should read "Aiming for" not "Aiming at".

      We have corrected the mistake as suggested.

      Some in the field call fluorescence lifetime microscopy "FLIM", you could adopt the same wording in your manuscript to attract more readers.

      We have included FLIM according to the reviewer’s suggestion.

      Reviewer #1:

      Figure 1D, the images should be presented using the same scale for both the EKAREV and EKARrEV constructs so that they can be directly compared.

      Because the basal FRET/CFP ratio is significantly different between EKAREV-NLS and EKARrEV-NLS, the changes during mitosis would become unclear if we applied the same scale. This figure was prepared to show the reactivity to Cdk1 during mitosis; therefore, we believe the current scale is better for presentation.

      The names QKO and 4KO are a bit confusing. Could the authors please change the naming of the knockout cells so that readers understand that QKO and 4KO are two separate cell types? Perhaps instead of 4KO use FKO for "full knockout" or something similar. The 5KO line might also need to be named something else if you change to FKO.

      We have discussed this issue with the co-authors, but could not arrive at a better alternative. Instead of changing the names, we will include a detailed explanation for each cell line.

      Reviewer #2:

      The interpretation of the RA-SOS coculture experiments is confusing. Based on the authors' reasoning, I would expect ADAM17 shedding in the RA-SOS cells to trigger signaling at the interface of both WT and 4KO cells but the 4KO should be unable to propagate the wave farther away from the interface. This does not seem to be the case. Do RA-SOS ADAM17KO cells still trigger waves of ERK signaling in the WT cells? Do ADAM17KO cells behave as the 4KO cells in this coculture system?

      The reviewer may have misunderstood the method. The GF-less 4KO cells were co-cultured with wild type cells harboring the RA-SOS system. We will describe the method in more detail to avoid misunderstanding.

      Finally, the growth curve in Fig. 5B indicates that 5KO-loxP-NRG1-CreERT2 cells are viable for about two days after Cre induction. The authors could perform a confinement release assay of these cells 1-1.5 days after Cre induction to look for further reduction of ERK waves and migration to demonstrate the role of Nrg1.

      This experiment may not be necessary. It is clear that NRG1 is required for the survival of 4KO cells. The cells are still alive 1 to 2 days after 4-OHT application simply because NRG1 protein remains. The results would be difficult to interpret during the phase in which NRG1 is declining.

      In Fig. 1G, the normalization of all WT pERK samples to 1 artificially lowers the variance to zero when performing the T-test.

      For the comparison of immunoblotting data derived from independent experiments, the signals must be normalized to the control. We believe the use of pERK/ERK of the wild type cells as the control is reasonable for this experiment.
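
      The statistical point under discussion can be illustrated with a minimal sketch. The band ratios below are invented for illustration and are not data from the study, and the helper names are hypothetical. The sketch shows that dividing each blot by its own control makes every control value exactly 1, so the control group has no spread; a one-sample t statistic against 1 is shown only as one commonly used option for control-normalized ratios, not as the analysis used in the manuscript.

```python
from statistics import mean, stdev

# Hypothetical pERK/ERK ratios from three independent blots (arbitrary units).
wt = [0.80, 1.10, 1.40]   # wild-type control, one value per blot
ko = [0.40, 0.66, 0.70]   # mutant sample run on the same three blots


def normalize_to_control(control, sample):
    """Divide every value by the control value measured on the same blot."""
    control_norm = [c / c for c in control]
    sample_norm = [s / c for s, c in zip(sample, control)]
    return control_norm, sample_norm


def one_sample_t(values, mu):
    """t statistic for testing whether the mean of `values` differs from `mu`."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / n ** 0.5)


wt_norm, ko_norm = normalize_to_control(wt, ko)

# After normalization the control replicates are identical, so their
# standard deviation is zero and a two-sample test sees no control variance.
print("control SD after normalization:", stdev(wt_norm))

# The normalized mutant ratios still vary between blots and can be compared
# against the fixed reference value of 1.
t_stat = one_sample_t(ko_norm, 1.0)
print("mutant mean ratio:", mean(ko_norm), "t statistic vs 1:", t_stat)
```

      Between-blot variation in the control is absorbed by the normalization step itself, which is why the comparison in this sketch is made against the fixed value 1 rather than against a control group with zero variance.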

    1. Author Response:

      Reviewer #1 (Public Review):

      I think that it is important for the authors to consider that for most (if not all) SARS-CoV-2 variants, increased transmissibility of the virus has not been directly demonstrated. While it is clear that numerous variants have emerged and will continue to emerge, the rapid upsurge of cases with a variant may be related to many factors (e.g. host susceptibility due to immunity or genetic factors, virus seeding events, predominant replication in particular age cohorts, ...) that cannot simply be captured as "transmissibility of the virus". Even for B.1.1.7 and D614G mutants, the direct evidence of increased transmissibility in humans is extremely limited if available at all. Most studies erroneously simply take the increasing occurrence of particular lineages or mutations in sequence databases as a measure of increased "transmissibility", which should be avoided, also in the present manuscript. Increased transmissibility can only be derived from field studies where transmission is measured directly.

      We thank the reviewer for pointing out that this is a controversial area. We have adjusted the text throughout to accommodate the fact that the published evidence of increased transmissibility/infectivity is not definitive.

      On several occasions in the manuscript (e.g. page 3, page 4 L58-59, page 9 in submitted version), the authors seem to suggest that changes that lead to increased "transmission" or binding affinity and changes that lead to immune escape are mutually exclusive. But the opposite might be true. Viruses may escape from antibody-mediated immunity by amino acid substitutions in linear or structural antibody-binding epitopes. However, viruses may also escape from antibody-mediated immunity through altered protein density on virion surfaces (e.g. less Spike) and/or altered affinity, making it harder for antibody to inhibit virus attachment. As an example, increased affinity may facilitate virus replication with less dense Spike protein, allowing more effective antibody escape. Lower affinity but more dense coverage of Spike may reduce accessibility of critical virus parts by antibodies. Several viruses are known to escape from antibody-mediated neutralization through changes in affinity/avidity.

      We agree with this point and have modified the text to avoid implying that increased transmissibility and antibody escape are mutually exclusive.

      In relation to the previous point, it is important that authors mention some limitations of the present work in the discussion. SARS-CoV-2 virion attachment to cells is not just a matter of spike protein binding and certainly not of a monomeric RBD. Escape from antibodies and effects on affinity are heavily influenced by the entire (trimeric) spike protein, including its N-terminal domains. Such components are not taken into account in the present experimental designs, and this should be discussed, as e.g. the NTD can be important in attachment and antibody-mediated neutralization.

      We thank the reviewer for this suggestion. We have added an appropriate caveat to the Discussion.

      The authors suggest that the pandemic virus as it spread across the globe initially did not have "optimized" affinity. However, in the first months of the pandemic, there was relatively limited variation in spike protein sequences. The major variants emerged only later and mostly in areas where population immunity was building up. Again, this begs the question whether natural selection is occurring as a consequence of receptor affinity or immune escape?

      We thank the reviewer for making this point. However, we do not think it is that surprising that it took a few months for the first Spike variants to be detected, for the following reasons. Firstly, the number of infections would have been relatively low early in the pandemic and SARS-CoV-2 replicates with a comparatively low error rate for an RNA virus. Secondly, the introduction of strict non-pharmacological measures (social distancing, etc.), which would have increased the selective pressure on the virus, was somewhat delayed. Thirdly, it would take some time for any variant that emerged by chance to expand sufficiently to be detected by sequencing. While there is evidence suggestive of broader immunity in populations where the Beta and Gamma variants emerged, which we cite, we are not aware of evidence of widespread immunity in populations where the D614G, S477N and Alpha variants first emerged.

      Reviewer #2 (Public Review):

      Barton and colleagues investigated the effect of common SARS-CoV-2 RBD mutations and two ACE2 mutations on the RBD/ACE2 interaction. They concluded that the N501Y, E484K and S477N increased receptor binding while the K417N/T had the opposite effect. Double and triple mutants were also included. The ACE2 mutations (that are rare in the human population) also increased binding to most RBD mutants. The study is well-performed and written clearly.

      The primary conclusions of the manuscript were supported by the results. However, the interpretation was too speculative. In the abstract (lines 14-17), the authors suggest that the 501 and 477 mutations enhance transmission solely based on data on the RBD-ACE2 interaction. It is unknown whether increased affinity to ACE2 is beneficial for transmission. In addition, increased RBD affinity to ACE2 does not mean that the whole spike or virus particle also binds stronger to ACE2. Lastly, increasing ACE2 affinity does not necessarily increase binding to cells (for example S1A binding to sugars or spike abundance can also influence this).

      We agree that it would be inappropriate to assume, based on our affinity/kinetic studies alone, that 501 and 477 enhance transmission. That is why the relevant sentence in the abstract starts with the phrase, “Taken together with other studies”. We summarise the evidence from these other studies in the Discussion. We acknowledge that we have not examined the effects of the mutations on binding of the whole Spike protein to ACE2 or viruses to cells, and have added suitable caveats to the Discussion.

      The overall impact on the field will be limited as there is substantial overlap with already published studies. The observation that the N501Y and E484K increase receptor binding while the K417N/T mutations decrease binding was already made by Laffeber et al (2021; J Mol Biol). Laffeber et al also investigated double and triple mutants and came to similar conclusions. Liu et al (2021) confirmed that the N501Y increases binding whereas the K417N/T have opposing effects (Liu et al., 2021 mAbs). The observation that the N501Y increases ACE2 affinity has been made by several groups (e.g. Liu et al 2021 Cell research; Starr et al 2020 Cell).

      We thank the reviewer for highlighting these additional studies, two of which are very recent. We have now cited these studies.

      Starr et al. (2020) was a high-throughput study in which the affinity measurements were semi-quantitative, and no kinetic analysis was performed. Liu et al (2021) and Laffeber et al (2021) were performed at 25 °C and without rigorous controls for mass-transport and protein aggregation. Liu et al (2021) did not report kinetic measurements. Their results are broadly consistent with ours but their affinity and kinetic measurements are ~10-fold different. While we accept that some of the measurements of the effects of mutations have been made before, our measurements of affinity and especially kinetics are performed more rigorously than in previous studies and, for the first time, at a physiological temperature (37 °C). Thus, the affinity and kinetic data that we have obtained for single and combination variants are more definitive. As noted in our Discussion there is a wide variation in reported binding affinities and kinetics in previously published studies. We think the comprehensive data that we report here, obtained with the same robust method for measuring the binding properties of all these variants, add significant value.

      Reviewer #3 (Public Review):

      [...] 1) The ACE2 receptor exists naturally as a dimeric form and the RBD is a component of the SARS-CoV-2 spike trimer. The assay format here was monomeric RBD binding against monomeric ACE2 throughout this study. While the measurements are indeed carefully executed and under more physiological conditions than many other reported studies, the authors should discuss potential avidity effects, the consequences of mutations on the accessibility of the RBD in VOC versus wildtype, and impact of other domains such as the NTD, in the context of their monomeric ACE2 measurements with isolated RBD here.

      We thank the reviewer for raising this issue. We have added a section to the Discussion addressing these important points.

      2) As shown in Figure S2, RBD WT, K417N, K417T, KN/EK, KT/EK, and S477N, the ~30kDa monomeric proteins were flanked by additional ~60kDa bands (which correspond to the smaller peaks to the left of the main peaks) some of which bleed through to the main fraction to different extents, whereas RBDs SA, UK1, UK2, BR, and E484K, do not seem to have as much or any of these extra species. Can the authors comment on whether these contaminants are RBD-dimers as observed before (Dai et al. 2020)? If yes, would such dimers affect the affinity and kinetics?

      We thank the reviewer for pointing out these larger ~60 kDa bands in some RBD preps. We think that it is unlikely that these are RBD dimers as these are reducing gels. The strictly monophasic kinetics of all RBD preps also argues against this being an RBD dimer. We have confirmed by densitometry that the larger band comprises less than 5% of the protein in all the preparations. This will have only a minor effect on the estimate of RBD concentration. We have added this information to the Figure S2 legend.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate). Please place your comments about significance in section 2.

      In this paper the authors used a targeted approach to identify rare mutations in a cohort of glioma patients. Using this approach they identified a recurrent mutation in the TOP2A gene encoding for Topoisomerase 2A, and suggest that this mutation creates a more effective protein, binding DNA strongly and maybe more enzymatically active. RNAseq analysis of TOP2A WT and TOP2A mut tumor samples suggest different transcription patterns and points to possible splicing defects. The most recurrent variant (E948Q) is described in depth and some experimental information shows this variant might be a gain-of-function mutation.

      **Major comments:**

      • Are the key conclusions convincing? The validation of both the methodology and the presence of never described TOP2A variations in HGG is done quite successfully. Interesting evidence about relevance of the most frequent mutation is provided. However, besides having computational and biochemistry assays performed, lack of details about in vitro experiment statistics (no p-values are provided in figures 4 and 5, neither sample size, repetitions) weakens the conclusions claimed by the authors about the properties of the mutated topoisomerase.

      Ad. In the revised version we provide more details about the in vitro experiments, including statistics where applicable, sample size and the number of repetitions. In Fig. 4 we show the results of two repetitions (so we cannot calculate statistics), but I would like to stress that we independently tested two fragments of the protein and the results were similar, so our conclusion was justified. However, we do agree with the reviewer that a statistical analysis of those biochemical tests is required. We have already started to produce a new batch of recombinant proteins and we will add repetitions to reinforce our claims. We will provide details of the statistical analysis once all experiments have been performed.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? Claims about E948Q variant function should be revised. Data is not presented in a convincing way, plus there is ambiguous language used from the results ("We conclude that the E948Q TOP2A protein is functional, and MIGHT have a higher activity than the WT protein") to the rest of the paper where they strongly support the claims about the TOP2A activity.

      Ad. We will provide more data on the biochemical features of the TOP2A variant to confirm the impact of the E948Q substitution on enzyme activities, which would allow stronger conclusions. This will present our results in a more convincing way. The language of the manuscript has been critically revised and modified (see the version with tracked changes).

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. In line with the presented data in the paper, additional experiments that show catalytic changes of the E948Q variation must be added. It is shown that there are differences in the DNA binding capacity by EMSA compared to the WT form, however, the DNA supercoil relaxation activity is not that different, at least the way the results are presented. The authors suggest that TOP2A mutation is a driver mutation but no validation in vitro of this claim is shown. Can this mutation alone or in combination with e.g. tumor suppressors transform normal cells to cancer cells? Do cell lines expressing this mutation (compared to parental TOP2A wt expressing cells) display increased transcription? Increased invasion?

      Ad. In the revised version we have moderated our conclusions and we do not state that the mutated TOP2A is an oncogenic driver. We suggest that this mutation (and possibly other TOP2A mutations, as we analyzed the impact of other variants on TOP2A protein function) contributes to gliomagenesis. This conclusion is based not only on the changes in biochemical properties, but also on the observed impact of the mutation on transcription and patient survival. We expanded the analysis of TOP2A mutations and expression levels in TCGA datasets, and these new results support our conclusions about the pathogenic nature of TOP2A overexpression and mutations (Supplementary Fig. 4). We believe that in this situation there is no rationale for performing a classical oncogenic driver experiment.

      Due to the rarity of the TOP2A mutations it is impossible to find a patient-derived cell line with such a defect. We attempted to overexpress TOP2A in glioma cells, but apparently there is some autoregulation preventing overexpression of this protein in cells with endogenous TOP2A expression. Therefore, we cannot verify whether cell lines expressing this variant (compared to parental TOP2A wt expressing cells) have increased transcription. Moreover, such experiments are costly and would require a substantial time investment.

      I would like to stress that modeling some events in cell cultures is difficult, and that in GBMs we found a link between the mutated TOP2A and increased transcription along with a decrease in the expression of splicing factors.

      We have attempted to generate a CRISPR/Cas9-mediated knock-in in glioma cells, but without success. This is a difficult and time-consuming procedure. Although in principle we agree with the rationale for such an experiment, we think that the current data are consistent and convincing. If the reviewers find it necessary, we may attempt to create glioma cell lines with a TOP2A knock-out and overexpression of the mutated TOP2A gene and study them functionally, but this would require more time.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. If the authors can complement the already presented in vitro experiments with additional ones supporting their hypothesis, this should be feasible. The authors can use patient derived glioma cells or glioma cell lines manipulated to express either the parental TOP2A wt enzyme or the identified mutated form.

      Ad. Due to the rarity of the TOP2A mutations it is impossible to find a patient-derived cell line with such a defect. Our findings partly relied on frozen historical samples, so it is not possible to develop patient-derived cell lines. As mentioned above, we can create a TOP2A knock-out cell line and overexpress a wild type or mutated version, but there is no certainty that TOP2A-deficient cells would survive (this is an essential enzyme) or that such a manipulation would be feasible.

      • Are the data and the methods presented in such a way that they can be reproduced? Yes, the authors provide a quite detailed explanation of the methods implemented to reach each one of the results they are presenting.

      • Are the experiments adequately replicated and statistical analysis adequate? No, there is no information about the statistical analysis or number of replicates in any of the in vitro experiments performed. This information should be added to the manuscript.

      Ad. In the revised version the requested information has been added where possible, and additional repetitions of the biochemical experiments are currently in progress.

      **Minor comments:**

      • Specific experimental issues that are easily addressable.
      • Are prior studies referenced appropriately? Yes, authors clearly address the state of the art regarding previous NGS methodologies and let us know the advantages and novelty of their approach.
      • Are the text and figures clear and accurate? There are some discrepancies between the strength of the language used in different sections of the paper to refer to the conclusions they can infer from the results they are showing. While they are all valid, authors should revise it.

      Ad. The text of the manuscript has been unified and revised.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? First of all, describe the statistical analysis used in every figure, include number of biological and technical replicates. I would also suggest to change the title or the scope of the discussion, there is too much focus on the TOP2A in the introduction, neglecting all the technical NGS work that actually led to several new variants being described. This may be confusing when it collides with a conclusion that is heavily focused on the first half describing potential implications of at least another 3 proteins where genetic alterations were described. Given the fact there is not much experimental work that shows TOP2A mutations relevance in HGG or strong enough evidence of the variant's function I would suggest to change a bit the scope of the title.

      Ad. The description of the results and the discussion have been revised to include additional data and discussion on technical aspects and on other findings not related to TOP2A. We performed additional computational analyses of TOP2A expression/mutations in the TCGA datasets. We believe that the planned experiments on genetically modified cell lines would provide additional support for our claims. We think that the revised version strikes a good balance between the landscape/NGS content and the TOP2A content.

      Reviewer #1 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. The authors describe a methodology that proved to be sensitive and specific enough in order to allow them to detect rare genetic alterations in patient glioma samples. This information could be valuable to describe new driver mutations or infer in genetic pathway alterations that could be potential therapeutic targets. As the authors state at the beginning of the paper, given the poor therapeutical approaches existing for HGG currently, information of this kind could still be highly useful and provide a better outcome to a specific cohort of patients.

      On a personal note, I think there is too much speculation about how TOP2A mutations could be interesting from a biological point of view (authors referred to evidence about implications of this mutation in other forms of cancer) but since no experimental validation is provided in glioma cells, it is difficult to conclude that this enzyme gain-of-function mutation could have a relevant role in HGG and thus make these variants a potential therapeutic target. There are no experiments conducted in glioma cells that express TOP2A variants, it would be interesting to see if it has an effect in the migratory/invasive phenotype like described in other cancer types or like it is suggested by analysis of the genetic pathways activated in the HGG patients samples harboring TOP2A mutation. In addition, there is no evidence of the TOP2A mutations possible role as a driver mutation, which is an interesting aspect that could be further explored from both a computational and an experimental approach.

      Ad. As mentioned above, there are no glioma cells that express TOP2A variants, and we are not convinced that such an experiment will be feasible, given the essential role of TOP2A. We will attempt to perform experiments with CRISPR/Cas9 knock-in cell lines and functional validation, but so far we have not achieved a knock-in in glioma cells. We will try to knock out the endogenous TOP2A using CRISPR and express TOP2A WT or the E948Q variant from plasmids encoding these proteins, but we can’t predict whether TOP2A KO cells would survive. If we manage to produce such cells, we will investigate the proliferation, migration and invasion of cells expressing TOP2A WT or the mutated variant.

      We do agree with the reviewer that our previous conclusions were too strong, and in the revised version we moderated our claims. We do not say that the mutated TOP2A is an oncogenic driver. We suggest that this mutation (and possibly other TOP2A mutations, as we analyzed the impact of other variants on the TOP2A protein structure) contributes to gliomagenesis.

      The data in Fig. 1A suggest that TOP2A has a mutational hotspot at position E948Q in our dataset. In the revised version of the manuscript we have extended the RNA-seq analysis of our datasets and the TCGA PanCancer datasets to search for TOP2A mutations/overexpression. We found that another computational prediction, using the CADD algorithm, strongly confirms that TOP2A E948Q is in the top 1% of most deleterious variants in the human genome (CADD score >20). This result was added to Supplementary Table 2.

      • Place the work in the context of the existing literature (provide references, where appropriate). The quality of the paper is high and in line with other studies in the literature that perform genome and transcriptome analysis of tumor samples. It is only the experimental validation that is lacking data supporting the "in silico" findings. Ad. We would like to point out that we provided the results of experimental, biochemical validation (2 assays) showing that the variant TOP2A proteins have different properties. The association of transcriptional dysregulation with variant TOP2A in gliomas was not an in silico prediction but the result of the analysis of real tumor samples.

      As stated above, we are ready to perform further biological validation if the editors find it necessary.

      • State what audience might be interested in and influenced by the reported findings. Computational biologists are the right audience to target this paper. If additional experimental work further validating their initial bioinformatic findings is added to the manuscript then probably a wider population could be targeted.

      Ad. As stated above, we are now working on providing more replicates of the biochemical assays, and we are ready to perform further biological validation if the editors find it necessary. I would like to stress that genome editing by knock-in is not always possible or feasible, and this type of experiment is time-consuming and costly.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. Brain tumors, immunotherapy, cancer stem cells, tumor microenvironment, tumor heterogeneity. I do not have sufficient expertise to evaluate the bioinformatic analysis and software/programs used to analyze the NGS data.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      By exon targeted resequencing of 664 genes frequently mutated in cancer, authors identify novel mutations associated to Glioma in a cohort of 182 Polish and Canadian samples. Most of these novel mutations have been identified as potential rare germline mutations, somatic mosaicism or loss-of-heterozygosity variants. Among them, authors focus on mutations associated to the TOP2A gene, which encodes one of the two Type II topoisomerases paralogs present in humans. By a limited number of in vitro experiments, authors conclude that TOP2A recurrent variant E948Q, displays increased binding to DNA and topoisomerase activity. Therefore, authors suggest that the TOP2A E948Q variant is a gain-of-function mutation.

      **Major comments:**

      • Authors show an interesting plethora of new exon mutations associated with High Grade Glioma. Nevertheless, the characterization of TOP2A E948Q variant, which is the main focus of the study, although very interesting and potentially clinically relevant, remains incomplete. Association of the TOP2A E948Q glioma variant with a gain-of-function mutation would require to improve the statistical power of the presented experiments (increase number of replicates). With the existing experimental evidence, the increased DNA binding and activity of the TOP2A E948Q variant should be considered as preliminary, especially in the case of 431-1193 aa fragment. I would consider mandatory to increase experimental replicates and to analyse statistical significance in the case of DNA binding experiments and DNA relaxation assays with the TOP2A 431-1193 aa fragment. A more detailed biochemical characterization should be performed. A titration of different amounts of protein should be included in these experiments, and at least two batches of purified proteins should be analysed. Decatenation assays should also be performed to characterize the activity of the mutant protein in more detail. Recapitulation of DNA binding and activity results with other TOP2A variants obtained in this study will significantly reinforce authors claims too. This improved biochemical characterization should not take longer than two months.

      Ad. We would like to stress that while two replicates are presented, we tested two forms of TOP2A proteins and the results were similar, confirming our conclusions. But we agree that additional replicates would strengthen our claims. Therefore, we are in the process of producing another batch of recombinant proteins to increase the number of replicates and calculate statistics for the biochemical assays (binding and relaxation assays). We will perform a titration of different amounts of the protein using two batches of purified proteins.

      The occurrence of other TOP2A variants is low (identified in only a single patient sample), therefore we will perform experimental validation only for E948Q. However, we performed additional computational analysis for other TOP2A variants showing the influence of the substitution on DNA binding by docking the DNA fragment into TOP2A binding pocket (Supplementary table 4).

      • To increase the significance of the results, I would encourage authors to include experiments showing the functional impact of this TOP2A mutation in cells. The connection with transcriptomic alterations is merely correlative, and would be greatly strengthened by functional experiments in cellular models. To draw definitive conclusions regarding the changes in transcription, I would encourage authors to complement the results with experiments that point to the physiological impact of TOP2A variants within the cell. Overexpression of WT and E948Q variants in a cell model and transcriptomic analysis would be desirable, but validation in these experimental models of some of the target genes identified as deregulated in patients could suffice. These experiments could be accomplished in no more than 3-4 months.

      Ad. We agree that the connection of the TOP2A mutation with transcriptomic alterations is correlative, and would be greatly strengthened by functional experiments in cellular models. If we develop a TOP2A E948Q knock-in cell line or TOP2A KO cell line with E948Q over-expression, we are planning to evaluate transcriptomic changes on selected genes by qPCR or whole transcriptome by RNAseq. We estimate that developing a stable CRISPR/Cas9 cell line may take up to 6 months.

      We provided additional results showing that the connection of the TOP2A mutation with transcriptomic alterations may be due to different expression of splicing factors (Supplementary Fig. 6).

      • Some of the methods are not presented with sufficient detail. Regarding the DNA and RNA sequencing experiments, I consider necessary to specify the DNA fragmentation method, reference for the indexed adapters and ligation and amplification procedures (ligase reference, number of PCR cycles, etc). It would be helpful to clarify or reference which are the "special oligonucleotide probes" that are mentioned. Finally, a reference for the "special beads" and final amplification number of cycles is needed. The sequence of primers used for TOP2A cloning and mutagenesis should be included. The reference for the "site mutagenesis kit" used is missing. When studying the survival rate of glioma patients depending of TOP2A expression levels, it should be clarified what is considered HIGH or LOW expression (i.e: which percentiles are used).

      Ad. We expanded the description of methodological aspects of DNA and RNA sequencing experiments. This description was revised and more details are provided in the revised version. Regarding cloning and mutagenesis, we added a table with primer sequences (Supplementary Table 5). We did not use any kit for cloning and mutagenesis. Standard methods and primers with modified nucleotides were used.

      Ad. We have included information about the partitioned groups in the survival analyses in the Figure 2 caption: “D - Kaplan-Meier overall survival curve for patients with high (> TOP2A mRNA median expression x 1.25) or low (

      • There is a major concern about how the experiments are replicated and about the statistical analysis, which is nonexistent in some cases. Indeed, Figures 4 and 5 do not present any statistical analysis, it is therefore hard to draw any conclusion. In Figure 4b, the results for the 890-996 aa fragment look qualitatively clear, but this is not the case for the 431-1193 aa fragment. More replicates and statistical analysis are mandatory, together with a protein titration. The replicates should be performed with at least two independent batches of protein purifications. The individual values of each experiment should be included in the graph to provide a better understanding of experimental variability. All this also applies to Figure 5.

      Ad. We will increase the number of replicates for the binding and relaxation assays. We will perform a titration of different amounts of protein in these experiments using two batches of purified proteins.

      **Minor comments:**

      • The effect on transcription of co-occurrence of TOP2A mutations with other mutations could also be analysed with the already available data. Also, a more detailed analysis of genome-wide transcription could also be used to at least partially address the proposed hypotheses of increased transcriptional rate or splicing aberrations.

      Ad. We don’t have enough samples with the TOP2A mutation to analyze the effect on transcription of co-occurrence of TOP2A with other mutations.

      We addressed the hypothesis of increased transcriptional rate or splicing aberrations by performing additional analyses of RNA-seq data to confirm splicing aberrations. Indeed, we found splicing machinery genes down-regulated in the E948Q TOP2A glioma samples (Supplementary Fig. 6).

      • There is no reference for the following argument "As the identified germline variants were exceptionally rare in the general population ... it is likely that these variants are pathogenic". I also find low number of references to support the suggested high frequency of altered genes in gliomas compare to other cancer types. I miss specific works relating TOP2 activity with transcriptional regulation.

      Ad. The appropriate references are now provided to back up these statements.

      • At several points in the text there are quantitative and comparative statements that should be backed up by the actual numbers (e.g. "The results of the targeted sequencing indicate a high frequency of altered genes", "The most altered gene was TP53, followed by IDH1...", "Other genes that were found to be frequently altered included KDM6B...", "These partial results combined with a low frequency of this variant in the Polish population suggest a somatic mutation"). The same thing applies to the co-occurrence of mutations, in which the percentage of co-occurrence and significance is not indicated. This lack of detail in the description is also observed in the description of the transcriptomic alterations in which no detail is provided regarding how many of the 105 analyzed samples correspond to low or high gliomas.

      Ad. We apologize that the frequencies of mutated genes were not specified. This information is included in the main text of the revised version. We now provide a gnomAD frequency for all variants of interest, confirming the low frequency in the population (AF

      Regarding the total number of samples in the transcriptomic analysis, we provided an updated supplementary table covering also samples that were used for transcriptomic analyses (Supplementary Table 1).


      • For TOP2A mutation analysis, it is sometimes not clear whether the analysis is done with the 9 mutated samples or with the 4 recurrent TOP2A E948Q variants. For example, in figure 2b and 2c the analyses are done with 9 samples while figure 2e is based on the 4 E948Q variants. At least this is what I have deduced from the main text; it should be clarified in the figure legend.

      Ad. This information has been included in the captions of Figure 2B, 2C and 2E and now we specify how many samples were used in each analysis.

      • Fig1. In figure 1b it would be interesting to color-code patients by glioma grade. This would also apply to Figure S1a, S1c, 2a, S3 and S4. In figure 1D it would be very informative to distinguish mutations that passed the quality control or not with different colors.

      Ad. Following the reviewer’s suggestions, we have added this information, and the oncoplot figures derived from the germline analysis now have a distinct color for each glioma grade. In figure 1D, all of the presented mutations have passed quality control in terms of sequencing quality. One additional criterion that was used for all genomic results (except some of the TOP2A variants) was a criterion of 20% variant penetration (20% of reads at the position had to come from the alternative allele). We corrected the description in the Supplementary Table to “passed 20% penetration criterion”. The rationale for the TOP2A exception was that in one of the E948Q samples this value was ~13%, and we didn’t want to lose this sample from the analysis, given the rarity of the mutation.
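
For illustration only, the 20% penetration criterion described in this response can be sketched as a simple check on read counts. This is a minimal sketch, not the authors' pipeline: the function names, the default threshold argument, and the read counts in the usage note are hypothetical, and the only thing taken from the text is that at least 20% of reads at a position had to support the alternative allele.

```python
def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that support the alternative allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if alt_reads < 0 or alt_reads > total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


def passes_penetration(alt_reads: int, total_reads: int,
                       threshold: float = 0.20) -> bool:
    """True if the variant meets the penetration threshold (assumed 20%)."""
    return variant_allele_fraction(alt_reads, total_reads) >= threshold
```

With hypothetical counts, `passes_penetration(20, 100)` is true, whereas a variant supported by 13 of 100 reads (about 13%, as for the one E948Q sample mentioned) would fail the check unless it were exempted by hand.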


      • Fig2. In figure 2b and 2c the statistical significance of differences between TOP2A and the rest of genotypes should be included. Looking at Figures 2d and 2e it looks surprising how similar is the overall survival of HIGH TOP2A mRNA expression (500 days, fig 2d) with the overall survival of the TOP2A WT samples (400 days, fig 2e). Here I would include a graph that summarizes the TOP2A mRNA expression levels of each group in fig 2d and 2e.

      Ad. We agree that median overall survival is similar when comparing patients with high TOP2A mRNA expression to TOP2A WT patients in our cohort. It is worth noting, however, that the two datasets were produced using different library protocols and different methodology, so the levels cannot be expected to be equal. We think that adding two more graphs, as suggested, adds another layer of information to this section of the analysis. We have included two boxplots depicting TOP2A mRNA RPKMs, and it is now clear that the medians of the High TOP2A mRNA and TOP2A mutant (E948Q) groups are more closely related, despite the fact that we have only a few patients with the mutation.

      • Fig3. It would be interesting to include the same simulation for the rest of TOP2A mutations as supplementary figure.

      Ad. We agree that the other TOP2A SNPs could potentially affect DNA binding. We focused on the recurrent mutation and did not experimentally analyze those occurring in a single patient. In the revised version we included predictions of whether these variants could affect TOP2A DNA binding. For WT TOP2A and the variants, we calculated the Gibbs free energy (ΔG). This information can be found in Supplementary Table 4. We have extended the description in the Results section “The TOP2A E948Q substitution may affect protein-DNA interactions”.

      • Fig4 and Fig5. Include statistical analysis and dots representing individual replicates.

      Ad. For Fig 4 we have two replicates for two protein fragments, so we can’t present statistics now. As mentioned above, we are preparing a new batch of proteins and will perform more repetitions of the EMSA and relaxation assays. For Fig 5 we have 3 replicates, but despite a trend there is no statistical significance. We intend to perform more replicates with a separate protein preparation. After including the additional repetitions, we will present the results as dots representing individual replicates.

      • Fig6. In Figure 6d I would increase the size differences in the dots representing the gene counts, as it is not easily perceived with current parameters.

      Ad. The dot size in Fig 6d did not reflect the true meaning. To make it easier to understand, we changed a plot type to a barplot, which now represents the number of differentially expressed genes involved in each pathway.

      • FigS2. In figure S2B, it would be informative to establish which dots are significatively above or below the diagonal.

      Ad. The purpose of this figure was to show which oncogenic signaling pathways from TCGA cohorts were affected in our cohort. The pathway's size is a variable that is used to normalize the calculation (shown on the abscissa in S2B). The RTK-RAS and NOTCH pathways contain hundreds of genes, whereas other pathways, such as the NRF2 oncogenic pathway, contain only a few. On the other hand, we counted how many genes in each pathway were mutated in our cohort (shown on the ordinate in S2B). We used logarithms on both axes for visualization purposes, but this has no effect on the enrichment of these pathways, which is shown in the color-coded legend.
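
As a rough illustration of the plot described in this response, the sketch below computes, for one pathway, the log-scaled coordinates (pathway size on the abscissa, number of mutated genes on the ordinate) and the size-normalized fraction of mutated genes. It is a minimal sketch under stated assumptions: the response does not say which enrichment statistic was used, so the plain fraction here is only a stand-in, and the function name, the base-10 logarithm, and the example counts are hypothetical.

```python
import math


def pathway_point(pathway_size: int, mutated_genes: int) -> dict:
    """Log10 plot coordinates and mutated fraction for a single pathway.

    The fraction mutated_genes / pathway_size is an illustrative
    normalization, not necessarily the statistic used in the manuscript.
    """
    if pathway_size <= 0:
        raise ValueError("pathway_size must be positive")
    if mutated_genes <= 0 or mutated_genes > pathway_size:
        raise ValueError("mutated_genes must be in 1..pathway_size for a log axis")
    return {
        "x": math.log10(pathway_size),    # abscissa: pathway size
        "y": math.log10(mutated_genes),   # ordinate: mutated genes in the cohort
        "fraction": mutated_genes / pathway_size,
    }
```

Because the fraction is computed from the raw counts, taking logarithms for display changes only where a point sits on the plot, not its normalized value, which matches the response's remark that the log axes do not affect the enrichment.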

      • FigS3. How were the samples shown selected from the total?

      Ad. In this plot we show only somatic variants that were found in at least two different patients. We apologize that this information was missing, and we have added it to the figure's caption.

      • FigS4. I would include a line with the TOP2A mutation to have an idea of how these mutations are distributed between groups.

      Ad. Based on the feedback of the reviewer, this figure has been modified and improved. A new row has been added to the figure, displaying TOP2A mutations alongside other highly frequent mutations in other genes.

      Reviewer #2 (Significance (Required)):

      In this work authors have identified new mutations associated to gliomas by targeted exome sequencing using an important cohort of 182 samples. Among these new mutations epigenetic enzymes and modifiers are found. These results potentially increase the repertoire of putative molecular targets for future cancer therapies. Authors focus on mutations associated to the TOP2A gene, which provide stronger DNA binding and DNA relaxation capacity in vitro. Although further characterization is needed, tumours harbouring this kind of mutations could show higher level of sensitivity to TOP2 drugs, providing potentially interesting clinical implications. Although the link between TOP2A expression and cancer prognosis is well established, the relevance of specific mutations is still largely unexplored.

      On one hand this work brings novelties in the field of Glioma providing a series of putative new players in the development of this type of cancer. Audience interested in basic or clinical aspects of these tumours would be a good target for this work. On the other hand, this putative gain-of-function mutation of TOP2A represents an interesting aspect for the DNA topology and topoisomerases field. However, as stated above, a more detailed biochemical and functional characterization would be required to draw the attention of this audience.

      Scientifically, I have experience in the DNA topology and topoisomerases field, 3D genome organization and gene regulation. I have no experience in Gliomas or any other clinical aspect of cancer, so it is difficult for me to properly establish the potential impact of the newly discovered mutations. Technically I have no capacity to critically evaluate the aspects related to the targeted exome sequencing and the suitability of the analysis performed for mutation identification.

      **Referee Cross-commenting**

      I fully agree with the comments of the other reviewer, which are perfectly aligned with my own regarding the preliminary nature of the conclusions about the biochemical and functional characterization of the TOP2A mutations.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate). Please place your comments about significance in section 2. In this paper the authors used a targeted approach to identify rare mutations in a cohort of glioma patients. Using this approach they identified a recurrent mutation in the TOP2A gene encoding for Topoisomerase 2A, and suggest that this mutation creates a more effective protein, binding DNA strongly and maybe more enzymatically active. RNAseq analysis of TOP2Awt and TOP2Amut tumor samples suggest different transcription patterns and points to possible splicing defects. The most recurrent variant (E948Q) is described in depth and some experimental information shows this variant might be a gain-of-function mutation.

      Major comments:

      • Are the key conclusions convincing? The validation of both the methodology and the presence of never described TOP2A variations in HGG is done quite successfully. Interesting evidence about relevance of the most frequent mutation is provided. However, besides having computational and biochemistry assays performed, lack of details about in vitro experiment statistics (no p-values are provided in figures 4 and 5, neither sample size, repetitions) weakens the conclusions claimed by the authors about the properties of the mutated topoisomerase.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? Claims about E948Q variant function should be revised. Data is not presented in a convincing way, plus there is ambiguous language used from the results ("We conclude that the E948Q TOP2A protein is functional, and MIGHT have a higher activity than the WT protein") to the rest of the paper where they strongly support the claims about the TOP2A activity.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. In line with the presented data in the paper, additional experiments that show catalytic changes of the E948Q variation must be added. It is shown that there are differences in the DNA binding capacity by EMSA compared to the WT form, however, the DNA supercoil relaxation activity is not that different, at least the way the results are presented. The authors suggest that TOP2A mutation is a driver mutation but no validation in vitro of this claim is shown. Can this mutation alone or in combination with e.g. tumor suppressors transform normal cells to cancer cells? Do cell lines expressing this mutation (compared to parental TOP2A wt expressing cells) display increased transcription? Increased invasion?

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. If the authors can complement the already presented in vitro experiments with additional ones supporting their hypothesis, this should be feasible. The authors can use patient derived glioma cells or glioma cell lines manipulated to express either the parental TOP2A wt enzyme or the identified mutated form.

      • Are the data and the methods presented in such a way that they can be reproduced? Yes, the authors provide a quite detailed explanation of the methods implemented to reach each one of the results they are presenting.

      • Are the experiments adequately replicated and statistical analysis adequate? No, there is no information about the statistical analysis or number of replicates in any of the in vitro experiments performed. This information should be added to the manuscript.

      Minor comments:

      • Specific experimental issues that are easily addressable.

      • Are prior studies referenced appropriately? Yes, authors clearly address the state of the art regarding previous NGS methodologies and let us know the advantages and novelty of their approach.

      • Are the text and figures clear and accurate? There are some discrepancies between the strength of the language used in different sections of the paper to refer the conclusions they can infer from the results they are showing. While they are all valid, authors should revise it.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? First of all, describe the statistical analysis used in every figure, include number of biological and technical replicates. I would also suggest to change the title or the scope of the discussion, there is too much focus on the TOP2A in the introduction, neglecting all the technical NGS work that actually led to several new variants being described. This may be confusing when it collides with a conclusion that is heavily focused on the first half describing potential implications of at least another 3 proteins where genetic alterations were described. Given the fact there is not much experimental work that shows TOP2A mutations relevance in HGG or strong enough evidence of the variant's function I would suggest to change a bit the scope of the title.

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. The authors describe a methodology that proved to be sensitive and specific enough in order to allow them to detect rare genetic alterations in patient glioma samples. This information could be valuable to describe new driver mutations or infer in genetic pathway alterations that could be potential therapeutic targets. As the authors state at the beginning of the paper, given the poor therapeutical approaches existing for HGG currently, information of this kind could still be highly useful and provide a better outcome to a specific cohort of patients.

      On a personal note, I think there is too much speculation about how TOP2A mutations could be interesting from a biological point of view (authors referred to evidence about implications of this mutation in other forms of cancer) but since no experimental validation is provided in glioma cells, it is difficult to conclude that this enzyme gain-of-function mutation could have a relevant role in HGG and thus make these variants a potential therapeutic target. There are no experiments conducted in glioma cells that express TOP2A variants, it would be interesting to see if it has an effect in the migratory/invasive phenotype like described in other cancer types or like it is suggested by analysis of the genetic pathways activated in the HGG patients samples harboring TOP2A mutation. In addition, there is no evidence of the TOP2A mutations possible role as a driver mutation, which is an interesting aspect that could be further explored from both a computational and an experimental approach.

      • Place the work in the context of the existing literature (provide references, where appropriate). The quality of the paper is high and in line with other studies in the literature that perform genome and transcriptome analysis of tumor samples. It is only the experimental validation that is lacking data supporting the "in silico" findings.

      • State what audience might be interested in and influenced by the reported findings. Computational biologists are the right audience to target this paper. If additional experimental work further validating their initial bioinformatic findings is added to the manuscript then probably a wider population could be targeted.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. Brain tumors, immunotherapy, cancer stem cells, tumor microenvironment, tumor heterogeneity. I do not have sufficient expertise to evaluate the bioinformatic analysis and software/programs used to analyze the NGS data.

    1. SciScore for 10.1101/2021.07.21.21260691:

      Please note, not all rigor criteria are appropriate for all manuscripts.

      Table 1: Rigor

      <table><tr><td style="min-width:100px;margin-right:1em; border-right:1px solid lightgray; border-bottom:1px solid lightgray">Ethics</td><td style="min-width:100px;border-bottom:1px solid lightgray">IRB: This study was approved by the Ethics Committee of the University of Occupational and Environmental Health, Japan (reference No. R2-079 and R3-006).<br>Consent: Participants provided informed consent by completing a form on the survey website.</td></tr><tr><td style="min-width:100px;margin-right:1em; border-right:1px solid lightgray; border-bottom:1px solid lightgray">Sex as a biological variable</td><td style="min-width:100px;border-bottom:1px solid lightgray">Of the 27,036 remaining participants, data from 9,510 (5392 males and 4118 females) who stated they needed regular treatment or hospital visits were analyzed.</td></tr><tr><td style="min-width:100px;margin-right:1em; border-right:1px solid lightgray; border-bottom:1px solid lightgray">Randomization</td><td style="min-width:100px;border-bottom:1px solid lightgray">not detected.</td></tr><tr><td style="min-width:100px;margin-right:1em; border-right:1px solid lightgray; border-bottom:1px solid lightgray">Blinding</td><td style="min-width:100px;border-bottom:1px solid lightgray">not detected.</td></tr><tr><td style="min-width:100px;margin-right:1em; border-right:1px solid lightgray; border-bottom:1px solid lightgray">Power Analysis</td><td style="min-width:100px;border-bottom:1px solid lightgray">not detected.</td></tr></table>

      Table 2: Resources

      <table><tr><th style="min-width:100px;text-align:center; padding-top:4px;" colspan="2">Software and Algorithms</th></tr><tr><td style="min-width:100px;text=align:center">Sentences</td><td style="min-width:100px;text-align:center">Resources</td></tr><tr><td style="min-width:100px;vertical-align:top;border-bottom:1px solid lightgray">All analyses were conducted using Stata (Stata Statistical Software: Release 16; StataCorp LLC, TX, USA).</td><td style="min-width:100px;border-bottom:1px solid lightgray"><div style="margin-bottom:8px"><div>StataCorp</div><div>suggested: (Stata, RRID:SCR_012763)</div></div></td></tr></table>

      Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


      Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
      However, this study also had several limitations. First, because we conducted a cross-sectional study, causality could not be determined. However, since it is theoretically unlikely that treatment interruption experienced by an individual will increase the COVID-19 infection rate in a region, we think it is likely that high regional infection rates cause treatment interruption. Second, we did not identify workers’ reasons for discontinuing treatment in this study. As discussed above, there are various possible causes of treatment interruption, which may vary by region. Third, we did not inquire about the diseases being treated. Treatment interruption may vary depending on the presence or absence of symptoms and the potential disadvantages of discontinuing treatment for a particular disease.

      Results from TrialIdentifier: No clinical trial numbers were referenced.


      Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


      Results from JetFighter: We did not find any issues relating to colormaps.


      Results from rtransparent:
      • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
      • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
      • No protocol registration statement was detected.

      Results from scite Reference Check: We found no unreliable references.


      <footer>

      About SciScore

      SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

      </footer>

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We are grateful for the constructive and highly supportive reviews provided by our Reviewers. We especially appreciate the efforts they have made to provide suggestions on how to make our revised manuscript even more robust. We have incorporated many of these suggestions into the revised manuscript that will be posted to bioRxiv and submitted to an affiliate journal. We have provided point-by-point responses to each Reviewer below each item (starting with Response: …), along with any changes made in response to that comment/suggestion (starting with In our revised manuscript, …).

      Finally, we agree with all Reviewers that this work should be of broad interest to the molecular biology, cell biology, and parasitology communities. Our discovery that Plasmodium and two related genera have taken the unorthodox approach of duplicating their NOT1 protein, and that Plasmodium has dedicated it to its unique transmission strategy, is a fascinating adaptation of the use of this core eukaryotic complex. We believe that those who focus on diverse aspects of RNA biology, including RNA preservation/decay, the maternal-to-zygotic transition, translational repression, and beyond, will find this work to be of interest and relevant to their own research questions.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript “The Plasmodium NOT1-G paralogue acts as an essential nexus for sexual stage maturation and parasite transmission” investigates the two forms of NOT1 in rodent malaria parasites. The authors found out that the original NOT1 is crucial for gametocyte induction as well as transmission to the mosquito; they therefore renamed it NOT1-G. The paralogous protein, on the other hand, appears to be crucial for intraerythrocytic growth, since it cannot be knocked out. The authors then investigated NOT1-G in more detail, using standard phenotyping assays. They found a slightly increased gametocytemia and a minor effect on transmission to the mosquito.

      Response: In our submitted manuscript, we do focus on PyNOT1-G because of the exciting role it has for both sexes of gametocytes, which results in a complete defect in transmission to mosquitoes. Our investigation of which domains of PyNOT1-G are functionally important focused on the most likely suspect: the putative tristetraprolin-binding domain (TTPbd). It was through deletion of this domain that we observed only a minor defect in the prevalence of infection of mosquitoes, indicating that the portion of PyNOT1-G that is required for transmission lies elsewhere (in part or in total). It is also important to correct Reviewer 1’s statement regarding the other (perhaps canonical) PyNOT1. To our surprise, PyNOT1 could be deleted, but this resulted in a parasite that has an extreme fitness cost and a very slow growth phenotype. This is in stark contrast to other eukaryotes, where NOT1 is essential.

      Reviewer #1 (Significance (Required)):

      If the authors are able to provide convincing data that NOT1-G is indeed important for gametocyte induction and transmission to the mosquito, then the report would be of high significance for the malaria and molecular cell biology fields.

      Response: We have in fact shown this and more in the originally submitted manuscript, and thus we are grateful that Reviewer 1 considers this work to be of high significance to a broad readership (molecular and cell biology, parasitology). In our revised manuscript, we have added text throughout to make these results even more apparent and clear for the reader.

      My expertise: molecular cell biology of gametocytes, translational regulation, parasite transmission

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary**

      The manuscript by Hart et al. builds upon a fascinating finding presented in a previous manuscript by the same authors, in which they show that CCR4 seems to be able to associate with two members of the NOT1 family. In this work, the authors first re-annotate the two NOT1 paralogs in Plasmodium yoelii and then perform an in-depth characterization of the role of NOT1-G during gametocytogenesis and early mosquito development. Using gene knockout and different genetic crosses, the authors show that NOT1-G is essential for male gametocyte development and leads to an arrest of development in zygotes arising from female gametocytes. Using RNA-seq the authors show that NOT1-G leads to lower transcript abundances, leading to the hypothesis that NOT1-G might be involved in preserving mRNAs in a larger RNA-binding complex. Lastly, the authors characterize a NOT1-G defining TTP domain and find that it is not essential for either male/female phenotype observed for the whole gene KO.

      Response: We appreciate the concise and accurate summary of these findings.

      **Major comments:**

      • Are the key conclusions convincing?

        The phenotypic characterization of NOT1-G during gametocytogenesis / early mosquito development is nicely presented and the experiments are well performed. Because a duplication of NOT1 with possibly opposing roles of the paralogs is a unique feature with broad implications for RNA metabolism, it would have been great to see two select experiments on the molecular level adding evidence that 1) NOT1/NOT1-G are mutually exclusive in a complex with CCR4/CAF1 and 2) NOT1-G acts post-transcriptionally in an antagonistic way to NOT1 (i.e. as an mRNA 'stabilizer' as proposed by the authors).

      Response: We agree that inclusion of those two aspects would make for a more complete story about these two NOT1 paralogues.

      First, we also think that it is highly likely that NOT1 and NOT1-G are mutually exclusive, as in other eukaryotes NOT1 acts as a scaffold protein upon which effector proteins bind and bridging interactions are made. In our original manuscript, we did not include a mention of our previous attempts to address this question through colocalization and proteomic approaches, as they were largely unsuccessful. Specifically, we generated rabbit polyclonal antisera to PyNOT1-G’s tristetraprolin-binding domain but it did not pass our rigorous quality control (e.g. too much staining persisted in pynot1-g- parasites). Using both asexual and sexual blood stage parasites, we also attempted immunoprecipitation (with and without chemical crosslinking) and proximal labeling approaches via BioID and TurboID, but none of these approaches produced rigorous results and thus we did not report them in our original manuscript. However, this question of whether the two NOT1 paralogues were mutually exclusive in complexes was also taken up by the Bozdech Laboratory in their 2020 preprint (Liu et al.) where they were able to capture the P. falciparum NOT1-G and NOT1 proteins (called Not1.1 and Not1.2 in that work). While their proteomic evidence showed that they could capture these bait proteins and that the NOT1 paralogues were not in the same complex, these results should be taken with a grain of salt: all mass spectrometry-based proteomic approaches are limited in that an absence of evidence does not mean that the protein is not present/interacting. Moreover, these efforts only identified a few other proteins that were already known to interact with the CAF1/CCR4/NOT complex, but even so, they did not use statistically rigorous methods in an attempt to quantify these results. In our revised manuscript, we have included additional text to describe our unsuccessful efforts to do these capture proteomics experiments, and we have expanded our discussion of the Liu et al. findings that provide some evidence in support of a mutually exclusive complex.

      Second, we also hypothesize that PyNOT1-G acts post-transcriptionally to affect mRNA abundance and translation. However, it is important to emphasize that NOT1 proteins typically act as scaffolds, with the recruited effector proteins acting to hasten the degradation and/or to preserve associated transcripts. We believe that studying these effector proteins is the next important effort to undertake. In fact, we hypothesized that these antagonistic effector proteins would be analogous to TTP and ELAV/HuR-family proteins as are found in other eukaryotes, and that the critical interaction with PyNOT1-G would be via its putative TTP-binding domain. It was for that reason that we interrogated the TTP-binding domain itself, and were surprised that its deletion did not phenocopy the complete gene deletion. Ongoing work will be focused on identifying these antagonistic effector proteins that likely are expressed in a stage-enriched manner, and on defining how they interact with PyNOT1-G in order to direct specific mRNAs to their fates. Additionally, it would be very important and exciting to directly test if PyNOT1 and PyNOT1-G are functionally opposed. However, this would be exceptionally challenging to study from a technical standpoint. While we were able to delete the pynot1 gene after many repeated attempts, these parasites are very sickly and grow very slowly. Because of this, we believe that assessing direct versus indirect effects of PyNOT1 in these cells would not be feasible or robust. Given this, comparing functions between PyNOT1 and PyNOT1-G could not be done in a conclusive manner. In our revised manuscript, we have expanded our descriptions of the mechanisms by which we believe PyNOT1-G and its complex affect mRNA fates. In particular, we have expanded our Discussion section to incorporate the results that indicate that the TTP-binding domain is not required for the essential functions of PyNOT1-G.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

        The authors describe the role of NOT1-G as 'preserving' mRNA. The lower abundance of many transcripts in the NOT1-G knockout suggests this, but experimental proof is not provided (see suggestions below). Maybe rephrase to 'putatively preserved/stabilized' or 'has a potentially stabilizing function'. The same is true for the mutually exclusive association of the two paralogs with CCR4/CAF1. The authors refer to a protein co-IP of CCR4 showing that CCR4 can interact with both NOT1 and NOT1-G, but a reciprocal experiment is lacking.

      Response: In our first publication on the deadenylase members of this complex, we also saw a similar effect on specific mRNAs when pyccr4-1 was deleted: the abundance of specific mRNAs went up in pyccr4-1- parasites. In that work and here in this manuscript, we have carefully decided to apply the word “preserved” to the fate of these mRNAs as it describes in a general way what is happening. To robustly state that mRNAs are stabilized by PyNOT1-G (directly or indirectly) would require additional experiments designed to test this (more description on this is provided in a response below). Second, as described above, we agree that doing a reciprocal IP for mass spectrometry-based proteomics would be ideal; we attempted four different approaches to do this, to no avail. However, the composite proteomics data that are already available in the literature and via the Liu et al. preprint from the Bozdech Lab all indicate that these interactions occur, and perhaps that NOT1 and NOT1-G are mutually exclusive as expected. In our revised manuscript, we have provided further explanation in the Discussion for our use of the descriptor “preserve” instead of “stabilize” and, as noted above, we have expanded our Discussion to more comprehensively define the interaction network depicted in Figure 7.

      In both cases, the conclusions of the authors are very likely (e.g. downregulation of many genes as seen by RNA-seq), but the final experimental evidence is not provided and a network such as in Figure 7 is not fully supported. If the authors would like to maintain these statements, then they should be rephrased and made clear or the additional experimental evidence suggested below is necessary.

      Response: We hold that the published proteomic datasets do support such a network, with further support offered from the preliminary proteomic evidence from the Liu et al. preprint. Therefore, we have not modified our manuscript beyond the additional text now provided in the Discussion as noted above.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

        The essential claim that NOT1-G is important for gametocytogenesis and early mosquito development is well presented and fully supported by the experiments. As for the role of NOT1-G in 'preserving' mRNA, an mRNA half-life experiment would be necessary (or the text should be adjusted as mentioned above). In a short-term in vitro culture, pynot1-g- and WT parasites could be treated with ActD and abundances of select transcripts are measured by RT-qPCR.
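      The half-life measurement suggested here can be illustrated with a minimal sketch, assuming simple first-order decay after transcription is blocked. The function name, the sampling times, and the abundance values below are hypothetical and are not taken from the manuscript; a real analysis would also need normalization to a stable reference and replicate handling.

```python
import math


def estimate_half_life(times_min, rel_abundance):
    """Estimate an mRNA half-life (in minutes) from a decay time course.

    Assumes first-order decay, A(t) = A0 * exp(-k * t), so that
    ln A(t) is linear in t with slope -k. The slope is fitted by
    ordinary least squares and the half-life is ln(2) / k.
    """
    if len(times_min) != len(rel_abundance) or len(times_min) < 2:
        raise ValueError("need at least two matched time/abundance points")
    if any(a <= 0 for a in rel_abundance):
        raise ValueError("abundances must be positive to take logarithms")

    logs = [math.log(a) for a in rel_abundance]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n

    # Least-squares slope of ln(abundance) against time.
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    decay_rate = -sxy / sxx

    if decay_rate <= 0:
        raise ValueError("abundance does not decrease over the time course")
    return math.log(2) / decay_rate


# Hypothetical time course: relative abundance halves every 60 minutes.
times = [0, 30, 60, 120]
abundance = [1.0, 0.5 ** 0.5, 0.5, 0.25]
half_life = estimate_half_life(times, abundance)
```

      Comparing the fitted half-life of the same transcript between two parasite lines would then indicate whether its decay rate, rather than only its steady-state abundance, differs between them.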

      Response: We appreciate that Reviewer 2 considers the rigor of our experiments to be high. Regarding the use of the term “preserve” vs “stabilize”, we agree that to shift from our more general descriptor (preserve) to one that has specific connotations (stabilize) would require additional experimentation. To correctly and most robustly make the claim of stabilization would require work on par with that done by Painter et al. (PMID: 29985403) that uses a thiol-containing nucleotide (4-TU) along with a yeast-derived fusion enzyme (yFCU) to convert it for use by Plasmodium. Previously we have shown that an associated deadenylase (PyCCR4-1) also acted to preserve mRNAs, and moreover that deletion of its gene resulted in no discernable effect upon the poly(A) tail or 3’ UTR of an mRNA that is bound by this complex (p28).

      While understanding mRNA stability is an exciting area of study, this 4-TU labeling experiment alone warranted a standalone, high-impact publication for Painter et al. As this has not been adapted for any rodent-infectious Plasmodium species to date, and as adaptation of this labeling approach took several years for Dr. Painter while in the Llinas Laboratory (personal communication), we believe this work is beyond the scope of this study. Moreover, the additional information that it would provide to understand NOT1-G functions (preserve vs stabilize) would be incremental beyond the major storyline presented in this manuscript. In our revised manuscript, we have added text to ensure that our choice of “preserve” is well defined and explained.

      To support the idea that NOT1 and NOT1-G associate in a mutually exclusive way or to just show that they act in distinct complexes despite their similar expression patterns, an IFA with a double stained NOT1/NOT1-G cell line could be performed. Alternatively, the authors could perform a protein co-IP using the already existing NOT1/NOT1-G-GFP cell line and show that the proteins don't interact with each other or even have certain distinct interaction partners.

      Response: We agree, and these studies were attempted but were unsuccessful (described in our responses above). In our revised manuscript, we have included this information as noted above.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

        All necessary cell lines for a NOT1/NOT1-G co-IP and the ActD experiment are already present. The authors already present a ring to schizont in vitro culture (for ActD) and also have substantial experience in protein co-IP and proteomics.

        I am not sure about the cost for a proteomics experiment at the authors' institute and I don't want to make a guess on time investment given the still ongoing COVID situation.

      Response: We agree that these experiments would be interesting, and would be costly to do at a transcriptome-wide scale and would require substantial time to conduct. We believe that the 4-TU approach noted above is the most rigorous, but is well beyond the scope of this study as it has not yet been adapted to rodent-infectious malaria parasites. As noted above, we have attempted four different proteomics approaches to provide reciprocal evidence for the complex composition, all of which were unsuccessful. In our revised manuscript, we have added text to ensure that our choice of “preserve” is well defined and explained, and have noted the unsuccessful reciprocal proteomics approaches.

      • Are the data and the methods presented in such a way that they can be reproduced?

        The MM section is well structured and presented and the supplemental material includes all data.

      Response: Thank you. We want to ensure that our work is clearly described and can be reproduced with the information reported.

      • Are the experiments adequately replicated and statistical analysis adequate?

        There is hardly any test of significance presented in the main text of the manuscript (e.g. Figure 3B and 4A). Please show the individual data points for these graphs and make sure the n= and the statistical test are described in the figure legend. If you use the term significant in the text, then just add the p-value behind it. This is also true for the RNA-seq data: genes are sorted by fold-changes, leaving it unclear if these changes are significant. These data are however presented in Table S1 and could be incorporated in the main text.

      Response: We agree. In our revised manuscript, we have incorporated additional details about the statistical tests used, p-values for noteworthy comparisons, and have included more panels for our comparative RNA-seq datasets (heatmap, PCA, MA plots). We have also made adjustments to our plots to make individual data points more readily observed, especially when error bars may block them (e.g. Figure 3B). And as in the original submission, all of the pertinent values, including fold changes, statistics and more are provided in our comprehensive supplementary files. We have structured the Supplementary Tables to flow from one tab to the next with the filtering/threshold applied noted both in the tab name and in the README tab that is found first among the tabs.

      **Minor comments:**

      • Specific experimental issues that are easily addressable.

        One idea that is not discussed but could be added is, for example, that NOT1-G doesn't even have a stabilizing effect itself, but acts as a decoy for other components of the CCR4/Caf1 complex, keeping them from associating with NOT1. In the NOT1-G knockout, the decrease in RNA abundance might then be just a result of an 'overactivity' of CCR4/Caf1/NOT1.

      Response: This hypothesis proposed by Reviewer 2, that PyNOT1-G is acting as a decoy or a binding partner sponge, is certainly feasible. For this scenario to be effective, PyNOT1-G would need to be in excess of PyNOT1 and/or would need to be able to bind to the critical effector protein(s) better than does PyNOT1. However, our microscopy data, along with the transcriptomic data presented here and previously published proteomic data would indicate that these two gene products are in approximately balanced proportions and are similarly localized. This does not exclude the possibility that PyNOT1-G could act as a sponge for relevant binding partners. In our revised manuscript, we have raised this possibility as an alternate explanation for the phenotype in the Discussion section.

      • Are prior studies referenced appropriately?

        Throughout the manuscript, the authors should make clear what results come from which organism. Just as an example, the genome wide KO screens were performed in P. berghei and P. falciparum, CCR4/CAF1 experiments were performed in P. yoelii, whereas the original DDX6 work was done in P. berghei.

      Response: We agree. In our revised manuscript, we have added additional text to further clarify what data comes from which Plasmodium species.

      • Are the text and figures clear and accurate?

        The Introduction is a bit long and partially turns into a minireview of eukaryotic RNA degradation. In the main text on page 13, the authors introduce a model for proteins involved in translational repression. This is not fully accurate, since for many of the proteins in this network, an effect on translation has actually not been shown. This includes NOT1-G characterized in the present work that most likely has an effect on mRNA stability, but for which a role in regulating translation is not presented.

      Response: We believe the length and content of this Introduction is appropriate to provide the context that some readers outside of the parasitology field will need to appreciate these findings. Regarding designations for these proteins as being related to translational repression, we think that the ample proteomic evidence tying them to translationally repressive complexes warrants this. In our revised manuscript, we have made it more clear that these proteins themselves have not been directly implicated in translational repression.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

        Overall the RNA-seq is underrepresented and Figure 5 could easily be expanded by adding several panels that would help the future reader get a better idea of the data:

      1. Summary graphs such as PCA/MDS plots of the different replicates and MA-plots (all of which can be easily generated in DESeq2)
      2. Heatmaps comparing the expression patterns of pynot1-g-, pbdozi-, pbcith-, pyalba4- highlighting some key gametocyte genes mentioned in the text
      3. Alternatively to 2., a simple Venn Diagram would already be very informative

        An informative representation might also be to sort the differentially expressed genes as predominant male and/or female. The P. berghei data by Yeoh et al (PMID: 28923023) could be a starting point.
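      For readers unfamiliar with the MA-plots mentioned above, the coordinates can be sketched as follows. This is the classic definition (M as the log2 ratio, A as the mean log2 level) with an assumed pseudocount to avoid taking the log of zero; it is an illustration only and not the exact computation performed by DESeq2. The function name, gene labels, and counts are hypothetical.

```python
import math


def ma_coordinates(wt_counts, mutant_counts, pseudocount=1.0):
    """Return a list of (A, M) pairs, one per gene.

    M is the log2 fold change of mutant over wild type and A is the
    average log2 abundance of the two conditions. A pseudocount is
    added to every value so that genes with zero counts can be placed.
    """
    if len(wt_counts) != len(mutant_counts):
        raise ValueError("both conditions must list the same genes")

    points = []
    for wt, mut in zip(wt_counts, mutant_counts):
        log_wt = math.log2(wt + pseudocount)
        log_mut = math.log2(mut + pseudocount)
        m_value = log_mut - log_wt          # negative: lower in the mutant
        a_value = 0.5 * (log_mut + log_wt)  # overall expression level
        points.append((a_value, m_value))
    return points


# Hypothetical normalized counts for three genes in the two lines.
wt = [7, 15, 3]
mutant = [1, 15, 15]
coords = ma_coordinates(wt, mutant)
```

      Plotting M against A makes it easy to see whether large fold changes are concentrated among weakly expressed genes, which is one reason such plots are often requested alongside a volcano plot.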

      Response: We agree. In our revised manuscript, we have expanded Figure 5 to include additional plots that speak to the rigor of these datasets. Specifically, we have added a comprehensive heatmap and PCA plots, as well as MA plots as recommended. We have chosen not to include a Venn diagram for the overlap of affected mRNAs across these transgenic parasite lines, as we hold that this information is best provided in the text (high level observations) and the Supplement (details).

      Reviewer #2 (Significance (Required)):

      **Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.**

      Technically this manuscript builds on standard methods of the field that are well executed. There is no direct clinical advancement, although one might argue that a unique adaptation of the parasite could always be a novel therapeutic target. Conceptually this is a great advancement for the parasitology field as it is, providing additional evidence for the importance of post-transcriptional regulation for parasite transmission. With the two experiments suggested above and the additional evidence gained from them, this manuscript could also gain great interest from readers outside the field by clearly showing how alternative ways to regulate RNA stability evolved.

      Response: We are grateful for your careful review of our work and for the recommendations that you provided. We have incorporated many of them into the revised manuscript to make it even more rigorous and comprehensive. We also appreciate hearing that this work would be of great interest to a broader community. We feel that this is already the case, as the duplication of NOT1 and the dedication of one paralogue to an essential function is exciting and novel among eukaryotes.

      **Place the work in the context of the existing literature (provide references, where appropriate)**

      The work builds on the early reports of the particular RNA metabolism in gametocytes performed in the groups of Andy Waters. Since then, the authors themselves have published a great set of manuscripts extending our knowledge of the proteins involved in gametocytogenesis and nicely place the current work into this framework.

      Response: We appreciate this positive feedback. This is a fascinating topic to study.

      **State what audience might be interested in and influenced by the reported findings.**

      The manuscript as it stands is particularly interesting for the parasitology and potentially the evolutionary biology field. For a broader readership for example in the RNA field, the possibly antagonistic roles and mutually exclusive association with CAF1/CCR4 are likely most interesting.

      Response: We agree that this should be interesting to readers beyond our own field, as the duplication and specialization of NOT1, and the finding that the “canonical” PyNOT1 can be deleted, are both of general interest to how eukaryotes have adapted and deployed a highly conserved and essential RNA metabolic complex.

      **Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.**

      **Expertise:**

      RNA biology, Plasmodium falciparum, Bioinformatics

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, the authors investigate the requirement of two possible Not1 paralogs for the development of asexual blood stages and for the sexual transmission stages of Plasmodium yoelii. While Not1 is critical for asexual blood stages, its putative paralog, Not1G, is important for the development of sexual transmission stages. In the absence of Not1G, male gametes are not formed while female gametes are formed and can be fertilised by wt male gametes. However, the resulting zygote cannot develop further into an ookinete. The in vitro genetic cross assay to show this is elegant! A transcriptomic analysis further indicates that the transcriptomes of Not1G deficient parasites are significantly different from their WT counterpart.

      Response: We are thrilled that you found our evidence and approaches to be rigorous and compelling. Thank you.

      **Major comments:**

      The discussion section is very nice and the authors describe well what is speculative and should be further confirmed by additional experiments. However, I did find this was not the case in the results section where the authors are proposing conclusions that are not supported by the results. I think the reading of this manuscript would be much more enjoyable if the authors only describe the results shown and move all the discussions to the dedicated section. Below are some examples. The data presented in this manuscript is not showing a nexus, this is a suggestion based on the results of other articles, the word should thus be removed from the title (and kept for a future review!). The last two sentences of the localisation section should be moved to the discussion because they do refer to results not shown in this manuscript. The last sentence of the second paragraph of the zygote development section should also be moved to the discussion. For the transcriptomic analysis there is also no formal comparison with transcriptomes of other previously analysed mutants: the results of the comparisons should either be shown or not discussed in the result section. Finally, the discussions mentioning interactors of the complex should be removed from the result section and moved to the discussion unless the results are formally analysed.

      Response: We again thank you for the compliment. In our original manuscript, we opted to provide some limited interpretations and context within the Results section in order to help guide readers along our train-of-thought and line-of-experimentation. While a more traditional split of keeping essentially all discussion and interpretation for the Discussion is a tried-and-true approach, we prefer this more narrative method and have opted to keep these short sections in the Results section.

      I would strongly suggest that the authors better present and describe their transcriptomic results. There is only one volcano plot indicating the overall defect in mixed gametocytes in the main figure. Apart from this, the results are only described in the main text or in supplementary tables. It is therefore difficult to understand the subtleties of the analysis. For example, the authors frequently mention dysregulated genes, but without specifying whether they are up- or down-regulated in the mutant. To address this issue, I would suggest that the authors better describe their results in the figures. They could show the GO term enrichment analysis they mention and show how they assign GO terms or transcripts to male and female parasites. It would also be nice to discuss some of the results in a bit more detail. For example, it is not surprising to see a reduction in transcripts that are under the control of AP2-O in retort-arrested ookinetes, as the parasites do not reach this stage. It is thus highly speculative to specifically link this observation with ALBA4 without further detailed analysis. On the other hand, it is more surprising to see a decrease in ap2g transcripts, while the authors observe an increased gametocytaemia. Could the authors comment on this observation? It may also be nice to better present the comparison between gametocytes and schizonts to possibly speculate on the early requirement of Not1G in committed schizonts.

      Response: We (and Reviewer 2) agree. In our revised manuscript, we have expanded Figure 5 to include additional plots that speak to the rigor of these datasets. Specifically, we have added a heatmap, and PCA and MA plots as recommended. We have chosen not to include a Venn diagram for the overlap of affected mRNAs across these transgenic parasite lines for the reasons stated above in our response to Reviewer 2. Similarly, we have opted to keep the specifics of the GO Term analyses in the Supplement as we believe these should always be taken with a grain of salt (especially high level GO Terms, as many choose to report). Finally, we have expanded our discussion on our observation that pyapiap2-g transcript levels are lower in the pynot1-g- line, despite seeing a slight increase in gametocytemia.

      The conclusion regarding the similar localisation of Not1 and Not1G with other members of the CAF1/CCR4/NOT complex is not really convincing for two reasons. First, there is no colocalization shown and, second, the distribution is not very peculiar, so it is difficult to draw any conclusion with this level of resolution. The presence of alpha-tubulin in the nucleus of male gametocytes is also very surprising as it is rather nucleus-excluded in both P. falciparum and P. berghei. Could the authors comment on this peculiar localisation?

      Response: We agree and disagree here. First, we agree that no colocalization data is presented here to place NOT1-G within the limit of resolution of fluorescence microscopy. What we can (and do) state is that these proteins are all localized to cytosolic puncta, which matches what is observed for essentially all other studied eukaryotes. In further support of this, our published, quantitative proteomic data indicates that the bioinformatically predictable members of the CAF1/CCR4/NOT complex do associate as anticipated. In the same vein, the micrographs presented were not captured by confocal microscopy, and thus the apparent localization of alpha tubulin “in” the nucleus is most likely attributed to being above and/or below the nucleus. Taken together, we do feel that the combined evidence is convincing. As we have already made all of these points in the original manuscript, we have not adjusted the revised manuscript further.

      One of my major frustrations when reading this manuscript was that the authors are not trying to discriminate between an early role of Not1G during gametocytogenesis or later in gametogenesis. The fact that the transcriptomes of gametocytes and schizonts seem to show similarities suggests that the phenotypes observed during both male gametogenesis and ookinete development are probably linked to early knock-on defects during gametocytogenesis. Could the authors test whether male gametocytes replicate DNA or female gametocytes activate translation? These are of course non-essential experiments as the authors are careful with their conclusions and mention possible defects during both gametocytogenesis or gametogenesis. Addressing this question may however add significant insights into the requirement for Not1G.

      Response: We are sorry for the frustration. We wrote the manuscript so as to state what we feel we could robustly say, and where we are drawn to speculate, we made that speculation clear. As Reviewer 3 notes, we have not attempted to discriminate between functions that PyNOT1-G may be playing in different stages or substages of development because we do not believe the experiments allow that discrimination. While we could investigate finer and finer aspects of possible defects in both male and female gametocyte development, the most impactful take home messages remain the same. We continue to address questions related to translational repression and its release, and anticipate that PyNOT1-G will play a substantial and essential role in this. As Reviewer 3 noted, we have already discussed these possibilities in the original manuscript, and thus have not added anything further about this in our revised manuscript.

      **Minor comments:**

      Please use page and line numbering for your next submissions! Please describe what "bioinformatics" was used. I would show the nice localisation in oocyst and sporozoite in the main section. The conclusions drawn from the genetic cross seem to come from a single biological replicate, if this is the case please indicate it clearly.

      Response: We apologize for these oversights. In our revised manuscript, we have provided page and line numbering, have expanded on what bioinformatic processes were done in the manuscript, and have made it more clear that the genetic crosses come from multiple biological replicates (biological triplicate for the transmission-based genetic cross, biological duplicate for the in vitro culture genetic cross). However, we have opted to retain the oocyst and sporozoite IFA data in the Supplement, as the rest of the story is focused on blood stage and early mosquito stage.

      Reviewer #3 (Significance (Required)):

      This manuscript highlights the requirement of a Not1 paralog in the transmission stages of a Plasmodium parasite. More specifically it describes a new player in the control of RNA biology during this process where our knowledge is scarce. It will be a valuable manuscript for molecular parasitologists interested in transmission or RNA biology.

      Response: We agree and are grateful that our colleagues find this study to be a valuable addition in our efforts to understand how malaria parasites have adapted classic eukaryotic mechanisms to suit their purposes.

      Our expertise is largely in molecular and cellular parasitology.

    1. Taboo Tradeoffs and Protected Values:

      I think this is a framework that could use more emphasis. It’s one I am cueing into more after Caviola et al. (2021) included it in their review, “the psychology of (in)effective giving.”

      People have a strong aversion to prioritizing some lives over others (see Tetlock et al., 2003, "Thinking the unthinkable: sacred values and taboo cognitions"). With limited resources, we of course do this all the time. But CBA makes it uncomfortably explicit. To prioritize some recipients as a result of CBA means to deprioritize others, which feels unfair. This is one explanation for why people prefer “distributed helping” when there are multiple possible recipients, even at the expense of helping more, since then at least no one is fully deprioritized (Caviola et al. 2020a, obstacle 5; Sharps & Schroder, 2019, “The Preference for Distributed Helping”). This could also be an explanation for Berman et al., 2018 finding that people prefer to prioritize investments rather than charities, since deprioritizing an investment isn’t nearly as aversive.

      A moral aversion to (de)prioritization may also explain social judgments of people who donate effectively seeming “cold” (section 7.1). This is evidenced by the differences in instinctive judgments of “coldness” based on what is deprioritized. For example, deprioritizing investing in textbooks because it isn’t an effective intervention feels much different than deprioritizing investment in childhood cancer treatment because one could help more kids dying of malaria. People would likely make harsher judgments about someone doing the latter even though the reasoning is the same – it’s what is deprioritized that is different.

      There might also be something else at play related to ‘CBA’ discomfort: choosing whom to help makes it clear to individuals that they can’t help everyone. It reminds people of all the suffering in the world that they can’t alleviate, whereas just choosing a neat charity only introduces one cause of suffering and then gives the donor the satisfaction that they have done something to alleviate it. I can imagine that CBA’s role in revealing the reality of triage (1) makes people less inclined to engage in CBA and (2) makes them less likely to donate a lot in accordance with CBA, because there is less warm glow and that one cause just isn’t as sexy anymore. (2) is related to the idea of pseudoinefficacy developed in Slovic (2007) and Västfjäll et al. (2015). People are less inclined to help when they learn about others they can’t help.

      A key idea that I think is relevant here is the “affect heuristic,” the importance of instinctive emotional cues of “goodness” or “badness” informing decisions (LINK). Deprioritization of emotional cause → instinctively violates moral value → aversion → less likely to engage with, worse social judgments. Similarly, reminder of all the suffering in the world → feeling of sadness + helplessness → avoidant behavior. These oversimplified decision pathways can be overruled by rational, deliberate processing (see Tetlock, 2003 for discussion specific to sacred values), but charitable giving is largely a system 1/emotional arena.

    1. ask

      I feel this is one of the most important parts of documentation and observation. I feel we should give children the space to think about what they make to give them a sense of ownership and accomplishment. If we constantly share our ideas, the child may not see theirs as valid.

    2. This is a risk we often take when working with children. Even if we are not conscious of it, we face this dilemma every day because of our own pre-conceived notions and theories. I believe that we can choose to offer topics for the children’s consideration as long as we are aware of

      I think that awareness is key for being able to teach within a context we as adults may be familiar with. In this particular experience there was almost a system of checks and balances to make sure the students' hypotheses and ideas stayed at the forefront of their research.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank all four reviewers for their positive and constructive comments! We have carefully considered these comments and provided a point-by-point response below.

      Reviewer #1 (Evidence, reproducibility and clarity):

      This paper explores an interesting problem of SHP1/SHP2 preferences of inhibitory immunoreceptors. The authors are quick to point out that many of their individual data points confirm published results at some level, but the power of the paper is in the parallel analysis of both PD1, which is strongly biased towards SHP2, and BTLA, which is biased towards SHP1. This gives them the opportunity to test the predictions of the descriptive experiments by making simple mutated receptors with swapped ITIM or ITSM domains.

      The work is very well done, and the authors are generally quite careful and precise about the language used to describe results.

      The results are quite striking in that they find plenty of evidence for transient interaction of SHP1 with PD1 based on the biophysical measurements, but don't detect the interactions in pull down or in "in cell" microcluster recruitment experiments. In describing the pull-downs they discuss the issue of dissociation during washing potentially missing interactions that are taking place. I would prefer the framing that pull down is fine evidence for binding, but lack of pull down is not evidence for lack of binding. They should double check that this language is consistent. Also, unless something has changed in the microcluster binding experiments, this in situ recruitment of SHP2 to PD1 is only observed for 2-3 minutes and then can't be detected, the situation for SHP2 becoming the same as it is for SHP1. If the kinetics are different in the cleaner systems that have now been developed, they should show this in a primary figure, as this would then be different from what was reported previously.

      We agree with the reviewer that pull down is evidence for binding. Indeed, in most, if not all of our assays, our results with pull down were consistent with those in the microcluster imaging. As suggested by the reviewer, we will check through the manuscript and ensure the language is accurate and consistent. In our recent study (Xu et al., JCB, 2020, PMID: 32437509), we conducted a side-by-side comparison of SHP2 and SHP1 recruitment kinetics to PD-1 in a similar system as the current study. Both microcluster imaging and co-IP assays showed that PD-1:SHP2 association lasted at least 10 minutes, whereas PD-1:SHP1 recruitment was nearly undetectable. The duration of PD-1:SHP2 association was in good agreement with Takashi Saito’s finding in CD4+ mouse T cells (Yokosuka et al., JEM, 2012, PMID: 22641383). Regardless of the somewhat different kinetics in different studies, SHP2 recruitment was transient, as pointed out by the reviewer. We believe that some other effectors contribute to PD-1 inhibitory signaling. In support of this notion, we recently found that PD-1 remains partially inhibitory in CD8+ T cells deficient in both SHP1 and SHP2 (Xu et al., JCB, 2020).

      The gap in this study is the lack of any functional analysis. The Jurkat model could be quite useful as they have a relatively clean system for asking if the transient binding of SHP1 to PD1 has any functional impact, which they have not yet followed through on. Does PD-1 recruited SHP2 have any impact on function after the 5 minutes? Furthermore, the authors need to keep in mind that mice deficient in SHP2 respond to anti-PD1 checkpoint therapies (Rota, G., Niogret, C., Dang, A. T., Barros, C. R., Fonta, N. P., Alfei, F., Morgado, L., Zehn, D., Birchmeier, W., Vivier, E., & Guarda, G. (2018). Shp-2 Is Dispensable for Establishing T Cell Exhaustion and for PD-1 Signaling In Vivo. Cell Rep, 23(1), 39-49. https://doi.org/10.1016/j.celrep.2018.03.026). This is an important issue to discuss in light of the very interesting binding analysis the authors have performed. But I think the functional analysis can be part of a future paper.

      In our recent publication (Xu et al. JCB, 2020, PMID: 32437509), we found that deletion of SHP1 from Jurkat cells had little, if any, effect on PD-1 mediated suppression of IL-2 production. As the reviewer alluded to, we did observe SHP2 dissociation from PD-1 after 10 minutes, so the question of whether and how the PD-1:SHP2 complex influences T cell function over the longer term is a great one. We currently are pursuing a hypothesis that there is a SHP2-independent mechanism of PD-1 inhibitory function, and indeed, in our recent study (Xu et al. JCB, 2020, PMID: 32437509), we found that PD-1 retains its partial inhibitory function in SHP1/SHP2 double knockout murine primary T cells. These results are consistent with the in vivo data by Rota et al. cited by the reviewer. We will also briefly discuss this point in a revised manuscript.

      I would suggest that the title be modified slightly from "SHP1/SHP2 discrimination" to "differential SHP1/SHP2 interaction" and leave discussion of discrimination until they have the functional data integrated over times that are relevant to T cell transcriptional regulation (1-2 hrs). The functional analysis can be in another paper, but it would be interesting to have a paragraph in the discussion raising the outstanding issues beyond stable binding detected by the pull-down and microcluster recruitment experiments- what are the implications for function. Could the transient interactions in the noise of the steady state and equilibrium measurements be functional?

      We thank the reviewer for the suggestion, even though reviewer #3 felt that our current title is appropriate. We will be happy to change the title at the editors’ discretion.

      I would summarise that the work is outstanding as biochemistry and biophysics and it should be published nearly as is. I'm suggesting minor revisions in that the changes are just to text, but I think this is important and somewhat nuanced aspect of the paper that will make it even more helpful to readers.

      We appreciate the positive and insightful comments!

      Reviewer #1 (Significance):

      The authors generate a detailed descriptive data set about the component interaction of SHP1 and SHP2 SH2 domains with PD1 and BTLA intracellular domains. They then test hypotheses generated from the descriptive data set to better define the nature of the interactions and why PD1 recruits primarily SHP2, while BTLA mainly recruits SHP1. PD1 is a major driver of the cancer immunotherapy revolution and SHP2 is the major candidate for a signalling effector of PD1. This paper can become the reference paper for the specificity and engineering of this interaction, which will make it highly significant in a very active and still expanding field.

      Referee Cross-commenting

      I still feel that "discrimination" has a functional/activity connotation that is not addressed at all in this paper, but can be addressed. I'm happy to have the suggestion stand and let the authors decide. They need to live with it once it's published. Another suggestion: the citations on regulation are mostly old. A good recent paper is Pádua, R. A. P., Sun, Y., Marko, I., Pitsawong, W., Stiller, J. B., Otten, R., & Kern, D. (2018). Mechanism of activating mutations and allosteric drug inhibition of the phosphatase SHP2. Nature Communications, 9(1). https://doi.org/10.1038/s41467-018-06814-w

      We believe that some of the functional questions raised by this reviewer, including the SHP1 and SHP2 contribution in PD-1 signaling, were addressed in our recent publication (Xu et al., JCB, 2020). Using SHP1 KO and SHP2 KO T cells, we showed that PD-1 inhibitory function is contributed by SHP2, but very little if any by SHP1. Thus in the current study, we focus on the mechanism behind the striking SHP2 preference by PD-1. We thank this reviewer for suggesting this excellent reference. We will cite this reference in the revised manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this study, Xu and co-workers investigate the biophysical nature of the interaction between the structurally-related non-transmembrane PTPs Shp1 and Shp2 with the ITIM/ITSM-containing inhibitory receptors PD-1 and BTLA using cell-based, biochemical, biophysical and domain swapping assays. The primary aim being to better understand how these receptors discriminate between binding Shp1 and/or Shp2, and the orientation of Shp1 and Shp2 engagement. These are major unresolved questions in the field that the authors go some way to addressing in a methodical, rigorous, clear and concise manner. Findings are convincing, correlate well with previous findings and internally, and are complemented with excellent schematics, making it easy to comprehend.

      Major comments

      The authors focus primarily on binding affinities to explain differential binding of Shp1 and Shp2 by PD-1 and BTLA ITIMs and ITSMs, but this is only part of the story. Avidity, compartmentalization, stoichiometry of kinases, and relative abundance of Shp1 and Shp2 are also important aspects of the discriminatory mechanism that are not addressed. Competition assays would go some way to addressing the latter point and should at least be considered and discussed.

      We agree that various parameters mentioned by this reviewer, such as compartmentalization and relative expression levels, would be a concern for purely cell-free assays such as SPR. However, we feel that our cell-based assays already integrate these parameters. This is also precisely the reason why we chose to examine the recruitment of Shp1/2 in a cellular context instead of a purely cell-free system.

      Regarding the competition, we have confirmed our key results in both WT and SHP2 KO background, with or without the potential competition from endogenous SHP2, suggesting that competition might not be a dominant mechanism for the recruitment specificity we observed.

      Similarly, the authors do not address how distortion of the pY binding pocket of Shp1 and Shp2 nSH2 domains in the auto-inhibited conformation is released, allowing the domain to engage with phospho-ITIM/ITSM. Again, this should be at least discussed. Current binding studies do not address this issue.

      We feel that the overall recruitment to the PD-1 microclusters that we observed in cells already integrates this auto-inhibition mechanism of Shp1 and Shp2, because we used full length proteins. We do agree with the reviewer that future studies are warranted to address the contributions of each mechanism, including auto-inhibition, concentration, competition, etc., to the overall recruitment. This might require careful and extensive biophysical analyses coupled with mathematical modeling.

      Minor comments:

      Phosphorylation should be indicated in schematic representations in Figures 3, 6 b, c.

      We thank the reviewer for this advice, we will indicate phosphorylation in the revised figure 3.

      Cellular and physiological significance should be further discussed, as well as broader implications of findings to other ITIM/ITSM-containing receptors in other lineages.

      We will further discuss this as suggested.

      Reviewer #2 (Significance)

      Findings from this study advance our knowledge of how inhibitory checkpoint regulatory receptors discriminate between Shp1 and Shp2, which has important implications for understanding how the unique biochemical, cellular and physiological functions of these receptors and phosphatases are dictated. Indeed, findings lay the foundation for a universal mechanism, that may apply to all ITIM/ITSM receptors in other cell lineages, and perhaps novel ways of targeting these interactions therapeutically.

      Compare to existing published knowledge

      Although largely correlative with previous studies, findings from this study start to fill major gaps in our knowledge of these biochemical processes, in a highly rigorous, concise and clear manner. Findings from previous studies were more 'piecemeal', whereas this study consolidates and advances important nuances of these interactions. Moreover, it lays the foundation for further structural, physiological and therapeutic studies.

      Audience

      The immune receptor signaling community and beyond, including any lineage in which ITIM/ITSM-containing receptors play a major role in regulating cellular responses.

      Your expertise

      ITIM/ITSM-containing receptors, kinase-phosphatase molecular switches, cellular reactivity to extracellular matrix proteins

      Referee Cross-commenting

      Generally agree with the reviewers' comments. Constructive overall and fair. Although I was thinking of additional competition experiments, I do not think they are necessary. Over the top for this study. Hence, 1 month should suffice to revise accordingly.

      We thank this reviewer for the excellent comments and understanding!

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      Inhibitory immune receptors containing ITIMs function through recruiting the phosphatases SHP-1 and SHP-2. SHP-1 and SHP-2 are remarkably similar yet have different roles in vivo. How can ITIM-containing immune receptors specifically recruit SHP-1 or SHP-2? In this paper, Xu et al ask how SHP-1 vs SHP-2 specificity is achieved. They use very thorough biochemical assays to measure the affinity of SHP-1 and SHP-2 for various ITIM/ITSMs and finally pinpoint some key amino acids that switch an ITIM/ITSM from SHP-2 to SHP-1 specificity. The in vitro biochemical assays are augmented by in cell assays that support their conclusions. Overall, this paper is an incredibly elegant and straightforward paper addressing how SHP-1/SHP-2 specificity is achieved.

      Major Comments: none

      Minor Comments:

      • Could the western blots in Figure 1 be quantified as the western blots in other figures?

      We will quantify the western blots in Figure 1 as suggested in the revised manuscript.

      • The data that the y+1 residue is essential for SHP-1/2 specificity is very convincing. We are curious if the other residues of the ITIM/ITSM also contribute to this specificity, albeit less potently. The PD-1 G224A mutant is still less potent than the PD-1 BTLA ITIM swap, suggesting that while the y+1 position is most important, the other residues contribute some specificity. The authors also included data on a PD-1 variant with the BTLA ITIM A224G mutation (8f), which is slightly better at recruiting SHP-1 than the PD-1 ITIM. It may be worth mentioning this data in the text of the paper as well as displaying it in the figure.

      The reviewer raised an excellent point. Yes, our data do suggest that other pY-flanking residues within the ITIM also contribute to SHP1 binding. However, the pY+1 residue replacement produced the strongest effect, as the reviewer noted. In the revised manuscript, we will acknowledge the potential contributions of other residues.

      • A brief introduction to ITIM vs ITSM in the introduction of the paper may be helpful background for readers. For example, ITIM receptors are reasonably well known but how ITSM functionally differs is probably less well known.

      We will rewrite the introduction about ITIM and ITSM for better clarity.

      • Although not the major focus of the paper, broadening out this SHP-1/2 specificity to other immune receptors in the discussion is fascinating. (a) The authors find that a Valine, Leucine, or Isoleucine in place of the Alanine in y+1 is very close to equivalent, yet the A is highly conserved. The authors speculate that there may be an advantage to sub-maximal SHP-1 affinity because it is easier to regulate. I think this is reasonable speculation but a little unsatisfying given the very small observed difference in SHP-1 binding. If the authors have additional thoughts, I would be interested to hear them. (b) The authors note that PD-1 is the only ITIM with a glycine in the Y+1 position. Are there other receptors that function primarily through SHP-2, and how might they achieve this specificity?

      Response to a: Even though valine, leucine or isoleucine did not produce a striking enhancement in Shp1 recruitment over alanine, the differences were statistically significant. In fact, when we performed these point mutations in a BTLA ITIM background, valine, leucine or isoleucine markedly enhanced the SHP1 recruitment (see unpublished data below). We speculate that other pY-flanking residues in BTLA, as this reviewer alluded to above, create an environment that amplifies the differences. The strong sensitivity to the pY+1 residue, as observed in BTLA, might be true for other SHP1-recruiting receptors too. If they were to have leucine or isoleucine at the pY+1 position of the ITIM, they may recruit too much SHP1, which presumably decreases the fitness/growth of the cells. We propose to show this unpublished data as a supplemental figure in the revised manuscript. We will also discuss the potential contributions of other pY-flanking residues as this reviewer suggested.

      {{images cannot be rendered at this time in reply letters}}

      Response to b: Among the several receptors that we tested, PD-1 is the only receptor that exhibited no recruitment of SHP1. The lack of SHP1 recruitment is also true for murine PD-1, which has a glutamate residue (charged) at the Y+1 position. In addition, earlier work reported that PECAM1 also selectively recruits SHP2, but not SHP1. We have noted that PECAM1 contains a threonine (polar) at the pY+1 position of its ITIMs. Thus, its inability to recruit SHP1 is consistent with our model that a nonpolar residue at the Y+1 position is required for strong SHP1 recruitment. We will discuss these points in the revised manuscript.

      • Figure 9 b Val not Vla, Figure 3a - a legend for the color code may be nice (ie, 20-1000 nM)

      Thanks for catching this; we will fix the error in Figure 9b and provide the color code in Figure 3a in the revised manuscript.

      Reviewer #3 (Significance):

      Significance:

      SHP-1 and SHP-2 play a critical role in regulating immune system function. In addition, the receptors recruiting these phosphatases (like PD-1) are important immunotherapy targets. Previously, the question of SHP-1/SHP-2 specificity has been primarily described for ITIM bearing receptors individually. Other studies have predicted consensus sequences for the tSH2 domains of SHP-1 or SHP-2, but not addressed the defining molecular characteristics of these consensus sites or how these could be combined on ITIM receptors to generate selectivity between these related phosphatases. This paper represents a significant step forward because it provides a unifying mechanism explaining how ITIM-bearing immune receptors specifically recruit SHP-1 or SHP-2. I expect this paper will be broadly interesting to biochemists, immunologists and cancer biologists.

      Referee Cross-commenting

      I generally think the other reviewers' comments are reasonable and insightful. Together, they suggest no new experiments are necessary. As for the proposed title change, I prefer the authors' title and find it to be justified given their data.

      Reviewer #4 (Evidence, reproducibility and clarity):

      In this manuscript, Xu and colleagues performed an elaborate study to investigate the molecular basis of Shp1 and Shp2 discrimination by immune checkpoints PD-1 and BTLA. The paper is original, clear, and well written. I only have a few minor comments:

      1. Please label the molecular weights to all the western blots/IPs results.

      We will label the molecular weights to all the blots in the revised manuscript.

      2. Please add scale bars to all the microscopy pictures.

      We will add scale bars to all the microscopy images in the revised manuscript.

      3. For the SPR data, please add the fitting curves.

      We thank the reviewer for the suggestion. However, we did not use the fitting curve to calculate the Kd; instead, we plotted the maximum response as a function of concentration to determine the Kd. This is another well-accepted method for Kd calculation. In fact, some of the SPR curves fit poorly with the existing algorithm. Thus, showing the fitting curve might distract the readers.
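
      For readers unfamiliar with this approach, the following is a minimal sketch of how a Kd can be estimated from maximum (steady-state) responses, assuming a simple 1:1 binding isotherm, R = Rmax * C / (Kd + C). The function name, the concentrations, and the response values are hypothetical and synthetic; they are not taken from the manuscript, and the software the authors actually used is not stated in the reply.

```python
import math


def fit_steady_state(concs, responses):
    """Least-squares fit of R = Rmax * C / (Kd + C) to plateau responses.

    Illustrative only: for any fixed Kd the best Rmax has a closed form,
    so Kd is found with a 1-D golden-section search over log10(Kd).
    """

    def best_rmax(kd):
        # Closed-form least-squares Rmax for a fixed Kd.
        x = [c / (kd + c) for c in concs]
        return sum(xi * yi for xi, yi in zip(x, responses)) / sum(xi * xi for xi in x)

    def sse(log_kd):
        # Sum of squared residuals at this Kd, using the best Rmax.
        kd = 10.0 ** log_kd
        rmax = best_rmax(kd)
        return sum((yi - rmax * c / (kd + c)) ** 2 for c, yi in zip(concs, responses))

    # Search two decades beyond the tested concentration range (an assumption).
    a = math.log10(min(concs)) - 2.0
    b = math.log10(max(concs)) + 2.0
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(200):
        c1 = b - phi * (b - a)
        c2 = a + phi * (b - a)
        if sse(c1) < sse(c2):
            b = c2
        else:
            a = c1
    kd = 10.0 ** ((a + b) / 2.0)
    return kd, best_rmax(kd)


if __name__ == "__main__":
    # Synthetic, noise-free example: concentrations in nM, responses in RU.
    concs = [20, 50, 100, 200, 500, 1000]
    responses = [100.0 * c / (150.0 + c) for c in concs]
    kd, rmax = fit_steady_state(concs, responses)
    print(f"Kd = {kd:.1f} nM, Rmax = {rmax:.1f} RU")
```

      This kind of fit uses only the plateau response at each concentration, so it does not depend on how well the association and dissociation phases of each sensorgram fit a kinetic model.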

      Reviewer #4 (Significance):

      The strength of this paper relies on the details they dissected by using a series of mutagenesis screening experiments, which should be interesting to cell biologists and cancer immunologists.

      Referee Cross-commenting

      I think the other reviewer's comments are insightful and constructive, the suggested experiments are necessary and will improve the paper.

      We thank this reviewer for the positive comments!

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      Inhibitory immune receptors containing ITIMs function through recruiting the phosphatases SHP-1 and SHP-2. SHP-1 and SHP-2 are remarkably similar yet have different roles in vivo. How can ITIM-containing immune receptors specifically recruit SHP-1 or SHP-2? In this paper, Xu et al ask how SHP-1 vs SHP-2 specificity is achieved. They use very thorough biochemical assays to measure the affinity of SHP-1 and SHP-2 for various ITIM/ITSMs and finally pinpoint some key amino acids that switch an ITIM/ITSM from SHP-2 to SHP-1 specificity. The in vitro biochemical assays are augmented by in cell assays that support their conclusions. Overall, this paper is an incredibly elegant and straightforward paper addressing how SHP-1/SHP-2 specificity is achieved.

      Major Comments:

      none

      Minor Comments:

      • Could the western blots in Figure 1 be quantified as the western blots in other figures?
      • The data that the y+1 residue is essential for SHP-1/2 specificity is very convincing. We are curious if the other residues of the ITIM/ITSM also contribute to this specificity, albeit less potently. The PD-1 G224A mutant is still less potent than the PD-1 BTLA ITIM swap, suggesting that while the y+1 position is most important, the other residues contribute some specificity. The authors also included data on a PD-1 variant with the BTLA ITIM A224G mutation (8f), which is slightly better at recruiting SHP-1 than the PD-1 ITIM. It may be worth mentioning this data in the text of the paper as well as displaying it in the figure.
      • A brief introduction to ITIM vs ITSM in the introduction of the paper may be helpful background for readers. For example, ITIM receptors are reasonably well known but how ITSM functionally differs is probably less well known.
      • Although not the major focus of the paper, broadening out this SHP-1/2 specificity to other immune receptors in the discussion is fascinating. (a) The authors find that a Valine, Leucine, or Isoleucine in place of the Alanine in y+1 is very close to equivalent, yet the A is highly conserved. The authors speculate that there may be an advantage to sub-maximal SHP-1 affinity because it is easier to regulate. I think this is reasonable speculation but a little unsatisfying given the very small observed difference in SHP-1 binding. If the authors have additional thoughts, I would be interested to hear them. (b) The authors note that PD-1 is the only ITIM with a glycine in the Y+1 position. Are there other receptors that function primarily through SHP-2, and how might they achieve this specificity?
      • Figure 9 b Val not Vla, Figure 3a - a legend for the color code may be nice (ie, 20-1000 nM)

      Significance

      SHP-1 and SHP-2 play a critical role in regulating immune system function. In addition, the receptors recruiting these phosphatases (like PD-1) are important immunotherapy targets. Previously, the question of SHP-1/SHP-2 specificity has been primarily described for ITIM bearing receptors individually. Other studies have predicted consensus sequences for the tSH2 domains of SHP-1 or SHP-2, but not addressed the defining molecular characteristics of these consensus sites or how these could be combined on ITIM receptors to generate selectivity between these related phosphatases. This paper represents a significant step forward because it provides a unifying mechanism explaining how ITIM-bearing immune receptors specifically recruit SHP-1 or SHP-2. I expect this paper will be broadly interesting to biochemists, immunologists and cancer biologists.

      Referee Cross-commenting

      I generally think the other reviewers' comments are reasonable and insightful. Together, they suggest no new experiments are necessary. As for the proposed title change, I prefer the authors' title and find it to be justified given their data.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Editor comments:

      Thank you for sending your manuscript entitled "In situ imaging of bacterial membrane projections and associated protein complexes using electron cryo-tomography" to Review Commons. We have now completed the peer review of the manuscript. Please find the full set of reports below.

      We thank the editors of Review Commons and all the reviewers for their insightful comments which helped us to improve our manuscript. We have now modified our manuscript based on the Reviewers’ comments and would like to ask you to consider our revised manuscript for publication.

      Reviewer #1:

      This manuscript by the Jensen lab surveys a plethora of bacterial outer-membrane projections captured over the years by in situ cryo-tomography under near-native conditions. The authors classify the different visualized structures, highlighting both similarities and differences among them. They further describe molecular complexes that are associated with these projections. The manuscript highlights the abundance of such understudied structures in nature, indicating the need to deepen our exploration into their biological functions and mechanisms of action.

      We thank the reviewer for her/his insightful comments that allowed us to improve our manuscript.

      The authors should state in the Abstract and Introduction that only diderm bacteria and outer-membrane extensions are included in the study.

      Done. We have modified the title, the abstract and the introduction to explicitly highlight this point.

      In the Introduction or Discussion the authors should mention the limits of in situ cryo-tomography, such as the difficulty of observing regions in between neighbouring bacterial cells, and into the thick bacterial cell body.

      Done. We have added the following to our revised manuscript:

      “Currently, only electron cryo-tomography (cryo-ET) allows visualization of structures in a near-native state inside intact (frozen-hydrated) cells with macromolecular (~5 nm) resolution. However, this capability is limited to thin samples (few hundred nanometers thick, like individual bacterial cells of many species) while thicker samples like the central part of eukaryotic cells, thick bacterial cells, or clusters of bacterial cells are not amenable for direct cryo-ET imaging. Such thick samples can be rendered suitable for cryo-ET experiments by thinning them first using different methods including focused ion beam milling and cryosectioning [30]. Cryo-ET has already been invaluable in revealing the structures of several membrane extensions, including Shewanella oneidensis nanowires [6], Helicobacter pylori tubes [15], Delftia acidovorans nanopods [25], Vibrio vulnificus OMV chains [16], and more recently cell-cell bridges in the archaeon Haloferax volcanii [31].” (Lines 108-118)

      Please provide a legend to Table S1 explaining the numbers (organelles?), how many cells were viewed? I think that at least part of it should be included in the main text. Also, there are examples of vesicles emanating from H. pylori. This information is missing from Table S1.

      Done. We added a column to the table indicating the number of cells available for each species. We also added the information about the vesicles in H. pylori to the table. This table is now incorporated into the main text of the manuscript as Table 1.

      Please provide an ordered list including all the strains (and IDs of the specific isolates) used in this study and their genotypes.

      Done. We added Table S1 to the revised manuscript that contains this information. This table also includes relevant references to all the published papers where these strains were previously used.

      The authors describe in detail the H. pylori tubes that seem to be flagellum-core independent. However, the authors found previously (ref 15) that during infection, these structures are dependent on CagA T4SS, and they visualized T4SS sub-complexes in proximity to the point of tube emanation. This should be described and discussed in the text. Also, please indicate if the "host-independent" tubes are similarly dependent on T4SS.

      Done. We added the following to the revised manuscript:

      “The scaffolded uniform tubes of H. pylori that we observed were formed in samples not incubated with eukaryotic cells, indicating that they can also form in their absence. However, the tubes we found had closed ends and no clear lateral ports, while some of the previously-reported tubes (formed in the presence of eukaryotic host cells) had open ends and prominent ports [15]. It is possible that such features are formed only when H. pylori are in the vicinity of host cells. Moreover, while it was previously hypothesized that the formation of membrane tubes in H. pylori (when they are in the vicinity of eukaryotic cells) is dependent on the cag T4SS [15], we could not identify any clear correlation between the emanation of membrane tubes and cag T4SS particles in our samples where H. pylori was not incubated with host cells. We also show that the tubes of H. pylori are CORE-independent, indicating that they are different from the CORE-dependent nanotubes described in other species.” (Lines 303-313)

      Is there any difference in the frequency or length of the tubes in the mutants presented in Figure 4? The flgS mutant in the image exhibits a very short filament; is that typical?

      We did not see any significant statistical difference in the number or lengths of the tubes in these different mutants. We added Table S2 to the revised manuscript which details the number of cells we visualized for each mutant and the number of the tubes seen there. In all these mutants the lengths of the tubes ranged between few tens to hundreds of nanometers. In addition, we added Fig. S2 to show more examples of these tubes in each of these mutants.

      Minor points:

      -Please check full bacterial names that are sometimes missing (e.g., lines 110-112).

      Done.

      -There is no reference to panel 2G. Please check the references to all panels.

      Done. Please see lines 154 and 183 in the main text.

      -Lines 181-184: There is no figure related to the formation of teardrop-like extensions from C. pinensis. Please review the text accordingly.

      Done. Corrected.

      -Line 235, not clear to what "as these" refers to.

      Done. We modified the text as the following:

      “As these MEs/MVs from S. oneidensis were purified” (Lines 246-247)

      -Line 241, not clear what "a secretin-like complex" is, and no reference is provided.

      Done. We modified the text as the following:

      “In the third category, we observed a secretin-like complex in many tubes and vesicles of F. johnsoniae. Secretins are proteins that form a pore in the outer membrane and are associated with many secretion systems like type IV pili and type II secretion systems (T2SS) [39–41]” (Lines 252-254)

      Reviewer #1 (Significance)

      As described in this manuscript, even in model bacteria these structures are generated (e.g., Caulobacter forms the hardly studied nanopod extensions). The manuscript also provides visual categories of these structures, defining "extension types" that are likely to be used by the scientific community for years to come, similar to the initial pili classification during the 1960s-70s. It is a "descriptive study," in the positive sense of the term, as it significantly contributes to the field of bacteriology.

      We thank the reviewer for her/his kind words and enthusiasm about our work. It is an honor to have our work compared to the seminal pili classification work done in the 1960s-70s by pioneers in the field of bacteriology.

      Reviewer #2:

      The manuscript "In situ imaging of bacterial membrane projections and associated protein complexes using electron cryo-tomography" by Kaplan et al., identifies and catalogues membrane extensions (MEs) and membrane vesicles (MVs) from 13 different species using cryo-electron tomography. Furthermore, they identify and discuss several protein complexes observed in these membrane projections.

      The manuscript is beautifully written, interesting, and genuinely got this reviewer excited about the biology. I applaud the authors on their manuscript and have only minor comments and a few thoughts that the authors may wish to think on and discuss.

      We thank the reviewer for her/his kind words and insightful comments that allowed us to improve our manuscript.

      Some schematics throughout the introduction would be useful to readers new to the field/ outside the field who are not used to these different membrane structure features.

      We thank the Reviewer for this suggestion. First, we made an extra figure with schematics showing the cell body and membrane tubes but that was rather redundant with Figure 8. For this reason, we added explicit labels to figure 1 highlighting the cell body and the tubes in these examples to help the reader following that figure and the subsequent ones. However, if the Reviewer has an explicit suggestion/view about the schematics then we would be very happy to do that.

      The size of scale bars should be indicated on the figure panels themselves rather than in the figure legend to assist the reader.

      Done.

      In reference to lines 193-196 - what was the extracellular environment like in these micrographs? Were other cells present? Could it be the extracellular environment/surrounding cells that stimulate pearling? Have the authors considered this? Please discuss if relevant/insightful.

      This is a good point. The cells were usually plunge-frozen in their standard growth media (except in H. pylori, where the cells were resuspended in PBS and subsequently plunge-frozen). Yes, there are other cells present in the sample; however, usually only one cell is present in the field of view of the tomogram, as areas with multiple cells have thick ice and are therefore not amenable to cryo-ET imaging. We added the following to the revised manuscript:

      “As usually only one (or part of a) cell is present in the cryo-tomogram, we can’t exclude that differences in the extracellular environments, like the presence of a cluster of cells in the vicinity of the individual cells with pearling tubes, might play a role in this observation” (Lines 198-201).

      "Randomly-located complexes" in this reviewer's opinion should actually be described as "seemingly randomly-located complexes" given there may be an organization present that is beyond the resolution limit of this study.

      This is a good point. Indeed, we can’t exclude that these complexes have a preferred localization in specific lipid patches that we can’t detect in our cryo-tomograms. We added the following statement to the revised manuscript:

      “These complexes, which were also found in the OM of intact cells, did not exhibit a preferred localization or regular arrangement within the tube at least within the fields of view provided by our cryo-tomograms (Fig. 5a & b).” (lines 227-230).

      In reference to lines 287-292 - is it possible this has to do with lipid composition? Have the authors considered this? Please discuss if relevant/insightful.

      Done. We added the following to the revised manuscript:

      “In addition, differences in the lipid compositions among the various species investigated here might also play a role in the formation of these different forms of projections” (Lines 299-301).

      Reviewer #2 (Significance ):

      These results advance the field by shedding new light on bacterial membrane extension morphologies. The authors use cryo-ET to catalogue membrane extensions and membrane vesicles, which has not been done before.

      This paper is likely to be of interest to structural biologists, biophysicists, membrane protein biologists, virologists and microbiologists.

      This reviewer is a single-particle cryo-EM structural biologist with interest in membrane proteins.

      We thank the reviewer for her/his enthusiasm about our work described here.

    1. The idea of managing information by defining associations (trails) between documents has been introduced in Vannevar Bush’s seminal article ‘As We May Think

      trails

    1. Reviewer #3 (Public Review):

      The biochemical and genetic characterization of BRCA2 has been an ongoing challenge in the DNA repair field as the protein is large, prone to degradation, and expressed at low levels in most cell types. While certain features of BRCA2 have been described previously including its ability to bind and load RAD51 onto resected DNA substrates, much remains to be discovered. In this study, the authors combine genetic studies in mouse ES cells with biochemical analysis to examine the spatial dynamics and molecular architecture of BRCA2. Notably, they utilize an innovative approach coupling endogenous tagging of mouse BRCA2 with a HALO tag to monitor BRCA2 movement within live cells by single particle tracking.

      I applaud the authors for achieving a highly technical approach to epitope tagging both endogenous BRCA2 alleles in mouse ES cells and combining this strategy with a HALO tag providing additional utility for a variety of cell biological experiments. By analyzing the endogenous alleles, the authors' system provides physiological levels of protein expression as transcription will be driven by the endogenous promoter thus preserving stoichiometric protein interactions within the cell and avoiding artifacts caused by overexpression.

      The authors determine the influence of the DNA binding domain (DBD) and c-terminal binding (CTD) on the dynamic activities of BRCA2. They begin by exposing cells containing 3 different deletion mutants ΔDBD, ΔCTD, and the double mutant ΔDBDΔCTD to four different types of DNA damage (IR, PARPi, MMC, and cisplatin). Notably, ΔDBD displays significant impairment in survival in response to all 4 types of DNA damage. The ΔCTD, in contrast, demonstrates less sensitivity to IR and Olaparib, however, complements as well as WT BRCA2 in response to crosslinking agents MMC and cisplatin. My only criticism in this aspect of the work is that it would have been informative to include a truncated BRCA2 (mimic of a patient pathogenic mutation) or null allele to compare to the survival of the ΔDBD and ΔCTD mutants. I realize that these alleles may be inviable but the authors should clearly state if that was indeed the case.

      The authors then go on to demonstrate that the ΔDBD and ΔCTD mutants are recruited to sites of IR damage in a similar manner to WT BRCA2 based on number and intensity of foci. I think it would be informative if the authors provided statistical significance for the graphs depicting the quantitation of foci number and intensity as there do appear to be differences between the mutants and the WT protein. There appears to be a delay in the kinetics of recruitment, especially at the 2 hr timepoint, for the mutants compared to WT BRCA2, which could indicate a defect in the recognition of the DNA damage. Only at the 2 hr timepoint following IR are there fewer RAD51 foci, and of a lesser intensity, in the three deletion mutants compared to WT BRCA2. Another possibility is that the results could be interpreted as a defect in RAD51 loading and/or stabilization of the nucleoprotein filament. While immunofluorescence imaging of DNA repair foci has become common practice to measure protein recruitment to damage, it is impossible to know exactly what is happening in these foci with any granularity.

      Next, the authors measure BRCA2 movement in the mouse ES cells taking advantage of the HALO tag to track single particles. While technically and visually alluring, it is difficult to extract mechanistic insight from the results. DNA damage induces changes in diffusion leading to BRCA2 molecules with restricted mobility; the authors demonstrated this phenomenon in a prior publication. The deletion mutants appear to have little effect upon BRCA2 mobility.

      Finally, the authors utilize scanning force microscopy to analyze binding of the purified human BRCA2 proteins to RAD51 and ssDNA. In the absence of RAD51/ssDNA binding, there is a notable shift in the deletion mutants from oligomeric forms to monomeric compared to full length WT BRCA2. Upon binding to RAD51, there is a dramatic change from multimeric to monomeric forms for the WT BRCA2 (~7% to 74%) with a slight suppression of these changes shown for the deletion mutants. While WT BRCA2 forms extended molecular assemblies upon binding ssDNA, the mutants lacking the DBD or CTD, not surprisingly, fail to demonstrate any significant changes in physical architecture. In both situations, the mutant proteins respond to RAD51 and ssDNA in a dampened manner likely due to altered or lost binding. While the architectural effects of RAD51 and ssDNA binding to BRCA2 are measurable by SFM, it is difficult to reconcile these changes in shape and oligomerization with defects in response to DNA damage, or to say which specific steps in homologous recombination these physical forms would impact.

      Strengths:

      1. Generation of mouse ES cells with both endogenous alleles of BRCA2 containing the deletion mutations in addition to a HALO tag is an incredible technical breakthrough and will be a highly valuable reagent for genetic and cell biological studies of mouse BRCA2.
      2. Deletion mutants ablating either the DBD or the CTD, or both, are a great genetic approach to understanding the role of these key domains in BRCA2. The response of these mutants (versus WT BRCA2 as a benchmark) to various DNA damage (IR, PARPi, MMC, cisplatin) provides interesting information delineating the roles of these two important domains in BRCA2. For example, the ΔCTD mutant is significantly sensitive to IR and Olaparib, yet complements as well as WT BRCA2 in response to the crosslinking agents MMC and cisplatin.
      3. The BRCA2 protein is notoriously difficult to purify and yet the authors succeeded in purifying 4 different forms of the protein for biophysical analysis. While it is difficult to interpret the various forms of BRCA2 by SFM, there are clear differences in the architecture between WT and the three c-terminal mutants. These differences are highlighted upon binding to RAD51 or ssDNA.

      Weaknesses:

      1. While the separation-of-function result for the CTD deletion in response to crosslinking agents MMC and cisplatin is a novel and compelling result, it would have been informative to compare the survival results and gene targeting assay using a BRCA2 null or mimic of patient mutation (truncating mutation) to see how these 3 mutants stack up against a completely non-functioning BRCA2 allele. Likely, the BRCA2 null alleles are inviable but perhaps a conditional system or truncating allele similar to a patient germline mutation would give a window into response compared to the DBD and CTD deletion mutants.
      2. It's not clear in the manuscript what new information we are learning about the mechanisms of BRCA2 in the single particle tracking (SPT) data. The differences in mobility between the mutants and WT BRCA2 seem minimal, but more importantly, it is not immediately clear how these data help us understand the normal cellular functions of BRCA2. No doubt, the technology and innovation to track single particle proteins in the nuclei of cells is impressive, but the authors should clearly explain how we can gain mechanistic insight from the SPT data that is presented in this manuscript.

      General Comments:

      It is unclear how missing the c-terminal domain (CTD) or the DNA binding domain (DBD) of BRCA2 can be interpreted as having "roles beyond delivering strand exchange protein RAD51" unless a complete biochemical workup of the deletion mutants was performed to detect any alterations in DNA binding, stimulation of RAD51 dependent strand exchange, etc... While interesting and certainly an impressive technical feat, foci imaging and single particle tracking do not provide much information on mechanism (i.e. whether BRCA2 is binding DNA and loading/nucleating RAD51).

      The interpretations in the discussion are not overstated; however, I somewhat disagree with the notion that the data, as presented, clarify the role of BRCA2 beyond its canonical functions of RAD51 loading and nucleation on resected DNA substrates. I would have liked the authors to discuss how surprising it is that mouse ES cells can tolerate complete loss of the DBD, the CTD, and both together. Questions that should be addressed include the following: Are proliferation rates compromised compared to WT cells? Are they experiencing replication stress in the absence of any exogenous damage? Further, is there something unique about mouse ES cells that may differentiate BRCA2 behavior from what would be expected in somatic human cells?

      It is interesting to note that many years ago Ashworth and Taniguchi published back-to-back papers in Nature (2008) describing BRCA2 reversion alleles from in vitro screens of BRCA2 mutant cells selected in cisplatin or PARPi such that some of these reversions resulted in huge deletions of the entire DBD of BRCA2, and yet, they promoted resistance to PARPi. In this context, I would much appreciate if the authors commented on their findings that their constructed DBD deletion is not resistant to PARPi and if they offered some speculation as to why the reversions in those previous studies were.

    1. Author Response:

      Reviewer #3 (Public Review):

      [...] I have only minor concerns regarding sources of error, particularly with respect to interpretation of the small effects the authors observe in many of their FRET experiments.

      • Figure 2D shows rather small changes in ΔF/F-15 mV between fluorescent protein labels inserted at different positions in the ASIC sequence, particularly for the YFP constructs. As this metric is determined from the top and bottom asymptotes for the Boltzmann fits shown in Figure 2C, it would be useful to have some estimate as to the error associated with the fits at extreme values. Perhaps the authors could provide fits to their data (as in Figure 2C), including confidence intervals, or some similar estimate as to the size of the expected error compared to the effect size in Figure 2D.

      Thank you for this point. We did use Boltzmann fits to get the asymptotes for each cell and calculate a ΔF/F. However, we could also use a ‘fit free’ approach of simply taking the difference between the fluorescence value measured at -180 mV and that at +120 mV, divided by that at -15 mV to normalize for each cell. This approach completely avoids any error associated with fitting the data or imposing any model at all. Using this approach results in slightly different ΔF/F values but the pattern of statistical significance is identical. This new analysis is included in Figure 2 figure supplement 4. It has also been corrected for multiple comparisons.
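
The ‘fit free’ metric described in this reply reduces to one line of arithmetic per cell. A minimal sketch is given here, assuming each cell's recording is available as a mapping from command voltage (mV) to fluorescence; the function name and the example values are hypothetical and not taken from the manuscript.

```python
def fit_free_delta_f(trace, v_neg=-180.0, v_pos=120.0, v_norm=-15.0):
    """Fit-free ΔF/F for one cell.

    trace maps command voltage (mV) to measured fluorescence (arbitrary units).
    Returns (F at v_neg minus F at v_pos) divided by F at v_norm, as described
    in the reply; no Boltzmann fit or other model is involved.
    """
    return (trace[v_neg] - trace[v_pos]) / trace[v_norm]

# Hypothetical single-cell fluorescence values, for illustration only.
example_trace = {-180.0: 80.0, -15.0: 100.0, 120.0: 110.0}
example_value = fit_free_delta_f(example_trace)
```

Since only three measured points per cell are used, any uncertainty in the fitted asymptotes drops out, at the cost of not extrapolating to the true plateaus of the curve.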

      • Along those same lines, the authors use an interesting (and potentially generalizable) approach to reducing background from intracellular proteins in their experiments: co-transfecting their channels with empty plasmid DNA. What percentage of the remaining fluorescence signal is the result of intracellular background? How would that affect the data in Figure 2 and 3? Is the ΔF/Fnorm curve for YFP labeled positions in Figure 2-figure supplement 4 so flat because of contaminating background fluorescence?

      This is a great question. We originally hoped that the CFP and YFP quenching data from different positions could be used to triangulate both a distance from the membrane and a value for background fluorescence, assuming that CFP and YFP would yield similar background fluorescence. An analogous approach was used in Zachariassen et al. Proc Natl Acad Sci, 2016, where an equal background was assumed between conformational states within a recording. In the end, the YFP quenching appeared to have a greater background than CFP. We speculate that this may be because the YFP variant we used matures faster than the CFP (mVenus, 17.6 min versus mTurquoise2, 33.5 min; FPbase.org) and hence the YFP matures before the ‘new’ channels get to the plasma membrane. However, at present we are uncertain how much of the background fluorescence signal to confidently attribute to this intracellular FP issue.

      • In Figure 3D, the FRET efficiency between CFP-cA1-cA1 and N YFP at a 1:15 ratio of the two plasmids is higher than the FRET efficiency between CFP and YFP in the same subunit, even though the authors conclude that fluorescent proteins on the same subunit show considerably more FRET than fluorescent proteins on neighboring subunits. Could this indicate that the N-termini of adjacent subunits are closer together than the N- and C-termini of a single subunit? If, on the other hand, this effect were entirely the result of crowding in the membrane why is FRET efficiency substantially lower when CFP-cA1-cA1 is co-expressed with C4 YFP? Wouldn't this construct produce a similar crowding effect?

      We strongly suspect the N termini of adjacent subunits are closer to each other than N and C of single subunit simply because the N FPs would all be at the same ‘height’ or same depth with respect to the plasma membrane. Thus the measured FRET in this case primarily reflects distances in the x-y plane. This contrasts with the N and C FPs on the same or different subunits where both x-y distances and axial distances come into play.

      • On page 23, the authors state that they detected no pH-dependent changes in FRET between their GFP tag on the N-terminus of ASIC1 and an RFP tag on the channel's C-terminus. However, Figure 4 shows a small, but significant change in fluorescence between pH 8 and pH 7.

      We have corrected for multiple comparisons within a figure. As a result, this effect is no longer statistically significant (adjusted p value is 0.063).
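
The reply does not say which correction procedure was applied. Purely as an illustration of how a raw p value below 0.05 can rise above that threshold once several comparisons in a figure are considered together, here is a minimal sketch of the Holm-Bonferroni step-down adjustment. The choice of Holm's method, the function name, and the example p values are assumptions, not details from the manuscript.

```python
def holm_adjust(pvalues):
    """Return Holm-Bonferroni adjusted p values, in the original order.

    The smallest p value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and results are
    capped at 1.0.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p values for three comparisons within one figure.
example_adjusted = holm_adjust([0.01, 0.04, 0.03])
```

In this made-up example the comparisons with raw p values of 0.04 and 0.03 both end up above 0.05 after adjustment, which is the kind of change the reply describes.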

      • The interpretation of distances between various tagged positions on ASIC and the plasma membrane in Figure 2 is based on using two different colored tags with two different distance dependences. However, the interpretation of the data from Figure 5 provided on page 25 is less clear. For example, the reduction in fluorescence from the N-terminal tag is interpreted as the tag moving closer to the plasma membrane. Without similar data from a YFP tag to verify, it seems equally likely that the reduction in fluorescence (at steady state) could result from a movement away from the plasma membrane.

      This is a very good point. We tried to perform DPA quenching of YFP-containing constructs at pH 6.0, but the acidification resulted in proton-quenching of the YFP fluorescence (Figure 4). We didn’t feel confident in measuring DPA quenching with the concomitant loss of YFP fluorescence due to acidification. Therefore, we relied on the pH 8.0 CFP and YFP data as a starting point (Figure 2). Given the C1 insertion gives the greatest extent of CFP quenching, it is reasonable to place it around the top of the curve. The N position could then be on the left or right side of the hump or peak in the CFP distance curve. The N quenching is comparable to the C2 insertion quenching (Figure 2D, left), yet the N FP is ~ 16 amino acids from the pore-forming membrane helices while the C2 insertion is ~ 40 amino acids away. For reference, the C1 is ~ 24 amino acids away. Thus we are reasonably confident the N insertion is on the left side of the hump or peak. A reduction in ΔF/F would indicate movement closer to the plasma membrane. While it is technically possible that the N position could move further away from the membrane, this would have to be a >25 Å movement. Given there are only 16 amino acids between the CFP and the beginning of TM1 of the channel, we do not think such a dramatic movement outward could occur.

    1. Europe-wide, covid secure travel is finally here . . . Unfortunately, many people in the UK aren’t eligible so can’t take advantage of this. Because of Brexit? Indirectly. As the UK has left the EU, it now has separate systems and regulations and its citizens lost the right to free movement. Meanwhile, the EU has launched a digital covid certificate to facilitate free movement, which will be issued and recognised by all EU states. How does it work? The certificate contains a QR code showing that the bearer has been fully vaccinated and tested negative for, or recently recovered from, covid-19.1 Fully vaccinated EU citizens will be exempt from travel related testing and quarantine across the region 14 days after having received their last dose.1 Only UK residents who are citizens of EU member states living here may be eligible for one. Sounds like the NHS covid pass. This is a different system so is not automatically recognised by the EU, although some individual EU countries, including Spain and Greece, are accepting the NHS covid pass.2 The technologies behind the two systems are similar, so the EU and the UK are working on a mutual recognition agreement before the peak summer holiday season kicks in. Great, so can we book a cheap August break in the Med? Not so fast. Check which vaccine you had first. If it’s AstraZeneca then you should be covered, but you’ll need to check the batch numbers to be sure. I think it was 4120Z001 . . . Sorry, looks like you will be holidaying in Cornwall this year, if you can find any accommodation you can afford.3 Unfortunately batch numbers 4120Z001, 4120Z002, and 4120Z003 are Covishield which is not recognised by the EU. But isn’t all AstraZeneca recognised? Unfortunately not. While the two vaccines are identical, Covishield was made at the Serum Institute in India rather than in the UK or EU. EU member states only recognise vaccines that received EU marketing authorisation. The European Medicines Agency hasn’t approved Covishield because the EU isn’t receiving any Covishield doses. Bureaucratic nonsense! Maybe, but there is still hope. Individual EU member states can decide to recognise other vaccines,1 and the World Health Organization has approved Covishield for emergency use. Some countries already accept other vaccines: for example, Greece accepts China’s Sinovac, Russia’s Sputnik V, and several others.
    1. And we carry the scars of those stories as epigenetically-programmed determiners of our everyday, modern lives. But those stories are not the whole truth. Rather, they are that truth which the conscious ego and its master, the autonomic nervous system — which is delivered in each new human with factory settings locked on Sympathetic Response (stressed, ego-centered, fearful) — has determined as the impetus to create the climate-catastrophic society. And so the trauma at the deepest part of us… which may be driving all of our most self-destructive impulses and patterns… is the belief that we do not belong in this world. That we are strangers in a strange and dangerous land, instead of children who live in a supernatural garden. Because that is the other part of the story.

      interesting narrative shaping here.

      scientifically, yes our understanding of our autonomic nervous system is super important to healing the things that are wild and disturbing above — but jumping to the next part is where I get lost.

      Agree a lot of our pain is from: "is the belief that we do not belong in this world." which I think we'll have to work on to fix the climate thing.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] My main technical concern lies in the choice of decomposition filter for SEP and alpha oscillations, and the conclusions the authors draw from that. Specifically, a CCA spatial filter is optimized here for the N20 component, which is then identically applied to isolate alpha sources, with the logic being that this procedure extracts the alpha oscillation from the same sources (e.g., L359). I have no issues (or expertise) with using the CCA filter for the SEP, but if my understanding of the authors' intent is correct, then I don't agree with the logic of using the same filter to isolate alpha as well. The prestimulus alpha oscillation can have arbitrary source configurations that are different from the SEP sources, which may hypothetically have a different association with the behavioral responses when it's optimally isolated. In other words, just because one uses the same spatial filter, it does not imply that one is isolating alpha from the same source as the SEP, but rather simply projecting down to the same subspace - looking at a shadow on the same wall, if you will. To show that they are from the same sources, alpha should be isolated independently of the SEP (using CCA, ICA, or other methods), and compared against the SEP topology. If the topology is similar, then it would strengthen the authors' current claims, but ideally the same analyses (e.g., using the 1st and 5th quintile of alpha amplitude to partition the responses) are repeated using alpha derived from this procedure. Also, have the authors considered using individualized alpha filters given that alpha frequency varies across individuals? Why or why not?

      Indeed, applying the same spatial filter to EEG signals with different spatial arrangements of the sources can lead to the extraction of neuronal activity which does not originate from the very same sources. We had chosen our approach, as it is well known that the generators of the early SEP components and the generators of the prominent somatosensory alpha rhythm co-reside at similar sites in the primary somatosensory cortex (e.g., Haegens et al., 2015). Therefore, we considered our approach appropriate to specifically focus on neural activity from the somatosensory region both in the frequency band of the SEP as well as of the alpha rhythm. Yet, we agree with the reviewer that it should be acknowledged that we may have missed or mixed-up effects of alpha activity from other sources by using this procedure (which might have led to different conclusions otherwise). In order to account for this, we repeated our analyses with an SEP-independent reconstruction of the oscillatory effects in source space (“whole brain analysis”). For this, we first reconstructed the sources of alpha activity using eLORETA and head models based on participant-specific MRI scans, and estimated the respective effects independently for all sources across the cortex using both linear-mixed effects models (LME) as well as a binning approach for the Signal Detection Theory (SDT) parameters sensitivity d’ and criterion c (consistent with the previous analyses in our manuscript). In the LME analyses, both the effects of pre-stimulus alpha activity on N20 amplitudes as well as on perceived stimulus intensity were strongest in the right primary somatosensory cortex – in accordance with the sources of the originally extracted tangential CCA component of the SEP (see Supplementary Figure 1 for Peer Review). 
Also, using the binning approach to examine the relation of pre-stimulus alpha activity with SDT parameter criterion c, the effects were most pronounced around the right somatosensory regions (Supplementary Figure 2 for Peer Review), yet these effects did not survive statistical correction for multiple comparisons (FDR-correction with p<.01). However, when performing the same binning analysis for our region of interest (ROI), the hand area in BA 3b of the right somatosensory cortex, a significant effect of pre-stimulus alpha on criterion c was indeed confirmed, t(31)=-2.951, p=.006, CI95%=[-.173, -.032]. Furthermore, in line with our previous CCA results, for sensitivity d’, neither the whole brain analysis nor the ROI analysis showed effects of pre-stimulus alpha amplitude, t(31)=0.633, p=.531, CI95%=[-.083, .157]. Taken together, the findings we report in our original manuscript for pre-stimulus alpha activity obtained with the spatial CCA filter can thus be replicated with a SEP-uninformed source reconstruction, both using LMEs for a “whole-brain analysis” as well as SDT analyses in a ROI-based approach. We therefore conclude that the relationships between pre-stimulus alpha activity, N20 potential of the SEP, and perceived stimulus intensity can indeed be attributed to neural activity from the same (or at least very similar) sources in the primary somatosensory cortex.

      Addressing the question on filtering alpha activity in individualized frequency bands, we considered this option, too. However, the rather short length of our pre-stimulus window (-200 to -10 ms) constitutes a natural limit for the frequency resolution in the alpha range and slightly different filter ranges (adjusted with regards to the individual alpha peak frequency) are thus unlikely to lead to large differences in the estimation of pre-stimulus alpha amplitudes. Therefore, we refrained from using individualized frequency bands here and focused on the more generic approach using one common alpha band (8-13 Hz) for all participants, which should also facilitate direct comparisons with previous studies on pre-stimulus oscillatory effects.
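
      The frequency-resolution argument above can be made concrete with a rule-of-thumb calculation. The sketch below is an illustration only (not the authors' code); it assumes the usual approximation that a window of length T resolves frequencies about 1/T apart, and all variable names are hypothetical.

```python
# Pre-stimulus window quoted in the response: -200 ms to -10 ms.
window_start_s = -0.200
window_end_s = -0.010
window_length_s = window_end_s - window_start_s  # 0.19 s

# Rule of thumb (assumption): spectral resolution is roughly 1 / window length.
resolution_hz = 1.0 / window_length_s

# Common alpha band used for all participants (8-13 Hz).
alpha_low_hz, alpha_high_hz = 8.0, 13.0
band_width_hz = alpha_high_hz - alpha_low_hz

# Number of 10 Hz cycles that fit into the window.
cycles_at_10hz = 10.0 * window_length_s

print(f"window length: {window_length_s * 1000:.0f} ms")
print(f"approximate resolution: {resolution_hz:.2f} Hz")
print(f"alpha band width: {band_width_hz:.1f} Hz")
print(f"10 Hz cycles in window: {cycles_at_10hz:.2f}")
```

      Under this approximation the resolution is about as coarse as the alpha band is wide, so shifting the band edges by a hertz or so for individual peak frequencies would be expected to change the amplitude estimate only slightly.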

      In the same vein, both alpha and N20 amplitude relate to perceptual judgement, and to each other. I believe this is nicely accounted for in the multivariate analysis using the SEM, but the analyses that partition the behavioral responses using the 20% and 80% cutoffs are done separately, which means that different behavioral trials are used to compute the effect of N20 and alpha on sensitivity and criterion. While this is not necessarily an issue given that there IS a multivariate analysis, I would like to know how many of those trials overlap between the two analyses.

      This is an interesting point indeed. We included both the binning analyses and the multivariate analyses in our manuscript as we believe they offer complementary views on the data, and also allow a direct comparison to previous studies in the field (e.g., Iemi et al., 2017). In fact, the trial overlap between the extreme bins of the alpha and N20 data was rather small.

      Since the expected trial overlap is 20% when partitioning the data into quintiles randomly, the effect-driven increments and reductions in trial overlap in our data appear to be rather small. However, they showed the expected directions: Larger alpha amplitudes were associated with more negative N20 amplitudes (and vice versa). Presumably, these small differences in trial overlap reflect the rather small effect sizes we also observed in the multivariate analyses. We have added this information to our revised manuscript in the following way to give the reader a better picture of the underlying data for the binning analyses (page 9, lines 137 ff.): “(Please note that this procedure resulted in a different trial selection as compared to the SDT analysis of pre-stimulus alpha activity. Please refer to Fig. 2—figure supplement 2 for further details on the trial overlap.)”
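
      The 20% chance level mentioned above can be checked with a small simulation. This is a sketch on made-up Gaussian data, not the study's data: the trial count, the coupling strength of -0.3, and all function names are arbitrary assumptions chosen for illustration.

```python
import random


def extreme_bin_indices(values, fraction=0.2, top=True):
    """Return the set of trial indices in the top (or bottom) `fraction` of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_bin = int(len(values) * fraction)
    chosen = order[-n_bin:] if top else order[:n_bin]
    return set(chosen)


def overlap_fraction(bin_a, bin_b):
    """Share of trials in bin_a that are also in bin_b."""
    return len(bin_a & bin_b) / len(bin_a)


rng = random.Random(0)
n_trials = 10000

# Case 1: alpha and N20 amplitudes are unrelated, so overlap should be near 20%.
alpha = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
n20_independent = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
chance_overlap = overlap_fraction(
    extreme_bin_indices(alpha, top=True),
    extreme_bin_indices(n20_independent, top=True),
)

# Case 2: larger alpha goes with more negative N20 (hypothetical coupling of -0.3).
n20_coupled = [-0.3 * a + rng.gauss(0.0, 1.0) for a in alpha]
coupled_overlap = overlap_fraction(
    extreme_bin_indices(alpha, top=True),          # highest alpha quintile
    extreme_bin_indices(n20_coupled, top=False),   # most negative N20 quintile
)

print(f"overlap without coupling: {chance_overlap:.3f}")
print(f"overlap with coupling:    {coupled_overlap:.3f}")
```

      With independent variables the overlap of two extreme quintiles sits near 0.2, and a negative coupling raises the overlap between the highest-alpha and most-negative-N20 bins above that level.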

      At multiple points, the authors comment that the covariation of N20 and alpha amplitude in the same direction is counterintuitive (e.g., L123-125), and it wasn't clear to me why that should be the case until much later on in the paper. My naive expectation (perhaps again being unfamiliar with the field) is that alpha amplitude SHOULD be positively correlated with SEP amplitude, due to the brain being in a general state of higher variability. It was explained later in the manuscript that lower alpha amplitude and higher SEP amplitude are associated with excitability, and hence should have the opposite directions. This could be explicitly stated earlier in the introduction, as well as the expected relationship between alpha amplitude and behavior.

      Thank you for pointing out this lack of clarity. We have now made this rationale explicit at an early point in the introduction (page 3, lines 26 ff.): “According to the baseline sensory excitability model (BSEM; Samaha et al., 2020), higher alpha activity preceding a stimulus indicates a generally lower excitability level of the neural system, resulting in smaller stimulus-evoked responses, which are in turn associated with a lower detection rate of near-threshold stimuli but no changes in the discriminability of sensory stimuli (since neural noise and signal are assumed to be affected likewise).”

      Furthermore, I have a concern with the interpretation here that's rooted in the same issue as the assumption that they are from the same sources: the authors' physiological interpretation makes sense if alpha and N20 originated from the same sources, but that is not necessarily the case. In fact, the population driving the alpha oscillation could hypothetically have a modulatory effect on the (separate) population that eventually encodes the sensory representation of the stimulus, in which case the explanation the authors provide would not be wrong per se, just not applicable. A comment on this would be appreciated in the revision.

      Our extensive additional analyses suggest that the sources of behaviorally relevant alpha and N20 activity were located at very similar cortical sites. Nevertheless, this is not a proof that exactly the same neuronal populations were involved (for example, alpha and N20 effects could originate from different cortical layers). Therefore, we have added this potential limitation to our revised manuscript in the following way (page 19, lines 379 ff.): “Furthermore, with the present data, we cannot unambiguously conclude that the observed relation between pre-stimulus alpha activity and initial SEP indeed involved the very same neuronal populations – which may represent a limitation of the hypothesized mechanism. However, all approaches to localize these effects pointed to very similar cortical regions as discussed in the following section.”

      In addition, given how closely related the investigation of these two quantities are in this specific study, I think it would be relevant to discuss the perspective that SEPs are potentially oscillation phase resets. Even though the SEP is extracted using an entirely different filter range, it could nevertheless be possible that when averaged over many trials, small alpha residues (or other low freq components) do have a contribution in the SEP. If the authors are motivated enough, a simulation study could be done to check this, but is not necessary from my point of view if there is an adequate discussion on this point.

      Indeed, the phase reset mechanism may be a possible alternative explanation for relations between oscillations and later parts of the ERP. However, the N20 potential reflects the very first excitation of the cortex in response to a somatosensory stimulus and should therefore represent a textbook example of an additive response (EPSPs are added to ongoing background activity). Moreover, the N20 response should be over long before a possible phase reset in lower frequencies (such as alpha frequencies) would start to play a role (Hanslmayr et al., 2007; Sauseng et al., 2007). Nevertheless, we ran additional control analyses (including a simulation study) in order to exclude that some odd combination of phase-locking and filter residues led to the present findings: Please see Essential Revision #4 for details and how we included these considerations in our revised manuscript.

      Reviewer #2 (Public Review):

      [...] The main weakness of the manuscript becomes most apparent with respect to the stated impact that "The widespread belief that a larger brain response corresponds to a stronger percept of a stimulus may need to be revisited.". I am not sure that many cognitive neuroscientists would actually subscribe to such a simplistic relationship between evoked responses and perception; temporal differentiation (early vs late responses) and the biasing influence of prestimulus activity patterns are becoming increasingly recognized. So rather than actually changing a dominant paradigm, this work is an (excellent) contribution to a paradigm shift that is already taking place.

      Thank you for this feedback. We agree that the paradigm shift away from simplistic assumptions about the relationship between variability of neural responses and perception is already taking place and that this is already being appreciated by many scientists in the field. Also, we agree that the present study contributes more evidence to this emerging notion rather than changing the whole field. However, we do think that particularly the observation of opposite amplitude modulations of initial somatosensory evoked responses associated with presented stimulus intensity on the one hand and pre-stimulus excitability state on the other, provides a novel perspective for our understanding of how fundamental features of sensory stimuli are processed at initial cortical levels. Following your suggestions to tone down claims about the controversiality as well as to avoid over-generalization, we have therefore adjusted the impact statement of this manuscript to: “Larger evoked responses during initial cortical processing may reflect states of lower excitability.”

      Furthermore, we have adjusted similar statements throughout the manuscript accordingly.

      Also it should be considered that with regards to the analysis approach using CCA, the claims are mainly restricted to BA3b: i.e. while I also think that this is a strength of the current study, one should refrain from overinterpreting the results in a very generalized manner. The authors do include some "thalamus" and "late" evoked response patterns as well; however, the presentation of those results differs from that of the N20 (e.g. using LMEs rather than comparison of extremes; not using SEMs). The readability of results and especially the comparison of effects would profit from a more coherent approach.

      We agree that our findings indeed have the specific focus on the N20 component and thus on its generators in BA3b. We did not intend to suggest that the effects we observed for this initial cortical response can be readily generalized to other (later) ERP components, too. However, we do believe (and hypothesize) that similar mechanisms may be in place for corresponding initial cortical responses in other sensory modalities, too – yet it is clear that we cannot test this generalization with the current study. To avoid misunderstandings of these interpretations and their limitations, we have further specified these aspects in the Discussion.

      Regarding our analyses of the later SEP (i.e., N140 component) and thalamus-related activity (i.e., P15 component), we initially decided to use linear-mixed effects models as they are mathematically equivalent to the way the sub-equations of the structural equation model were constructed (Table 2 in the manuscript). Nevertheless, we have now additionally run binning analyses to make a direct comparison also with Signal Detection Theory (SDT) parameters possible: For the N140 component, there was a significant effect on criterion c, t(31)=-3.010, p=.005, but no effect on sensitivity d’, t(31)=0.246, p=.807. For the P15 component, no effects emerged either for criterion c or sensitivity d’, t(12)=1.201, p=.253, and t(12)=-0.201, p=.844, respectively. These findings correspond well to the previous LME analyses and may indeed further facilitate the comparison with the findings for the N20 potential and pre-stimulus alpha activity. Therefore, we have added these complementary analyses to our manuscript in the following way:

      Results: “In addition, the SDT analysis based on binning of the P15 amplitudes into quintiles neither suggested a relation with criterion c nor with sensitivity d’, t(12)=1.201, p=.253, and t(12)=-0.201, p=.844, respectively.” (page 14, lines 241 ff.)

      “These findings were in line with a separate SDT analysis: N140 amplitudes were associated with an effect on criterion c, t(31)=-3.010, p=.005, but no effect on sensitivity d’ emerged, t(31)=0.246, p=.807.” (page 15, lines 263 ff.)

      Discussion: “Crucially, our data are at the same time consistent with previous studies on somatosensory processing at later stages, where larger EEG potentials are typically associated with a stronger percept of a given stimulus (e.g., Al et al., 2020; Schröder et al., 2021; Schubert et al., 2006), as both our SDT and LME analyses of the N140 component showed.” (page 19, lines 367 ff.)

      “Yet, neither our SDT analyses nor the LME models of the thalamus-related P15 component supported this notion.” (page 21, lines 414 ff.)

      Methods (page 32, lines 681 ff.): “The effects of the EEG measures pre-stimulus alpha amplitude, N20 peak amplitude, P15 mean amplitude, and N140 mean amplitude on the SDT measures sensitivity d’ and criterion c were examined using a binning approach: […]”
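
      For readers unfamiliar with the SDT measures named above, the following sketch shows the textbook formulas for sensitivity d’ and criterion c, computed separately for two amplitude bins. It is not the authors' pipeline: the trial counts are invented, the log-linear correction is an added assumption, and treating "strong stimulus rated strong" as a hit is a hypothetical mapping onto the task.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion_c) from response counts.

    A log-linear correction (add 0.5 per cell) is applied as an assumption
    so that rates of exactly 0 or 1 cannot occur.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion_c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion_c


# Hypothetical counts for the lowest and highest amplitude quintile.
d_low, c_low = sdt_measures(hits=70, misses=30, false_alarms=30, correct_rejections=70)
d_high, c_high = sdt_measures(hits=60, misses=40, false_alarms=20, correct_rejections=80)

print(f"lowest quintile:  d' = {d_low:.3f}, c = {c_low:.3f}")
print(f"highest quintile: d' = {d_high:.3f}, c = {c_high:.3f}")
print(f"difference in c between bins: {c_high - c_low:.3f}")
```

      In this made-up example the two bins have similar sensitivity but different criteria, which is the kind of pattern a per-participant bin difference followed by a t-test across participants would pick up.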

      I have some concerns whether the relationship between large alpha power and more negative N20s could be driven by more trivial factors rather than the model explanations the authors develop in the discussion. Concretely the question whether phase locking of large alpha power along with >30 Hz high pass filtering could produce a similar finding as shown e.g. in Figure 2c. This is an important issue, as prestimulus alpha influences the N20 amplitudes as well as the perceptual reports.

      Indeed, potential phase-locking of alpha oscillations to stimulus onset and filter-related effects are important issues that could potentially offer an alternative explanation for the observed relationship between amplitudes of pre-stimulus alpha activity and the N20 potential of the SEP. Although such pre-stimulus alpha locking is rather unlikely in a paradigm with jittered stimulus onsets (in our case uniformly distributed between -50 ms and +50 ms; corresponding to a whole alpha cycle), we have run the following control analyses to fully exclude this possibility:

      First, we analyzed whether pre-stimulus alpha phase values were distributed uniformly and whether these phase distributions differed between high and low alpha amplitudes as well as between high and low N20 amplitudes. The phase of pre-stimulus alpha activity was obtained from a fast Fourier transform (FFT) in the pre-stimulus time window from -200 to -10 ms, applied to unfiltered, but otherwise identically pre-processed data as in the original manuscript (i.e., applying the spatial filter of the tangential CCA component). For the FFT, we used zero padding (extending the pre-stimulus data segments to 2048 data points each) in order to obtain an interpolated frequency resolution of around 3 Hz. The phase was extracted at the frequency 9.766 Hz (i.e., the closest available frequency to 10 Hz). As visible from Supplementary Figure 3 for Peer Review, pre-stimulus alpha phases were distributed uniformly across all five quintiles of both alpha and N20 amplitudes. This observation was confirmed by the Rayleigh test (testing for deviations from a uniform distribution; Berens, 2009): Neither in the concatenated phase data of all participants, z=1.130, p=.323, nor in single-participant analyses within every alpha amplitude or N20 amplitude bin, did we find evidence for a non-uniform distribution of alpha phase, all p>.367 (after Bonferroni correction for multiple testing). Thus, there was no phase-locking of pre-stimulus alpha activity that could serve as a trivial alternative explanation of the relationship between pre-stimulus alpha amplitude and N20 amplitude.
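
      The two steps described above (reading the phase from one bin of a zero-padded FFT, then testing the phases for uniformity) can be sketched as follows. This is an illustration, not the authors' code. The 5000 Hz sampling rate is an assumption inferred from the quoted bin frequency (4 × 5000 / 2048 ≈ 9.766 Hz), and the p-value uses a common closed-form approximation for the Rayleigh test rather than any specific toolbox call.

```python
import cmath
import math


def phase_at_bin(segment, n_fft, bin_index):
    """Phase (radians) of one bin of the DFT of `segment` zero-padded to n_fft.

    The padded zeros contribute nothing, so the sum runs over the real samples only.
    The phase is relative to the first sample of the segment.
    """
    coeff = sum(
        x * cmath.exp(-2j * math.pi * bin_index * n / n_fft)
        for n, x in enumerate(segment)
    )
    return cmath.phase(coeff)


def rayleigh_test(phases):
    """Return (z, p) for the Rayleigh test of circular uniformity (approximate p)."""
    n = len(phases)
    resultant = abs(sum(cmath.exp(1j * p) for p in phases))  # length of summed unit vectors
    z = resultant ** 2 / n
    p_value = math.exp(
        math.sqrt(1 + 4 * n + 4 * (n ** 2 - resultant ** 2)) - (1 + 2 * n)
    )
    return z, p_value


SAMPLING_RATE_HZ = 5000.0  # assumption, inferred from the 9.766 Hz bin
N_FFT = 2048               # padded length quoted in the response
bin_spacing_hz = SAMPLING_RATE_HZ / N_FFT
alpha_bin = round(10.0 / bin_spacing_hz)   # bin closest to 10 Hz
alpha_bin_hz = alpha_bin * bin_spacing_hz

# One synthetic 190 ms pre-stimulus segment containing a 10 Hz oscillation.
n_samples = int(0.190 * SAMPLING_RATE_HZ)
segment = [math.cos(2 * math.pi * 10.0 * n / SAMPLING_RATE_HZ + 0.5) for n in range(n_samples)]
example_phase = phase_at_bin(segment, N_FFT, alpha_bin)

# Rayleigh test on evenly spread phases (uniform) and on identical phases (locked).
uniform_phases = [2 * math.pi * k / 100 for k in range(100)]
locked_phases = [0.3] * 100
z_uniform, p_uniform = rayleigh_test(uniform_phases)
z_locked, p_locked = rayleigh_test(locked_phases)

print(f"bin spacing: {bin_spacing_hz:.3f} Hz, alpha bin at {alpha_bin_hz:.3f} Hz")
print(f"uniform phases: p = {p_uniform:.3f}; locked phases: p = {p_locked:.3g}")
```

      A large p-value, as for the evenly spread phases here, gives no evidence against uniformity, whereas phase-locked data would yield a p-value near zero.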

      Second, in order to examine whether the combination of our temporal filters (30 to 200 Hz band-pass for the SEP, and 8 to 13 Hz band-pass for alpha activity) could have led to the present findings, we additionally re-ran our analysis pipeline with simulated data: We mixed exemplary SEP responses with constant amplitudes (unfiltered; derived from within-participant averages), with simulated alpha band activity with randomized amplitude fluctuations, and pink noise, reflecting neural background activity as is typical for the human EEG. The SEP onsets were chosen according to our original experimental paradigm with inter-stimulus intervals of 1513 ms and a jitter of ±50 ms. Next, we filtered these mixed signals between 30 and 200 Hz in order to extract the single-trial SEPs, and estimated the pre-stimulus alpha amplitudes between -200 and -10 ms in the same way as was done in the original manuscript (i.e., by filtering the mixed signal between 8 and 13 Hz). This procedure was repeated for 32 generated data streams, containing 1000 SEPs each (corresponding to our empirical dataset of 32 participants). The resulting average SEPs showed no visually detectable difference between the five alpha amplitude quintiles, nor did a random-slope linear-mixed-effects model indicate any relation between pre-stimulus alpha amplitude and N20 amplitude on a single-trial level, βfixed=-.0005, t(255.16)=-.094, p=.925. Therefore, our findings cannot be explained by filter artifacts or residual activity leaking from the alpha frequency band to the frequency band of the N20 potential.

      Third, we re-analyzed our empirical EEG data in time-frequency space to obtain a more detailed view of the effects of pre-stimulus alpha activity on N20 amplitudes. For this, we decomposed our pre-processed but unfiltered data with wavelet transformation (complex Morlet wavelets) and calculated linear-mixed effects models on the relation between signal amplitudes in the time-frequency domain and single-trial N20 amplitudes as obtained from our original analyses. As shown in Supplementary Figure 5 for Peer Review, the time-frequency representations of the effects on N20 amplitudes indeed indicated a specific role of the alpha band, with its effects (i.e., already 200 ms before stimulus and in the upper alpha frequency range) separated from the time- and frequency range of the N20 potential of the SEP (i.e., from ~20 ms after stimulus onwards and above ~20 Hz). In addition, we ran the same analysis for the behavioral effect (i.e., perceived stimulus intensity). Also here, pre-stimulus effects were predominantly visible in the alpha band. Of note, there were also strong effects in the beta band. These may be interesting to study further in future studies – in particular, whether they reflect independent physiological processes or rather harmonics of the alpha band. Furthermore, these time-frequency representations suggest that the studied pre-stimulus effects might have been even more pronounced if we had analyzed the data in pre-stimulus time windows from -300 to -10 ms. However, in order to avoid inflating effect sizes by post-hoc data digging (“p-hacking”), we prefer to keep the original, a priori chosen time window for the main analyses of the manuscript. Yet, these onsets of pre-stimulus effects at around -300 ms may be of interest for future work. 
Taken together, these time-frequency analyses further support the notion that the observed relation between pre-stimulus alpha activity and N20 amplitudes is not due to technical issues (such as filter leakage and phase-locking) but rather reflects genuine neurophysiological effects of alpha oscillations on SEPs.

      We have added the time-frequency analysis, as well as the SEP simulation analysis as figure supplements to Figure 2 in our revised manuscript (page 8) since we believe that these control analyses comprehensively show that the observed effects were (a) specific to the alpha band and (b) not due to any data processing-related artifacts.

      It is important to emphasize that the model developed is a post-hoc one, i.e. the authors do not already develop in the discussion various alternative scenario results based on different model predictions. Therefore there is no strong evidence in support of the specific one advanced in the discussion.

      Thank you for raising this issue. Indeed, we cannot prove with the current findings that our proposed physiological model of the relation between alpha oscillations and the SEP is the correct model (or that it is at least the best one out of a selection of possible alternative models). To do so, future studies would be needed that can actually directly measure and/or manipulate differences in membrane potentials and trans-membrane currents. Rather, we aimed with the present study to associate a physiological meaning with the concept of excitability changes in the human EEG – offering a hypothesis that may be worthwhile to be studied (and either confirmed or rejected) in future studies. We have tried to make this motivation more explicit in the Discussion section (page 20, lines 384 ff.): “Also, we would like to emphasize that the presented mechanism reflects a hypothesized model, which shall be further supported or falsified with more targeted studies, for example, directly quantifying membrane potentials and trans-membrane currents in relation to different excitability states in somatosensation.”

    1. Author Response

      Reviewer #1 (Public Review):

      [...] The manuscript is excellently written and discusses the simulation results clearly and succinctly. The resolution of the simulations is very impressive and yields unprecedented insight into the effect of merozoite shape on alignment dynamics, which has important implications for how effectively the parasite can survive and multiply. The conclusions reached by the authors are certainly justified by the simulation data. In particular, the authors are careful not to draw conclusions beyond the limits of their study, and acknowledge other factors which may influence the merozoite shape, such as internal structural constraints and the energy of invasion following successful alignment.

      We thank the reviewer for a thorough reading of our manuscript and the very positive judgement.

      Regarding weaknesses of the manuscript, some of the explanations of the trends observed in the simulation data could be expanded slightly, to help gain a deeper understanding of the competition between adhesion and RBC deformability underlying the alignment dynamics. These are described in more detail below.

      1. Line 114 and lines 120-129: The discussion here of the trends observed in Figure 1 (including why the LE shape has a larger energy compared to the OB shape despite having a smaller adhesion area) is somewhat vague and should be developed further. For example, currently there is only a video showing the egg-like shape and a second video comparing the LE shape to a spherical shape - it would be helpful to have a further video comparing the LE and OB shapes and the different RBC deformations they cause. Moreover, the explanation of the energy/mobility of each shape in terms of curvatures (e.g. the OB shape having "lower curvature at its flat side") could be made more precise. I would expect that the adhesion area depends on how close the principal curvatures of the merozoite surface are to being equal and opposite to the natural curvatures of the RBC, since this determines the bending energy associated with wrapping the merozoite and forming short bonds. This would explain why the spherical shape is most mobile (its principal curvatures are constant so there is no region where at least one is relatively small), and why alignment is most likely to occur in the dimple of the RBC where the membrane is naturally concave-outward. For a given adhesion area, the deformation energy should depend on the difference in principal curvatures in the contact region, with a larger difference causing more bending of the RBC membrane. This difference is larger for the LE shape, since one principal curvature remains large at each point on the surface, compared to the OB shape whose principal curvatures are both small on the 'flat side' where contact is most likely to occur.

      We have expanded the discussion of these results to make it clearer. Furthermore, a new video was generated to visualize the differences between the shapes.

      1. Lines 175-176: Given that the ratio A_m/A_s (adhesion area to total surface area) plays a key role in the probability of alignment, the authors should be more quantitative at this point. How does the ratio A_m/A_s (as measured directly, or indirectly e.g. by the area under the probability distributions inside the alignment region in figures 3a,b) scale with the system parameters, such as the adhesion strength and the off-rate k_off? Can it be estimated from an energy balance between RBC bending/stretching and the average adhesion energy?

      A change in A_m as a function of adhesion strength can be estimated analytically for a sphere, as was done in Hillringhaus et al. Biophys. J. 117:1202, 2019. For small deformations, there is essentially a competition of bending and adhesion energies, while for strong adhesion, the stretching-elasticity contribution becomes important. We have included this theoretical result in the manuscript and discuss its implications.

      1. Line 197-198 and Figure 4c: Why is the deformation energy associated with the OB shape much lower than all other shapes for values of k_off/k_on^{long} smaller than 2?

      For k_off/k_on^{long} < 2, the magnitude of local curvature has a pronounced effect. For the OB shape, a large adhesion area is formed over the area with very low curvature, and close to the rim where the curvature is large, the adhesion strength may not be strong enough to induce membrane wrapping and deformation. For other shapes, the adhesion strength is large enough to lead to partial wrapping of the parasite by the membrane over moderate curvatures. As a result, the integrated deformation energy is significantly lower for the OB shape than for the other shapes in this regime of adhesion strengths. We have added this clarification to the manuscript.

      1. Alignment requires that the distance between the merozoite apex and RBC membrane is very small, and the alignment criteria necessitate examining small changes in the apex angle \theta from \pi. Can the authors comment on how sensitive are the results to the numerical discretisation used?

      The discretization length does affect the tightness of the alignment criteria. In our simulations, the average discretization length of the RBC membrane is about l0 = 0.2 µm. The half-circumference of a parasite (corresponding to an angle of π) is πR, which is equal to about 12 l0 for R = 0.75 µm, such that our angle resolution with respect to the parasite size is about 0.1π. Therefore, we use 0.2π for the alignment criteria, which is large enough to avoid strong discretization effects. Simulations with a finer discretization are possible, but they become very expensive computationally.
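
      The resolution estimate in this response can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the quoted lengths are in micrometres and that the half circumference is πR (the corresponding symbols appear to have been lost in the displayed text). The variable names are illustrative and not taken from the simulation code.

```python
import math

# Values quoted in the response; units assumed to be micrometres.
R = 0.75    # parasite radius
L0 = 0.2    # average discretization length of the RBC membrane

half_circumference = math.pi * R          # arc spanning an angle of pi
n_segments = half_circumference / L0      # membrane segments along that arc
angle_resolution = math.pi / n_segments   # radians per segment (equals L0 / R)

print(f"half circumference = {half_circumference:.2f} um")
print(f"segments along half circumference = {n_segments:.1f}")
print(f"angle resolution = {angle_resolution / math.pi:.3f} * pi rad")
```

      The result is slightly below 0.1π per segment. If the quoted tolerance of 0.2 is read in units of π, it therefore spans a little more than two discretization lengths.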

      Reviewer #2 (Public Review):

      [...] A major strength of the results is that it investigates an unstudied problem in malarial pathogenesis. The results pertaining to adhesion strength may be informative for preventing the organism from invading red blood cells. A primary weakness is that there is too little detail provided in the methods for this reviewer to adequately assess the computational method. Secondly, the results are somewhat inconclusive. While the egg-shape performs better than certain other shapes, there is no clear final understanding why this shape is preferred over the spherical or short ellipsoidal shapes. However, this possibly provides some clues as to why a certain malarial species does actively adopt a spherical shape during red blood cell binding and invasion.

      We thank the reviewer for a positive judgment of our manuscript. We have significantly expanded the methods section, so it should contain now all necessary simulation details. We agree with the reviewer that the conclusions about shape advantages/disadvantages are equivocal to some extent, but this is exactly what our simulation data show. However, from our data it is clear that the two shapes (i.e. egg-like and sphere) stand out, and they also correspond to real examples of merozoite shapes. As the reviewer points out, we do discuss some clues for the importance of parasite shape in the alignment process.

      Overall, the authors achieved their aims by quantitatively assessing the effect of parasite shape and adhesion strength on cell alignment, which is a proxy for invasion. The discussion at the end of the manuscript provides an accurate evaluation of the results that puts them into the context of invasion. While to some extent the results presented here are inconclusive, I do think that this paper achieves an important goal for its field. This is an understudied area pertinent to a major disease. This manuscript has the potential to bring questions of the biophysics of malarial invasion out to the broader community, specifically introducing these questions to biophysicists as well as microbiologists. Furthermore, the results naturally lead to new questions. If the spherical and egg shapes do not confer a strong advantage, then these specific shapes must also play a role in other processes. The authors do suggest some possibilities in the Discussion. That there remain interesting questions is a great spur for future work.

      Thank you for emphasizing the importance of multidisciplinarity. We also hope that our work will ignite interest in different communities, as only a multidisciplinary effort can bring us much closer to understanding of parasite alignment and invasion, which clearly include a combination of different mechanical and biochemical processes.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] Their studies were complemented by transcriptomics and metabolomics and these results support the general conclusions that pollen contains diverse carbon sources which could be used in complementary ways by the different species, which have diverse metabolic capabilities encoded in their genomes.

      Reply: We thank the reviewer for the positive assessment of our manuscript.

      One of the points that was not completely explored in the paper is what happens in the simplified diet both in vitro and in the Bee gut. They propose in the discussion that in the presence of few and simple carbon sources (sugars) there is competition for nutrients and competitive exclusion is driving loss of some species. But this is not fully addressed in the paper.

      Reply: All four species can colonize the gut individually and grow on their own in axenic cultures when provided with either the simple sugars or pollen as the only carbon source. When cultured together, all four strains are stably maintained in the presence of pollen. However, three of the four strains steadily decrease in abundance in the simple sugars. These findings are, in our opinion, consistent with the consumer-resource model (more resources = more species that can coexist) and the competitive exclusion principle, which predicts that if two or more strains compete for the same nutrients they will not be able to coexist. We have added a corresponding section on lines 423-425.

      The system they use (with 4 closely related bacterial species) is a simplified system. Therefore, it is not clear if the same general findings will hold in more complex systems. But the results supporting that nutrient complexity (in diet) and metabolic diversity (from the microbial side) are key factors to enable co-existence and persistence of complex microbiota communities are strong and likely generalizable. Although, it is possible that with other communities and other hosts other factors will also come into play. Nonetheless, the current study is important because it sets a good example for how these questions can be addressed to study more complex systems.

      Reply: It is true that bacterial coexistence does not necessarily need to be dependent on the nutrient complexity and that in other communities the host, the structure of the environment, or cross-feeding activities may play a more important role. We have discussed this point in the revised manuscript starting on line 423 and on line 427.

      Overall, the study described here is complete, and rigorous, except for a few points that still need to be addressed and clarified. Namely, it would be interesting to understand what drives exclusion of some members of the community in the simplified diet.

      Reply: See our reply above.

      Importantly, the current study opens the door for new studies (including in vitro studies) on the identification of network interactions that are important for Microbe-Microbe interactions that enable co-existence in other systems. Additionally, this study also highlights the importance of identifying the relevant nutritional (and metabolic) conditions for addressing those questions given the importance of the metabolic context in shaping microbe-microbe interactions.

      Reply: Thank you. We agree.

      Reviewer #2 (Public Review):

      [...] Strengths: The use of community profiling, transcriptomics, and metabolomics adds depth, as does the comparison of defined culture conditions to the host environment. The main conclusions drawn by the authors are that the presence of pollen is necessary for gut species to coexist, and that the different species, although closely related, respond in distinct ways to nutrients in pollen and consume different profiles of nutrients from pollen.

      Reply: We thank the reviewer for the positive feedback and the many valuable comments which helped us to further strengthen our manuscript.

      Weaknesses: The main weakness I see with this work is the choice of in vitro comparison conditions. The strains are cultured either on pollen or sugar water, whereas in vivo bees are fed a diet of pollen and sugar water, or only sugar water. A direct comparison is possible between the strains grown on sugar water in vitro or in vivo, but I think that in several places, the authors may have to reconsider or modify their interpretations comparing in vitro culture on pollen/pollen extract with the in vivo growth of the community on pollen and sugar water. Because there is sugar in the bee diet, differences in assembly dynamics, transcription, or metabolite consumption between pollen-containing culture conditions and the bee gut might stem from the dietary intake of sugar, or from an aspect of the host environment.

      Reply: We agree with the reviewer that the nutrient conditions that were used in vitro and in vivo are not identical and may have impacted the relative abundance of some of the community members, the transcriptional profiles, or the metabolite changes. Nevertheless, we believe that our experimental design is valid to test the main hypothesis of our study, i.e. a complex, pollen-based diet facilitates coexistence, while simple sugars lead to the dominance of a single strain independent of the environment (culture tube versus host). An important point to consider here is that bees will pre-digest the consumed pollen, and partially absorb dietary nutrients such as amino acids, glucose, and fructose, before they reach the bacteria in the hindgut. Consequently, the in vivo and in vitro conditions will never be the same, even if we had used identical nutrients in our treatments. Also, pollen by itself contains glucose, fructose, and sucrose. So, although we have not added glucose to the in vitro pollen condition, this simple sugar was present in the corresponding condition. We have added a corresponding section in the discussion on lines 402-422. That said, while we cannot recapitulate the exact same nutritional conditions in vitro, we still think that our main conclusion holds, namely that we can recapitulate the pollen-dependent coexistence found in vivo.

      Reviewer #3 (Public Review):

      [...] Overall, the paper is strong and the arguments and conclusions put forth are well supported by the data. I only have a few suggestions:

      Reply: We thank the reviewer for the positive evaluation of our manuscript.

      1) The study focuses on one strain each of the 4 Firm-5 species; however, there is diversity within each species. This is only briefly mentioned in the paper at the very end, and I think the authors should address this a bit more directly. In particular, they have previously generated a large amount of genomic data from some of these other strains, so it is likely possible to infer or speculate, based on this data, whether they expect different strains within each species to utilize similar nutrients. Also, I'm wondering if the authors can comment on how their findings could extend to the related bumble bee gut microbiome. Such a discussion would help enhance the applicability and importance of this study.

      Reply: We agree that the large amount of strain-level diversity within a given species is an important point to consider. However, we would prefer not to expand on this point much further, as it would require a relatively complex genomic analysis. Also, considering that many of the strain-specific transcriptional changes are in genes shared with the other species, we are not sure how much such an analysis would reveal. In any case, we plan to compare coexistence between strains from the same versus different lineages in a follow-up study.

      As for the bumble bees, we currently do not know how many strains or species of Lactobacillus Firm5 can coexist in bumble bees. Therefore, we feel that a discussion extending to bumble bees would be too speculative. However, we included a sentence in the discussion which states that, since pollen facilitates coexistence, dietary differences are likely to influence the diversity of Lactobacillus Firm5, and which gives the example of the Asian honey bee, which seems to harbor only one species of this phylotype. See lines 479-488.

      2) It is interesting that different species ended up dominating in the in vivo vs. in vitro simple sugar-based communities. What do the authors think may be behind this difference?

      Reply: This is indeed an interesting point. We have not used the same sugars in vivo (sucrose) and in vitro (glucose). Moreover, the nutritional and physicochemical conditions in the hindgut are likely different from those found in a culture tube. We have mentioned that these are potential reasons for the observed differences in the relative abundance of different community members between in vivo and in vitro conditions on lines 402-422 of the manuscript.

      3) Since the observed coexistence of these gut microbes is largely due to nutritional niche partitioning, it would be helpful if the authors can comment on the natural variation of key pollen derived metabolites, and if/how we could expect ecological variation in the bee microbiome due to plant pollen availability based on biogeography and seasonality.

      Reply: We agree and have included a corresponding sentence in the discussion on line 479. See also our reply to point 1.

      4) The supplementary information is nicely documented and accessible, but I think it would be even more useful if genome-wide data for the RNA-seq results, not just for select genes, are made available. Furthermore, I suggest including descriptive titles and labels within the supplementary Excel files, as there are many separate sheets and it is not always clear what each one shows.

      Reply: This has been included in the revised manuscript.

  3. migration-encounters-prototype.netlify.app migration-encounters-prototype.netlify.app
    1. Isabel: Yeah. That's good then. This is a weird question, so do with it what you will. Do you feel Mexican or American?

      Nadxieli: Mexican. Hell, yeah.

      Isabel: Hell, yeah. Why is that?

      Nadxieli: Well, I don't know. This is who I am, you cannot change that. Even though you move out a country, a continent, you are what you are at the end, I think you never forget that. I think when you forget that is when you lost your identity, most likely people take advantage of that. So as long as you remember who you are and where you're coming from, you're good.

      Isabel: I like that.

      Nadxieli: Yeah.

      Isabel: I know some people say, oh, you talk different or you have these different things about you because of your time in the US, and some people may say "Oh, you're from neither here or there," some people they don't know you may not have the same experience, but some people think you're too Mexican to be American or too American to be Mexican. That's a trend you see. What would you say to that?

      Nadxieli: I would say I'm 100% Mexican. I never changed that. It took a while to get into the Mexican stuff again. But at the end we already knew that. So it was not that hard. I also think that it's because I didn't spend a lot of years there. So I know people working where I'm working, they spent their whole life, so that will be different if I spend like 22 years out of 23, I guess that will be different.

      Identity, Mexican;

    1. Anita: Did Gerald Ford know you were undocumented?

      Rodolfo: No, Gerald Ford didn't know I was undocumented, no. I was still very young at that point. My mother and my family always told me, "Don't let anybody know you're undocumented." If somebody finds out, for whatever reason, there's some people who just are plain out racist or don't want people like me in the States. Sometimes they just do things to... I don't know. That's what I understood and that's what I took in and that's what I applied to my life. It's like living a secret, it was like living a second life or whatever. It's like, "Oh shit, why do I have to lie, why?" I guess it's neither here nor there now, right? I'm here in Mexico.

      Anita: That must have been incredibly difficult. I know personally, because I've had to keep secrets.

      Rodolfo: Yeah, I guess it's one of those things where you think it's never really gonna affect you, until you're in the back of the DHS, the Department of Homeland Security, van. You're next to a whole bunch of people you never met, and they're also in the same position. Some don't even speak English. You don't really understand how immediately it can affect you until it affects you. I never thought it would affect me. Okay, well I mean, I'm working, I'm going to school—I'm in high school—I'm doing this, this and that. Some of my friends who are students already dropped out. Did everything, they've already gone to prison and back and everything, and they haven't even hit their 21st birthday.

      Rodolfo: And I'm still good, I'm still good. I may not be a straight A student or anything, but hey man, I'm still here! Why can't I have the same privilege as you all do? Why can't I get my license? You know how happy I was when I got my license here, damn. I love to drive, that's one of my passions. Always, always, always I love to drive. I couldn't get my license over there. I remember even in high school in drivers ed, I knew what the answer was, but I asked my mom, "Hey mom, can I apply for drivers ed, so I can get my license?" She was like, "You know you can't get your license." Again, one of the primary things, I'm like damn, I'm just not gonna be able to drive all my life? Or if I do drive and I get pulled over—as a matter of fact, that's the reason why I got deported, driving without a valid drivers license.

      Rodolfo: I never got why the paper said, "Driving on a suspended license." I would always ask them, "If I don't have a license, why is it suspended?" They just told me, "Because you have a drivers license number, but you don't have a drivers license?" I'm like, "Okay, so if I have a drivers license number, why can't I get my drivers license?" "You don't have the proper documentation." I'm like, "But I have my..."

      Rodolfo: One day I thought, "Well why don't I just grab the driver license number and have somebody make me a fake drivers license, and put the drivers license on there?" But see, if I get caught with it, now I'm in more trouble, and now I'm seen as a real criminal, because now I'm going around the system once again. That's why we don't want you here, because you're gonna do things like that. [Exhale] I haven't talked about this in a while. It just makes me want to…I don't know.

      Time in the US, Immigration Status, Being secretive, Hiding/lying, In the shadows, Living undocumented; Reflections, The United States, US government and immigration; Feelings, Frustration; Time in the US, Jobs/employment/work, Documents, Driver's license, Social security card/ID

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their thoughtful comments. We were delighted the reviewers found our results “compelling”, “striking”, “well presented”, “implications exciting”, “excellent results! really nice!”, “this microscopy is beautiful!” and “translational-dependence (of mRNA localization) in a transcript-specific way without perturbing translation globally”, which is a “complete surprise, and opens exciting doors to investigate how translation leads to mRNA organization and its connection to tissue development” and “may represent a new pathway of mRNA transport”.

      We also appreciated the comments regarding the “wide appeal”, “broad readership of readers”, and “broad interest” the reviewers gave to our manuscript regarding its impact, and also the comments of “well-written (and) well-cited”.

      We can address all the concerns raised by the reviewers. In addition to textual changes, we will add the following to the Results section:

      1. Additional quantitation of smFISH beyond Figure 2;
      2. Addition of a negative (uniformly distributed) mRNA control and its quantitation;
      3. Western blots for our ΔATG lines to determine what and how much protein is made.
      4. Unbiased nuclear masking.

      Our specific responses are shown below, in blue.

      Reviewer #1

      **Major comments**

      Fig. 1: Main and supplementary figures present smFISH signals for eight localized mRNAs, while in the results section authors describe that they analyzed twenty-five transcripts. Authors should explain the choice of transcripts presented in the paper.

      We will include a panel in Fig. S1E to show every mRNA that we tested, and we will edit Table 1 to describe the observed subcellular localization.

      We will edit the text, adding a few sentences to clarify, along the lines of: “Our survey revealed mRNAs with varying degrees of localization within epithelia that we divided into three classes: CeAJ/membrane localized, perinuclearly localized, and unlocalized (Fig. 1 and S1 and Table 1).” and “The rest of our tested mRNAs did not possess any evident subcellular localization at any of the analyzed embryonic stages/tissues and were not further investigated (Fig. S1E and Table 1).”

      Moreover, smFISH signal of different localized mRNAs in epidermal cells was visualized at different stages (bean, comma or late comma), and authors did not comment what was the reason of such conditions. This may make transcripts localization results difficult to interpret, as further analysis showed that mRNA localization varied in a stage-specific manner.

      We have now clarified this point in the legend of Figure 1: “Specific embryonic stages were selected for each transcript based on the highest degree of mRNA localization they exhibited.”

      Did the authors use smFISH probes designed against endogenous mRNAs for all tested transcripts?

      We did not. We clarify this point now in Materials and methods: “All probes were designed against the endogenous mRNA sequences except dlg-1 (some constructs), pkc-3, hmp-2, spc-1, let-805, and vab-10a, whose mRNA were detected with gfp probes in their corresponding transgenic lines (Table S2). An exception to this is Fig. S1A where we used probes against the endogenous dlg-1 mRNA.”.

      Marking dlg-1 mRNA as dlg-1-gfp suggests that smFISH probe was specific for gfp transcript. Is it true? If yes, authors should compare localization of wild-type endogenous dlg-1 mRNA with that of the transcript encoding a fusion protein, to confirm that fusion does not affect mRNA localization.

      Yes, in Fig. 1C we show smFISH for GFP (i.e., the tagged dlg-1 only). In Fig. S1A, we show smFISH against endogenous dlg-1. Tagged and endogenous dlg-1 mRNAs are both localized. We clarified this point in the main text: “Five of these transcripts were enriched at specific loci at or near the cell membrane: laterally and at the CeAJ for dlg-1 (Fig. 1C for endogenous/GFP CRISPR-tagged dlg-1::gfp mRNA and S1A for endogenous/non-tagged dlg-1 mRNA), (…)”. And in the Supplemental figure legend (Fig. S1A): “Endogenous/non-tagged dlg-1 mRNA shows CeAJ/membrane localization like its endogenous/GFP CRISPR-tagged counterpart.”

      Fig. 2B: Authors conclude that at later stages of pharyngeal morphogenesis mRNA enrichment at the CeAJ decreased gradually in comparison to comma stage. Data do not show statistically significant decrease in ratio of localized mRNAs - for dlg-1: bean: 0.39 ± 0.09, comma: 0.29 ± 0.08, 1.5-fold: 0.30 ± 0.09; for ajm-1: bean: 0.36 ± 0.08, comma: 0.30 ± 0.05, 1.5-fold: 0.28 ± 0.09.

      A one-tailed t-test revealed a significant difference between bean and comma stages for both dlg-1 and ajm-1 mRNAs. Statistical analysis and data will be provided.

      Fig. 4: What was the difference between the first and the second ΔATG transgenic line? Authors should analyze the size of the truncated DLG-1 protein that is expressed from the second ΔATG transgenic line that localizes to CeAJ. Knowing alternative ATGs and protein size may suggest domain composition of the truncated protein. This will allow to confront truncated protein localization with the results from.

      We will perform a Western blot to determine the size and levels of proteins produced.

      Fig. 5. Moreover, to prove that the localization of dlg-1 mRNA at the CeAJ is translation-dependent, additional experiment should be performed where transcripts localization will be analyzed in embryos treated with translation inhibitors such as cycloheximide (translation elongation inhibitor) and puromycin (that induces premature termination).

      We believe this comment might refer to Fig. 4. If this is the case: drugs like cycloheximide and puromycin affect the translation of the whole transcriptome, whereas with our ΔATG experiment, we aimed to target the translation of one specific transcript and avoid secondary effects. Nevertheless, we understand Reviewer #1’s concern and will include a second experiment. In our hands, cycloheximide and puromycin have never worked in older embryos (it’s hard to get past the eggshell and into the embryo). Instead, we will use stress conditions, which induce a “ribosome drop-off” (Spriggs et al., 2010). Heat stress has been shown to decrease polysome occupancy (Arnold et al., 2014). We, therefore, have used heat-shock at 33°C for 30’, and the results are now shown in Fig. S4. These show the loss of RNA localization upon heat shock.

      **Minor comments**

      In the introduction section authors should emphasize the main goal and scientific significance of the paper.

      We added these sentences to state the significance before summarizing the results: “To investigate the impact of mRNA localization during embryonic development, we conducted a single molecule fluorescence in situ hybridization (smFISH)-based survey (…)” and “Our data demonstrate that the dlg-1 UTRs are dispensable, whereas translation is required for localization, therefore providing an example of a translation-dependent mechanism for mRNA delivery in C. elegans.”

      Fig 1A: It's hard to distinguish different colors on the schematics. Schematics presents intermediate filaments that are not included in the Table 1.

      We modified Table 1 based on this and other reviewers’ comments.

      Fig. 1C: dlg-1 transcript is marked as dlg-1-gfp on the left panel and dlg-1 on the right panel.

      Corrected.

      Fig. 2B: Axis labels and titles are not visible, larger font size should be used.

      We will modify the graph (following Reviewer #2’s suggestion) and axes label and title sizes will be taken into account.

      Fig. 5C: Enlarge the font size.

      Will do.

      Fig. S2: Embryonic stages should be marked on the figure for easier interpretation.

      Added.

      Reviewer #2

      Major comments

      Figure 2 requires a negative (or uniformly distributed) mRNA control for comparison. Figure 2C should be quantified. The plot quality should be improved, and appropriate statistical tests should be employed to strengthen the claimed findings.

      We will add a negative control (jac-1 mRNA) and quantify Fig. 2C as well. Plots will be changed according to the suggestion.

      Most claims of perinuclear mRNA localization are difficult to see and not well supported visually or statistically. The usage of DAPI markers, membrane markers, 3D rendering, or a quantified metric would bolster this claim. Also, sax-7 is claimed to be perinuclear and elsewhere claimed to be uniform then used as a uniform control. Please explain or resolve these discrepancies more clearly.

      Regarding perinuclear mRNAs:

      We are not trying to make a big statement out of these data, as perinuclear (ER) localization of mRNAs coding for transmembrane/secreted proteins is well known. The aim of our study was to describe transcripts localized at or in the proximity of the junction. However, we thought it was worth mentioning these examples of perinuclearly localized mRNAs (hmr-1, sax-7, and eat-20) for two reasons: scientific correctness (showing accessory results that might be interesting for other scientists) and use as positive controls for our smFISH survey (these mRNAs were expected to localize perinuclearly for the reasons mentioned above). We will rewrite the text to make these points clearer.

      Regarding sax-7 mRNA:

      sax-7 mRNA localizes perinuclearly in sporadic instances (Fig S1C), but it is predominantly scattered throughout the cytoplasm (i.e., unlocalized). It presumably localizes perinuclearly in a translation-dependent manner as sax-7 codes for a transmembrane protein that would be targeted to the ER. We have described this ER-type of localization in the introduction and reiterated it partially in the first paragraph of the results. sax-7 UTRs are therefore presumably not responsible for subcellular localization, which would instead depend on a signal sequence. We will better clarify this point in the main text.

      The major concern about the paper is the data display and interpretation of Figure 5C. I'm not comfortable with the approach the authors took of blurring out the nucleus. A more faithful practice would be to use an automated mask over DAPI staining or to quantify the entirety of the cell. If the entirety of the cell were quantified, one could still focus analysis on specific regions of relevance. The interpretations distinguishing membrane versus cytoplasmic localization (or mislocalization) are hard to differentiate in these images especially since they are lacking a membrane marker. The ability to make these distinctions forms the basis of Tocchini et al's two pathways of dlg-1 mRNA localization. These interpretations also heavily rely on how the image was processed through the different Z-stacks, and it's not clear to me how that was done. For example, the diffusion of mRNA in figure 5F and 5I are indistinguishable to my eye but are claimed to be different.

      In the images, the nuclei have been blurred to allow the reader to focus on the cytoplasmic signal and not on the nuclear (transcriptional) signal as it is not meaningful for this study. In the quantitation, the nuclear signal has been unbiasedly and specifically removed from the analysis by cropping out the DNA signal from the other channels. The frontal plane views of the seam cells in Fig. 5 show maximum intensity projections (MIPs) of 3 Z-stacks (0.54 µm total) that each contain nuclei and, therefore, the transcriptional signal (schematics in Fig. 5B). We will clarify these points in the text.
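
      The projection and masking steps described in this response can be illustrated with a small sketch. It assumes that the nuclear mask is a per-pixel boolean image derived from the DNA channel and that it is applied after the projection; neither detail is stated in the response, and the function name and toy data are hypothetical.

```python
def masked_max_projection(stack, nuclear_mask):
    """Maximum intensity projection of a Z-stack with nuclear pixels zeroed.

    stack: list of Z-slices, each a 2D list of intensities (same shape).
    nuclear_mask: 2D list of booleans, True where the DNA channel marks a nucleus.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    # Keep the brightest value across Z for every (row, col) pixel.
    mip = [[max(z[r][c] for z in stack) for c in range(cols)] for r in range(rows)]
    # Hypothetical masking step: drop signal that overlaps the nucleus.
    return [[0 if nuclear_mask[r][c] else mip[r][c] for c in range(cols)]
            for r in range(rows)]


# Toy example: three 2x2 Z-slices and one masked (nuclear) pixel.
toy_stack = [
    [[1, 5], [2, 0]],
    [[3, 2], [9, 1]],
    [[0, 4], [1, 7]],
]
toy_mask = [[False, False], [True, False]]
projection = masked_max_projection(toy_stack, toy_mask)
print(projection)
```

      In the toy data, the bright value at the masked pixel stands in for a nuclear transcription site and is excluded from the projection, while the cytoplasmic pixels keep their maximum across the three slices.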

      Regarding cytoplasmic versus membrane-associated mRNAs, although we did not have a membrane marker, we relied on the brightness of the DLG-1::GFP signal to identify the cell borders (i.e., membranes) after over-exposure. This approach allowed us to discern apicobasal and apical sides for the intensity profile analyses. We will clarify this point as well in the text and, in parallel, we will try a different approach using transverse sections on top views to clarify our data.

      To my eye, it seems that Figure 5 could be more faithfully interpreted to state that DLG-1 protein localization depends on the L27-SH3 domains. The Hook/Guk domains are dispensable for DLG-1 protein localization; however, through other studies, we know they are important for viability. In contrast, dlg-1 mRNA localization requires all domains of the protein (L27-Guk). It is exceptionally interesting to find a mutant condition in which the mRNA and protein localizations are uncoupled. It would be very interesting to explore in the discussion or by other means what the purpose of localized translation may be. Because, in this instance, proper mRNA localization and protein function are closely associated, it may suggest that DLG-1 needs to be translated locally to function properly.

      We will rewrite the Results and Discussion to clarify our model. We agree that L27 and SH3 domains are critical, but we also detected effects of the HooK/GuK domains. We have refined our model to describe functions of the N and C termini for membrane or junctional localization.

      The manuscript requires an improved materials & methods description of the quantification procedures and statistics employed.

      We will add these points.

      Minor & Major comments together - text

      Summary statement: Is "adherent junction" supposed to be "adherens junction?"

      Corrected.

      Abstract: Sentence 1, I think they should add a caveat word to this sentence. Something like "...phenomenon that can facilitate sub-cellular protein targeting." In most instances this isn't very well characterized or known.

      Corrected.

      In the first paragraph, it might be good to mention that Moor et al also showed that mRNA localize to different regions to alter their level of translation (to concentrate them in high ribosome dense regions of the cell).

      Added as follows: “For example, a global analysis of localized mRNAs in murine intestinal epithelia found that 30% of highly expressed transcripts were polarized and that their localization coincided with highly abundant regions in ribosomes (Moor, 2017).”

      There are some new studies of translation-dependent mRNA localization - that might be good to highlight - Li et al., Cell Reports (PMID: 33951426) 2021; Sepulveda et al., 2018 (PCM), Hirashima et al., 2018; Safieddine, et al 2021. Also, Hughes and Simmonds, 2019 reviews membrane associated mRNA localization in Drosophila. And a new review by Das et al (Nat Rev MCB) 2021 is also nice.

      We will add them to the text.

      Parker et al. did not show that the 3'UTR was dispensable for mRNA localization. They showed the 3'UTR was sufficient for mRNA localization.

      Quoting from the paper Parker et al.: “3′UTRs of erm-1 and imb-2 were not sufficient to drive mRNA subcellular localization. Endogenous erm-1 and imb-2 mRNAs localize to the cell or nuclear peripheries, respectively, but mNeonGreen mRNA appended with erm-1 or imb-2 3′UTRs failed to recapitulate those patterns (Fig. 4A-D).” We will make this point clearer in the rewritten text.

      In the second paragraph, the sentence about bean stages is missing one closing parenthesis.

      Corrected.

      Last paragraph: FISH is fluorescence, not fluorescent.

      Corrected.

      Both "subcellular" and "sub-cellular" are used.

      Corrected.

      Minor comments – Figures

      Figure 1

      o Figure 1A is confusing. It's not totally clear what the rectangles and circles signify. There are many acronyms within the figure. Which of the cell types depicted in the figure are shown here? For example, for the dorsal cells, which is the apical v. basal side?

      We tried to simplify the cartoon for a general C. elegans epithelial cell. We followed schematics already shown in previous publications to maintain consistency. Acronyms and color-codes are listed in the corresponding figure legend and have been better clarified.

      o Some of the colors are difficult to distinguish, particularly when printed out or for red/green colorblind readers. Is erm-1 meant to be a cytoskeletal associated or a basolateral polarity factor?

      We understand the issue, but unfortunately, with 8 classes of factors, shades of gray might not solve the problem. We tried to circumvent the red-green issue by changing red to dark grey. Furthermore, we added details about shapes to the figure legends. We will continue to improve the color scheme.

      ERM-1 is a cytoskeletal-associated factor.

      o The nomenclature for dlg-1 is inconsistent within "C".

      Corrected.

      o Please specify what the "cr" is in "cr.dlg-1:-gfp" in the legend.

      Added.

      Figure 2

      o Can Figure 2C be quantified in a similar manner to 2A/2B?

      Currently our script cannot do that, but we will try to optimize it so that it can quantify this type of image.

      o 2B - please jitter the dots to better visualize them when they land on top of one another

      Yes, we will.
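      A minimal sketch of what jittering involves, for reference: each point keeps its measured value and receives a small random horizontal offset so that overlapping dots separate. The function name `jitter`, the offset width and the example positions are illustrative assumptions, not the authors' plotting script.

```python
import random

def jitter(x_positions, width=0.1, seed=0):
    """Return x positions shifted by a uniform random offset in [-width, width].

    Only the horizontal (categorical) axis is perturbed; measured values on
    the y-axis must be left untouched.
    """
    rng = random.Random(seed)  # fixed seed so the figure is reproducible
    return [x + rng.uniform(-width, width) for x in x_positions]

# Hypothetical example: three dots in group 1 and two in group 2.
xs = [1.0, 1.0, 1.0, 2.0, 2.0]
jittered = jitter(xs)
```

      The offset width should stay well below the spacing between groups so that points are never mistaken for members of a neighbouring category.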

      o Please include a negative control example, a transcript that is not peripherally localized for comparison.

      Yes, we will.

      o There is no place in the text of the document where Fig 2C is referenced

      Corrected (it was wrongly referred to as “2B”).

      o I can't see any discernable ajm-1 localization in Fig 2A.

      We added some arrowheads to point at specific examples and increased the intensities of the corresponding smFISH signal for better visualization.

      o I can't see any dlg-1 pharyngeal localization in Fig2C.

      We added some arrowheads to point at specific examples and increased the intensities of the corresponding smFISH signal for better visualization.

      o More details on how the quantification was performed would be welcome. Particularly, in 2B, what is the distance from the membrane in which transcripts were called as membrane-associated? What statistics were used to test differences between groups?

      We will add a full description of the script used as well as the statistic details.

      Figure 3

      o Totally optional but might be nice: can you make a better attempt to approximate the scale of the cartoon depiction?

      The UTRs, especially the 5’ one, are much smaller than the dlg-1 gene sequence. Scaling the cartoon to the actual sequences would draw attention away from the main subjects of this figure, the UTRs. Nevertheless, we made sure the corresponding figure legend states clearly that the cartoon is not to scale: “The schematics are not in scale with the actual size of the corresponding sequences. UTR lengths: dlg-1 5’UTR: 61 nucleotides; sax-7 5’UTR: 63 nucleotides; dlg-1 3’UTR: 815 nucleotides; unc-54 3’UTR: 280 nucleotides.”

      o The GFP as an asterisk illustration may be confusing for some readers. Could you add another rectangular box to depict the gfp coding sequence?

      Corrected.

      o This microscopy is beautiful!

      Thanks Reviewer #2!

      o Were introns removed? Is the endogenous copy still present?

      All the transgenes were analyzed in a wild-type background, therefore, yes, the endogenous copy was still present. All the transgenes possessed introns. We will change the corresponding text as follows: “To test whether the localization of one of the identified localized mRNAs, dlg-1, relied on zip codes, we generated extrachromosomal transgenic lines carrying a dlg-1 gene whose sequence was fused to an in-frame GFP and to exogenous UTRs.”. In the figure “dlg-1 ORF” has been replaced with “dlg-1 gene”.

      o The wording in the legend "CRISPR or transgenic" may be confusing as Cas9 genome editing is still a form of transgenesis.

      We added “extrachromosomal” to clarify the nature of the mRNA.

      o The authors state that the 5'-3'UTR construct produces perinuclear dlg-1 transcripts but in the absence of DAPI imaging, it's not clear that this is the case.

      We could not find such a statement, but we tried to clarify the localization of these mRNAs in the text: “The mRNA localization patterns of the two UTR reporters were compared to the localization of dlg-1 transcripts from the CRISPR line (“wild-type”, Fig. 3A; Heppert et al., 2018), described in Fig. 2. Both reporter strains showed enrichment at the CeAJ and localization dynamics of their transcripts that were comparable to the wild-type cr.dlg-1 (Fig. 3B). These results indicate that the UTR sequences of dlg-1 mRNA are not required for its localization.”

      o Which probe set was used? The gfp probe?

      Yes, please see the main text: “Given that the transgenic constructs were expressed in a wild-type background, smFISH experiments were conducted with probes against GFP RNA sequences to focus on the transgenic dlg-1::GFP mRNAs (cr.dlg-1 and tg.dlg-1).”

      o Here, sax-7 is used as a uniform control, but sax-7 is claimed in Fig S1B-D as being perinuclear. This is a bit confusing.

      sax-7 mRNA localizes perinuclearly in sporadic instances (Fig S1C), but it is predominantly scattered throughout the cytoplasm (i.e., unlocalized). It presumably localizes perinuclearly in a translation-dependent manner as sax-7 codes for a transmembrane protein that would be targeted to the ER. We have described this ER-type of localization in the introduction and reiterated it partially in the first paragraph of the results. sax-7 UTRs are therefore presumably not responsible for any subcellular localization, which would instead rely on a signal sequence. We will better clarify this point in the main text.

      Figure 4

      o Excellent results! Really nice!

      Thanks Reviewer #2!

      o Fig 4A. The GFP depicted as a circle is strange.

      We changed it into a rectangle.

      o Fig 4A. Can you include the gene/protein name for easy skimming?

      Added.

      o Fig 4B. the color here is too faint and it is unclear what is being depicted. Overall, this part of the figure could be improved.

      We are optimizing the coloring and simplifying the schematics.

      o Were the introns removed?

      No, the introns were maintained in this and in all our transgenic lines. We described our transgenic lines in the materials and methods section (now with more detail). What we depict in the scheme (Fig. 4A) is the mature RNA (now specified in the figure), therefore no introns depicted. We will also specify this in the main text.

      Figure 5

      o Fig 5A. can you add the gene/protein name

      Added.

      o Fig 5B. Can you make the example apicobasal (non-apical) mRNA more distinctive? If it had its own peak in the lower trace, the reader would more clearly understand that this mRNA will be excluded from apical measurements whereas it will be included in apicobasal measurements.

      We actually wanted to show this specific example: a cytoplasmic mRNA and a junctional mRNA may seem close from the apicobasal analysis (partially overlapping peaks that Reviewer #2 mentioned). With the apical analysis, instead, we can show that these mRNAs are actually not close, and they belong to two different compartments (cytoplasm and junction). We would therefore like to keep the current scheme, while better clarifying this point in the corresponding figure legend.

      o D' - I' The grey font is too light.

      Noted. We will change it.

      o D' - I' The inconsistent y-axis scaling makes it difficult to compare across these samples. Can you set them to the same maximum number?

      The values are indeed quite different. We tried to use the same scale, but this would make some of the data unappreciable. The idea was to evaluate, within each graph, how mRNA and protein are localized relative to the junctional marker. We will make this clearer in the text.

      o D' - I' The x-axis labels are formatted incorrectly

      Corrected.

      o The practice of masking out the nucleus appears to remove potentially important mRNAs that are not nuclear localized. This could really impact the findings and interpretation. Instead, consider an automated DAPI mask.

      The masking on the images is not the same used for the analysis: in the images, a shaded circle has been drawn on the DNA channel and moved onto its corresponding location in the other channels or merges. For the analysis, the DNA signal has been specifically removed in the channel with the smFISH signal. Given that the analysis has been performed on maximum intensity projections of 3 Z-stacks, we believe we did not remove any non-nuclear mRNA. We will clarify this point in Materials and methods.
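      The two operations mentioned above, a maximum intensity projection over 3 Z-planes followed by removal of the DNA signal from the smFISH channel, can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the function names, the threshold-based definition of the nuclear region and the toy pixel values are all hypothetical, and plain Python lists are used only to keep the example self-contained.

```python
def max_intensity_projection(planes):
    """Pixel-wise maximum over a list of equally sized 2D planes."""
    # zip(*planes) groups matching rows; zip(*rows) groups matching pixels.
    return [[max(px) for px in zip(*rows)] for rows in zip(*planes)]

def remove_nuclear_signal(smfish_mip, dna_mip, dna_threshold):
    """Zero smFISH pixels wherever the DNA channel exceeds the threshold.

    Assumption: the nucleus is defined by a simple intensity threshold on
    the DNA channel; the response does not state how it was delineated.
    """
    return [
        [0 if d > dna_threshold else s for s, d in zip(s_row, d_row)]
        for s_row, d_row in zip(smfish_mip, dna_mip)
    ]

# Toy 2x2 images, 3 Z-planes per channel (hypothetical values).
smfish_planes = [[[1, 5], [0, 2]], [[3, 4], [7, 1]], [[2, 6], [0, 9]]]
dna_planes = [[[0, 0], [8, 0]], [[0, 1], [9, 0]], [[0, 0], [7, 0]]]

smfish_mip = max_intensity_projection(smfish_planes)
dna_mip = max_intensity_projection(dna_planes)
cytoplasmic = remove_nuclear_signal(smfish_mip, dna_mip, dna_threshold=5)
```

      In this sketch only the pixel overlapping the bright DNA signal is set to zero; every other smFISH pixel in the projection is retained.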

      o I can't see what the authors are calling membrane diffuse versus cytoplasmic. This is making it hard for me to see their "two step" pathway to localization.

      We will add in Fig. 5B-C an example of a membrane-localized mRNA. Furthermore, we will add transverse sections of membrane and cytoplasm to make the data clearer to the reader.

      o Can more details of the quantification be included? How were Z-sections selected, chosen for inclusion? Which Z-sections and how many were selected?

      We will add the details to Materials and methods.

      o Also, why do these measurements focus on what I think are the seam cells when Lockwood et al., 2008 show the entire epithelium that is much easier to see?

      We are focusing on the seam cells at the bean stage as these are the cells and the embryonic stage where we see the highest localization of dlg-1 mRNA in the wild-type.

      o Please name these constructs to correlate the text more explicitly to the figures.

      Added.

      o How many embryos were analyzed for each trace? How many embryos showed consistent patterns?

      We will add the details of the analysis to Materials and methods.

      o Why were these cells used for study here? Lockwood et al., 2008 use a larger field of epithelial cells for visualization.

      As stated before: we are focusing on the seam cells at the bean stage as these are the cells and the embryonic stage where we see the highest localization of dlg-1 mRNA in the wild-type.

      Figure 6

      There are major discrepancies between what this figure is depicting graphically and what is described in the text. Again, I'm not comfortable making the "two step" claims this figure purports given the data shared in Figure 5.

      We are planning to re-write the last part of the results to better clarify our two-step model. A two-step model had been previously suggested in McMahon et al., 2001, where they could show that DLG-1 and AJM-1 (referred to in that publication as JAM-1) are initially localized laterally and only later in development are then enriched apically. Our data agree with McMahon very well, so we used the earlier study as a start. We will cite and explain this paper in greater depth during the rewriting.

      Minor comments - Tables & Supplemental Figures

      Table 1

      I think this table could be improved to more clearly illustrate which mRNAs were tested and what their mRNA localization patterns were (for example, gene name identifiers included, etc). Could the information that is depicted by gray shading instead be added as its own column? For example, have a column for "Observed mRNA localization"

      We modified Table 1 based on these and the other reviewers’ comments.

      Can you add distinct column names for the two columns that are labeled as "protein localization - group"

      We modified Table 1 based on these and the other reviewers’ comments.

      Can you also add which of these components are part of ASI v. ASII (as described in the introduction?)

      A new table has been added with the factors belonging to the two adhesion systems (same color code as in Table 1).

      Supplemental Figure 1

      It is hard to see that some of these spots are perinuclear. More information (membrane marker, 3D rendering, improved metrics) is required to support this claim.

      We are not trying to make a big statement out of these data as perinuclear localization for mRNAs coding for transmembrane/secreted proteins is well known. The aim of our study was to describe transcripts localized at or in the proximity of the junction. We thought it was worth mentioning these examples of perinuclearly localized mRNAs (hmr-1, sax-7, and eat-20) for two reasons: scientific completeness, since these accessory results might be of interest to other scientists, and their use as positive controls for our smFISH survey, since these mRNAs were expected to have a somewhat perinuclear localization for the reasons mentioned above.

      What do these images look like over the entire embryo, not just in the zoomed in section?

      We added a column with the zoom-out embryos.

      sax-7 localization in S4 looks similar but a different localization claim is made.

      sax-7 mRNA can localize perinuclearly in sporadic instances (Fig S1C), but is predominantly scattered throughout the cytoplasm (i.e., unlocalized). It presumably localizes perinuclearly in a translation-dependent manner as sax-7 codes for a transmembrane protein that would be targeted to the ER. We have described this ER-type of localization in the introduction and reiterated it partially in the first paragraph of the results. sax-7 UTRs are therefore presumably not responsible for any subcellular localization, which would instead rely on a signal sequence. We will better clarify this point in the main text.

      Supplemental Figure 2

      Before adherens junctions even exist dlg-1 go to the membrane - this is really neat!

      Thanks Reviewer #2!

      Supplemental Figure 3

      Technical question: If either 5 or 3 stack images are used, how does this work? Do they have different z-spacings? Or do they do 5-stack images represent a wider Z-space?

      This is the sentence in question: “Maximum intensity projections of 5 (1.08 µm) (A) and 3 (0.54 µm) (B) Z-stacks”. The spacing between consecutive Z-planes is constant in all our imaging and its value is 270 nm. When we consider 5 planes, the distance from the 1st to the 5th is 4 x 270 nm = 1.08 µm, whereas for 3 planes it is 2 x 270 nm = 0.54 µm.
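      The arithmetic in this answer can be checked with a short sketch. The function name `projection_depth_um` is hypothetical; only the 270 nm spacing and the plane counts come from the response.

```python
def projection_depth_um(n_planes: int, spacing_nm: float = 270.0) -> float:
    """Distance from the first to the last plane of a Z-stack, in micrometres."""
    if n_planes < 1:
        raise ValueError("n_planes must be at least 1")
    # n planes are separated by (n - 1) gaps of equal spacing.
    return (n_planes - 1) * spacing_nm / 1000.0

depth_5 = projection_depth_um(5)  # 4 gaps of 270 nm
depth_3 = projection_depth_um(3)  # 2 gaps of 270 nm
```

      The depth scales with the number of gaps between planes rather than the number of planes, which is why 5 planes span 4 x 270 nm and not 5 x 270 nm.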

      Supplemental Figure 4

      Line #2 retains translation and keeps mRNA localization.

      Totally optional, but consider showing both lines in the main figure to illustrate the two possibilities.

      Noted.

      Materials and methods - how did they create the ATG mutations? Is it an array? - why does one translate, and one doesn't?

      We will clarify this point in Materials and methods: “dlg-1 deletion constructs ΔATG (SM2664 and SM2663) and ΔL27-PDZs (SM2641) were generated by overlap extension PCR using pML902 as a template.”.

      We will perform a Western blot to clarify Reviewer #2’s last point. Currently we do not know what peptide is translated, but the comparison with our full-length control will probably shed some light on the issue.

      Reviewer #3

      Major comments

      The smFISH results are striking and implications exciting. The conclusions made from the smFISH results reported in all Figures will be strengthened considerably by quantifying the mRNA localized to the defined specific subcellular regions. At the very least, localization to the cytoplasm versus the plasma membrane should be determined as performed in Figure 2B, but quantifying finer localization will enhance the conclusions made about regional localization (e.g. CeAJ versus plasma membrane mRNA localization in Figure 5). Inclusion of a non-localizing control in Figures 1-4 will enable statistical comparisons between mRNA localizing and non-localizing groups.

      We will add more quantitation, statistics, and negative controls.

      The script used for smFISH quantitation should be included in the methods or published in an accessible forum (Github, etc). Criteria for mRNA "dot" calling should be defined in the methods. All raw smFISH counts should also be reported.

      We will add the full description of the script in Materials and methods, and we will provide the raw data in an additional supplementary table.

      Figure 2: What is the localizing ratio of a non-localizing control mRNA (e.g. jac-1)? Including an unlocalized control with quantitation would strengthen the localization arguments presented.

      Yes, we will add quantitation for an unlocalized mRNA.

      Figure 5: Quantifying colocalization of mRNA and protein (+/- AJM-1) will strengthen the arguments made about mRNA/protein localization.

      Yes, we will quantify Fig. S5 to have a full picture of the cells (the images in Fig. 5 represent only a portion of the cell).

      Discussion of the CeAJ mRNA localization mechanism is warranted. Do the authors speculate that the newly translated protein drives localization during translation, similar in concept to SRP-mediated localization to the ER, or ribosome association is a trigger to permit a secondary factor to drive mRNA localization, or another model?

      Unfortunately, this is hard to say at the moment as we do not have any data regarding where translation actually occurs. We will add a conjecture to the Discussion.

      Minor comments

      Please complete the following sentence: "We identified transcripts enriched at the CeAJ in a stage- and cell type-specific."

      Corrected.

      It would be helpful to provide reference(s) for the protein localization summary in Table 1.

      Added.

      Figure 2B: Did dlg-1 and ajm-1 localize at similar ratios? Appropriate statistics comparing the different ratios may be informative.

      We will modify the graph (following Reviewer #2’s suggestion) and add the requested details.

      Figure 2: In the paragraph that begins, "Morphogenesis of the digestive track," the text should refer to Figure 2C? If not, the text requires further clarification.

      Corrected.

      Figure 2: Reporting the smFISH localizing ratios of 8E and 16E will be informative.

      We will add the information.

      Please include citations when summarizing the nonsense-mediated decay NMD mechanism and AJM-1 identifying the CeAJ.

      Added.

      The sentence, "Embryos from our second ΔATG transgenic line displayed a little GFP protein and some dlg-1::gfp mRNA," should refer to Figure S4.

      Added.

      An immunoblot of this reporter versus wild type may be informative regarding the approximate position of putative alternative start codon.

      We will perform a Western blot to verify the size of the protein product produced.

      Figure 5: N's and repetitions performed should be included for localization experiments.

      Yes, we will add them here and in all the other quantifications we will add to the manuscript.

      Please clarify that the "the mechanism of UTR-independent targeting is unknown in any species" refers to dlg-1 mRNA localization.

      Added.

      "Our findings suggest..." discussion paragraph should reference Figure 6.

      Added.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Tocchini et al. screened apical junction and cell membrane proteins for mRNA localization. They identified multiple proteins that are translated from localized mRNAs. Of these, dlg-1 (Discs large) mRNA localizes to cell cortices of dorsal epithelial cells, endoderm cells, and epidermal (seam) cells and is dependent on active translation for transport. The manuscript dissects the contributions of different DLG-1 protein domains to mRNA localization.

      A major strength of the paper is the way it assesses translational-dependence in a transcript-specific way without perturbing translation globally. The authors cleverly combine mutations in ATG start sites with a knockdown of the nonsense-mediated decay pathway. This allows Tocchini et al. to examine whether dlg-1 mRNA depends on active translation for localization, which it does. The authors observe an interesting finding, that the domains required for protein localization can be separated from those required for mRNA localization. Namely, mRNA localization (but not protein localization) requires C-terminal domains of the protein.

      My major points of concern focus on the presentation and interpretation of Figure 5. In this figure, the blocking approach used seems confounding, the observations described by the authors are not visible, the quantification is confusing, and the interpretations seem like an over-reach. The

      Major comments:

      • Figure 2 requires a negative (or uniformly distributed) mRNA control for comparison. Figure 2C should be quantified. The plot quality should be improved, and appropriate statistical tests should be employed to strengthen the claimed findings.

      • Most claims of perinuclear mRNA localization are difficult to see and not well supported visually or statistically. The usage of DAPI markers, membrane markers, 3D rendering, or a quantified metric would bolster this claim. Also, sax-7 is claimed to be perinuclear and elsewhere claimed to be uniform then used as a uniform control. Please explain or resolve these discrepancies more clearly.

      • The major concern about the paper is the data display and interpretation of Figure 5C. I'm not comfortable with the approach the authors took of blurring out the nucleus. A more faithful practice would be to use an automated mask over DAPI staining or to quantify the entirety of the cell. If the entirety of the cell were quantified, one could still focus analysis on specific regions of relevance. The interpretations distinguishing membrane versus cytoplasmic localization (or mislocalization) are hard to differentiate in these images especially since they are lacking a membrane marker. The ability to make these distinctions forms the basis of Tocchini et al's two pathways of dlg-1 mRNA localization. These interpretations also heavily rely on how the image was processed through the different Z-stacks, and it's not clear to me how that was done. For example, the diffusion of mRNA in figure 5F and 5I are indistinguishable to my eye but are claimed to be different.

      • To my eye, it seems that Figure 5 could be more faithfully interpreted to state that DLG-1 protein localization depends on the L27-SH3 domains. The Huk/Guk domains are dispensable for DLG-1 protein localization; however, through other studies, we know they are important for viability. In contrast, dlg-1 mRNA localization requires all domains of the protein (L27-Guk). It is exceptionally interesting to find a mutant condition in which the mRNA and protein localizations are uncoupled. It would be very interesting to explore in the discussion or by other means what the purpose of localized translation may be. Because, in this instance, proper mRNA localization and protein function are closely associated, it may suggest that DLG-1 needs to be translated locally to function properly.

      • The manuscript requires an improved materials & methods description of the quantification procedures and statistics employed.

      Minor & Major comments together:

      Text

      • Summary statement: Is "adherent junction" supposed to be "adherens junction?"

      • Abstract: Sentence 1, I think they should add a caveat word to this sentence. Something like "...phenomenon that can facilitate sub-cellular protein targeting." In most instances this isn't very well characterized or known.

      • In the first paragraph, it might be good to mention that Moor et al also showed that mRNA localize to different regions to alter their level of translation (to concentrate them in high ribosome dense regions of the cell).

      • There are some new studies of translation-dependent mRNA localization - that might be good to highlight - Li et al., Cell Reports (PMID: 33951426) 2021; Sepulveda et al., 2018 (PCM), Hirashima et al., 2018; Safieddine, et al 2021. Also, Hughes and Simmonds, 2019 reviews membrane associated mRNA localization in Drosophila. And a new review by Das et al (Nat Rev MCB) 2021 is also nice.

      • Parker et al. did not show that the 3'UTR was dispensable for mRNA localization. They showed the 3'UTR was sufficient for mRNA localization.

      • In the second paragraph, the sentence about bean stages is missing one closing parenthesis.

      • Last paragraph: FISH is fluorescence, not fluorescent.

      • Both "subcellular" and "sub-cellular" are used.

      Minor comments - Figures

      • Figure 1

      o Figure 1A is confusing. It's not totally clear what the rectangles and circles signify. There are many acronyms within the figure. Which of the cell types depicted in the figure are shown here? For example, for the dorsal cells, which is the apical v. basal side?

      o Some of the colors are difficult to distinguish, particularly when printed out or for red/green colorblind readers. Is erm-1 meant to be a cytoskeletal associated or a basolateral polarity factor?

      o The nomenclature for dlg-1 is inconsistent within "C".

      o Please specify what the "cr" is in "cr.dlg-1:-gfp" in the legend.

      • Figure 2

      o Can Figure 2C be quantified in a similar manner to 2A/2B?

      o 2B - please jitter the dots to better visualize them when they land on top of one another

      o Please include a negative control example, a transcript that is not peripherally localized for comparison.

      o There is no place in the text of the document where Fig 2C is referenced

      o I can't see any discernable ajm-1 localization in Fig 2A.

      o I can't see any dlg-1 pharyngeal localization in Fig2C.

      o More details on how the quantification was performed would be welcome. Particularly, in 2B, what is the distance from the membrane in which transcripts were called as membrane-associated? What statistics were used to test differences between groups?

      • Figure 3

      o Totally optional but might be nice: can you make a better attempt to approximate the scale of the cartoon depiction?

      o The GFP as an asterisk illustration may be confusing for some readers. Could you add another rectangular box to depict the gfp coding sequence?

      o This microscopy is beautiful!

      o Were introns removed? Is the endogenous copy still present?

      o The wording in the legend "CRISPR or transgenic" may be confusing as Cas9 genome editing is still a form of transgenesis.

      o The authors state that the 5'-3'UTR construct produces perinuclear dlg-1 transcripts but in the absence of DAPI imaging, it's not clear that this is the case.

      o Which probe set was used? The gfp probe?

      o Here, sax-7 is used as a uniform control, but sax-7 is claimed in Fig S1B-D as being perinuclear. This is a bit confusing.

      • Figure 4

      o Excellent results! Really nice!

      o Fig 4A. The GFP depicted as a circle is strange.

      o Fig 4A. Can you include the gene/protein name for easy skimming?

      o Fig 4B. the color here is too faint and it is unclear what is being depicted. Overall, this part of the figure could be improved.

      o Were the introns removed?

      • Figure 5

      o Fig 5A. can you add the gene/protein name

      o Fig 5B. Can you make the example apicobasal (non-apical) mRNA more distinctive? If it had its own peak in the lower trace, the reader would more clearly understand that this mRNA will be excluded from apical measurements whereas it will be included in apicobasal measurements.

      o D' - I' The grey font is too light.

      o D' - I' The inconsistent y-axis scaling makes it difficult to compare across these samples. Can you set them to the same maximum number?

      o D' - I' The x-axis labels are formatted incorrectly

      o The practice of masking out the nucleus appears to remove potentially important mRNAs that are not nuclear localized. This could really impact the findings and interpretation. Instead, consider an automated DAPI mask.

      o I can't see what the authors are calling membrane diffuse versus cytoplasmic. This is making it hard for me to see their "two step" pathway to localization.

      o "F" looks the same as "I" to me, but the authors claim they represent different patterns and use these differences as the basis for their claim that X.

      o Can more details of the quantification be included? How were Z-sections selected, chosen for inclusion? Which Z-sections and how many were selected?

      o Also, why do these measurements focus on what I think are the seam cells when Lockwood et al., 2008 show the entire epithelium that is much easier to see?

      o Please name these constructs to correlate the text more explicitly to the figures.

      o How many embryos were analyzed for each trace? How many embryos showed consistent patterns?

      o Why were these cells used for study here? Lockwood et al., 2008 use a larger field of epithelial cells for visualization.

      • Figure 6

      o There are major discrepancies between what this figure is depicting graphically and what is described in the text. Again, I'm not comfortable making the "two step" claims this figure purports given the data shared in Figure 5.

      Minor comments - Tables & Supplemental Figures

      Table 1

      • I think this table could be improved to more clearly illustrate which mRNAs were tested and what their mRNA localization patterns were (for example, gene name identifiers included, etc). Could the information that is depicted by gray shading instead be added as its own column? For example, have a column for "Observed mRNA localization"

      • Can you add distinct column names for the two columns that are labeled as "protein localization - group"

      • Can you also add which of these components are part of ASI v. ASII (as described in the introduction)?

      Supplemental Figure 1

      • It is hard to see that some of these spots are perinuclear. More information (membrane marker, 3D rendering, improved metrics) is required to support this claim.

      • What do these images look like over the entire embryo, not just in the zoomed in section?

      • sax-7 localization in S4 looks similar but a different localization claim is made.

      Supplemental Figure 2

      • Before adherens junctions even exist, dlg-1 goes to the membrane - this is really neat!

      Supplemental Figure 3

      • Technical question: If either 5 or 3 stack images are used, how does this work? Do they have different z-spacings? Or do the 5-stack images represent a wider Z-space?

      Supplemental Figure 4

      • Line #2 retains translation and keeps mRNA localization.

      • Totally optional, but consider showing both lines in the main figure to illustrate the two possibilities.

      • Materials and methods - how did they create the ATG mutations? Is it an array? - why does one translate, and one doesn't?

      Significance

      The authors discover that dlg-1, ajm-1, and hmr-1 mRNAs (among others) are locally translated, and this represents an important conceptual advance in the field as these are well studied proteins and important markers. This is the first study to illustrate translation-dependent mRNA localization in C. elegans, to my knowledge. The mechanisms transporting these mRNAs and their associated translational complexes to the membrane may represent a new pathway of mRNA transport and are therefore significant. The authors identify the domains within DLG-1 responsible, which is a nice advance. If they are unable to order the events of association as they claim in Figure 5 (and that I dispute), this doesn't detract from the impact of the paper.

      Other high-profile studies have recently been published that echo how mRNA localization to membranes can be observed for transcripts that encode membrane-associated proteins (Choaib et al., Dev Cell, 2020; Li et al., Cell Reports, 2021 (PMID: 33951426); and reviewed in Hughes & Simmonds, Front Gen, 2019). These recent findings underscore the impact of Tocchini et al.'s paper. Similar studies have identified mRNAs localizing through translation-dependent mechanisms to a variety of different regions of the cell (Sepulveda et al., eLife, 2018; Hirashima et al., Sci Reports, 2018; Safieddine, et al., Nat Comm, 2021; and reviewed in Ryder et al., JCB 2020). Given the timely nature of these findings and the recent interest in these concepts, a broad readership should be interested in this paper.

      My field of expertise is in mRNA localization imaging and quantification. I feel sufficiently qualified to evaluate the manuscript on all its merits.

    1. new digital tools may be transforming these methods and this basic work. Is the very computer upon which humanists rely so heavily still a tool, something akin to their medieval writing tablets? Or has it become an environment, its screen no longer a blank sheet on which to write but a window or portal into the entire digital realm, which acts upon the humanist as much as or more than she acts upon it? As such tools become even more integrated with the human body - Google Glass or the new Apple Watch, for example - will the distinction between tool and environment disappear even further? Might we be approaching the time when the distinction created by the term homo faber, the human as maker, outside and above the world of her creations, becomes meaningless in the world of the semantic web and 3D bacterial printing?

      I think that technology has developed to the point that it is both a tool and an environment. When I use it to write a paper, it is a tool, but it becomes an environment when I use it to interact with my classmates. Things like search engines are more ambiguous. They are a tool in how they help me achieve the goal of finding what I am looking for, but they immerse me in the environment created by websites and documents. Things like Google Street View are, without a doubt in my mind, both a tool and an environment. They help me find the place I was looking for and immerse me in the environment so that I can experience it visually.

    1. Author Response:

      Reviewer #1:

      By sequencing a large number of SARS-CoV-2 samples in duplicate and to high depth, the authors provide a detailed picture of the mutational processes that shape within-host diversity and go on to generate diversity at the global level.

      1) Please add a description of the sequencing methods and how exactly the samples were replicated (two swabs? two RNA extractions? two RT-PCRs?). Have any limiting dilutions been done to quantify the relationship between RNA template input and CT values? Also, the read mapping/assembly pipeline needs to be described.

      Limiting dilutions were not performed; however, the association between Ct and discordance between replicates was explored. Samples with Ct>=24 were found to have considerable discordance between replicates, likely resulting from a low number of input RNA molecules. This is described in the first section of the results and illustrated in Figure 1 - figure supplement 3.

      We have now added additional sections to the methods to better describe the sequencing and mapping pipelines.

      Sequencing: A single swab was taken for each sample. Two libraries were then generated from two aliquots of each sample with separate reverse transcription (RT), PCR amplification and library preparation steps in order to evaluate the quality and reproducibility of within-host variant calls. The ARTIC protocol v3 was used for library preparation (a full description of the protocol used is available at dx.doi.org/10.17504/protocols.io.be3wjgpe).

      Alignment and variant calling: Alignment was performed using the ARTIC Illumina nextflow pipeline available from https://github.com/connor-lab/ncov2019-artic-nf...

      2) I find the way variants are reported rather unintuitive. Within-host variation is best characterized as minor variants relative to consensus (or first sample consensus when there are multiple samples). Reporting "Major Variants" along with minor variants conflates mutations accumulated prior to infection with diversity that arose within the host. The relative contributions of these two categories to the graphs in Fig 1 would for example be very different if this study was repeated now. Furthermore, it is unclear whether variants at 90% are reversions at 10% or within-host mutations at 90%. I'd suggest calling variants relative to the sample or patient consensus rather than relative to the reference sequence (as is the norm in most within-host sequencing studies of RNA viruses).

      We are grateful for this comment and have tried to improve and clarify the reporting of variants to align with previous literature.

      Our original classification intended to classify non-reference sites as fixed changes (VAF>95%) or within-host variants (which we called “minor variants”). While we chose 95% as a cutoff (which may have been confusing), the results are analogous with a 99% cutoff, as variants in this set essentially have VAF~100%, and nearly all are expected to have occurred in a previous host. Thus, the previous classification intended to cleanly separate inter-host (fixed) mutations from within-host mutations, to compare their patterns of selection and their mutation spectra.

      Following the reviewer’s request, we have modified this classification to better align with other studies of RNA viruses by defining the majority allele at a site as the “consensus”. We note that the results remain largely similar, since the vast majority of within-host variants identified had low VAFs (<<50%), with the majority/consensus allele most often corresponding to the reference (Wuhan) base.

      When considering recurrent mutations we now discuss the number of times variants are observed at each location within a sample. This avoids the issue of how variants are polarised.

      3) It is often unclear how numbers reported in the manuscript depend on various thresholds and parameters of the analysis pipeline. On page 2, for example, the median allele frequency will depend critically on the threshold used to call a variant, while the mean will depend on how variation is polarized. Why not report the mean of p(1-p) and show a cumulative histogram of iSNV frequencies on a log-log scale including. I think most of these analyses should be done without strict lower cut-offs or at least be done as a function of a cut-off. In contrast to analyses of cancer and bacteria, the mutation rates of the virus are on the same order of magnitude as errors introduced by RT-PCR and sequencing. Whether biological or technical variation dominates can be assessed straightforwardly, for example by plotting diversity at 1st, 2nd, and 3rd codon position as a function of the frequency threshold. See for example here:

      https://academic.oup.com/view-large/figure/134188362/vez007f3.tif

      There are more sophisticated ways of doing this, but simpler is better in my mind.

      It would be good to explore how estimates of the mean number of mutations per genome (0.72) depend on the cut-offs used. A more robust estimate might be 2\sum_i p_i(1-p_i) (where p_i is the iSNV frequency at site i) as a measure of the expected number of differences between two randomly chosen genomes. Ideally, the results for viral RNA produced from a plasmid would be subtracted from this.

      The reviewer raises a number of important points that we have tried to address and clarify.

      We think that the quality of our variant calls is supported by several lines of evidence: (1) the use of the ShearwaterML calling algorithm, which uses a base-specific overdispersed error model and calls mutations only when read support is statistically above background noise in other genomes; (2) the use of two independent replicates from the RT step; and (3) several biological signals that cannot be expected to arise from errors, including the fact that the mutation spectra of low VAF iSNVs called in our study recapitulate that of consensus mutations and the clear signal of negative selection acting on iSNVs. We note that this dN/dS analysis is closely related to the suggestion by the reviewer of comparing the frequency of mutations at positions 1/2/3 of a codon.

      To address this comment in the manuscript, we have amended the text to include these arguments and we provide two new supplementary figures: (1) a figure of the frequency of mutations at the three codon positions, as requested by the reviewer, and (2) the mutation spectra of low VAF iSNVs, demonstrating the quality of the mutation calls. Similar to the finding in Dyrdak et al. (2019), and as expected from the dN/dS ratios, the distribution of variant sites is dominated by variants at the third position and not equally distributed as one might expect if errors were dominating the signal.

      We have amended the relevant section of the text to read:

      “To reliably detect within-host variants with the ARTIC protocol, we used ShearwaterML, an algorithm designed to detect variants at low allele frequencies. ShearwaterML uses a base-specific overdispersed error model and calls mutations only when read support is statistically above background noise in other genomes \cite{Gerstung2014-av,Martincorena2015-ef} (Methods). Two samples were excluded, as they had an unusually high number of low frequency variants unlikely to be of biological origin, leaving 1,179 samples for analysis, comprising 1,121 infected individuals of whom 49 had multiple samples. For all analyses we used only within-host variants that were statistically supported by both replicates (q-value<0.05 in at least one replicate and p-value<0.01 in the other, Methods). Within each sample, we classified variant calls as `consensus' if they were present in the majority of reads aligned to a position in the reference or as within-host variants otherwise. The allele frequency for each variant was taken as the frequency of the variant in the combined set of reads for both replicates.”

      ...

      “The use of replicates and a base-specific statistical error model for calling within-host diversity reduces the risk of erroneous calls at low allele frequencies. We noticed a slight increase in the number of within-host diversity calls for samples with high Ct values, which may be caused by a small number of errors or by the amplification of rare alleles and that could inflate within-host diversity estimates (Figure 1 - figure supplement 3) \cite{McCrone2016-se}. However, the overall quality of the within-host mutation calls is supported by a number of biological signals. As described in the following sections, this includes the fact that the mutational spectrum of within-host mutations closely resembles that of consensus mutations and inter-host differences and the observation of a clear signal of negative selection from within-host mutations, as demonstrated by dN/dS and by an enrichment of within-host mutations at third codon positions \cite{Dyrdak2019-xk} (Figure 1 - figure supplement 4).”
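      The replicate rule quoted above (q-value<0.05 in at least one replicate and p-value<0.01 in the other) can be sketched as a small predicate. This is only an illustration under stated assumptions, not the authors' pipeline: the function name and its arguments are hypothetical, and the p- and q-values are assumed to have been computed beforehand by the variant caller.

```python
def supported_by_both_replicates(p1, q1, p2, q2, q_cut=0.05, p_cut=0.01):
    """Return True if a variant passes the two-replicate rule.

    p1, q1: p-value and q-value of the call in replicate 1 (hypothetical inputs).
    p2, q2: the same quantities for replicate 2.
    The rule requires q < q_cut in at least one replicate
    and p < p_cut in the other replicate.
    """
    return (q1 < q_cut and p2 < p_cut) or (q2 < q_cut and p1 < p_cut)


# Example: strong support in replicate 1, weaker but sufficient support in replicate 2.
print(supported_by_both_replicates(p1=0.001, q1=0.01, p2=0.005, q2=0.2))
```

      Keeping the thresholds as parameters makes it easy to check how downstream summaries change when the cut-offs are varied.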

      Whilst we believe the remaining variant calls are reliable, we acknowledge that how variants are polarised could impact some of the summary statistics reported. To help improve this we have amended Figure 1 to include a cumulative histogram of within-host variant frequencies on a log-log scale as suggested by the reviewer. We have also included estimates of the mean value of sqrt(p(1-p)) (indicating an estimate of the standard deviation of within-host variants assuming a Bernoulli distribution). We have also replaced the estimates of the mean number of mutations per genome with the expected number of differences between two randomly chosen genomes. The amended Figure 1C now displays a histogram of the expected number of differences between two genomes for each sample rather than the mean number of mutations.
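      The two summary statistics discussed here can be written down in a few lines. The sketch below assumes each variable site is biallelic and is summarised by a single within-host variant frequency p_i; the function names and the example frequencies are hypothetical and are not taken from the study.

```python
import math


def expected_pairwise_differences(freqs):
    """Expected number of differences between two randomly chosen genomes.

    Implements 2 * sum_i p_i * (1 - p_i), where p_i is the variant
    frequency at site i (biallelic sites assumed).
    """
    return 2.0 * sum(p * (1.0 - p) for p in freqs)


def mean_bernoulli_sd(freqs):
    """Mean of sqrt(p * (1 - p)) over sites, i.e. the mean Bernoulli standard deviation."""
    if not freqs:
        return 0.0
    return sum(math.sqrt(p * (1.0 - p)) for p in freqs) / len(freqs)


# Hypothetical within-host variant frequencies for one sample.
sample_freqs = [0.5, 0.1, 0.02]
print(expected_pairwise_differences(sample_freqs))
print(mean_bernoulli_sd(sample_freqs))
```

      Both quantities are symmetric in p and 1-p, so they do not depend on whether a variant is polarised against the reference or against the sample consensus.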

      4) This paper provides an important baseline characterization of within-host diversity, while the patterns themselves are not extremely surprising. It is thus important that the data are provided in a form that facilitates reuse. It would be helpful to provide intermediate analysis results in addition to the raw reads in the SRA and the shearwater calls. I would like to see simple csv tables with the number of times A,C,G,U,- was observed at every position in the genomes for every sample. This would greatly facilitate the reuse of the data.

      We have now added raw count tables for each sample and each replicate to the GitHub repository. We have also archived this data using Zenodo to ensure it remains easily accessible.
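      As an illustration of how such per-position count tables might be reused, the sketch below converts counts into allele frequencies and picks the majority allele as the consensus. The column layout (`pos,A,C,G,U,-`) and the function name are assumptions made for this example; the actual files in the repository may be organised differently.

```python
import csv
import io

BASES = ["A", "C", "G", "U", "-"]


def allele_frequencies(csv_text):
    """Parse a count table and return (position, consensus_base, frequencies) per row.

    Assumes a header of the form pos,A,C,G,U,- (hypothetical layout).
    Positions with zero depth are skipped.
    """
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts = {b: int(row[b]) for b in BASES}
        depth = sum(counts.values())
        if depth == 0:
            continue
        freqs = {b: c / depth for b, c in counts.items()}
        # Majority allele at this position is treated as the consensus.
        consensus = max(BASES, key=lambda b: counts[b])
        results.append((int(row["pos"]), consensus, freqs))
    return results


# Tiny made-up table: two positions with 100 reads each.
example = "pos,A,C,G,U,-\n1,98,2,0,0,0\n2,0,40,0,60,0\n"
for pos, consensus, freqs in allele_frequencies(example):
    print(pos, consensus, freqs)
```

      Working from raw counts rather than thresholded calls lets a reader recompute frequency-based statistics at any cut-off.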

      Reviewer #2:

      The paper by Tonkin-Hill and colleagues describes the analysis of intra-host variation across a large number of SARS-CoV-2 samples. The authors invested a lot of effort in replicate sequencing, allowing them to focus on more reliable data. They obtained several important insights regarding patterns of mutation and selection in this virus. Overall, this is an excellent paper that adds much novelty to our understanding of intra-host variation that develops during the time course of infection, its impact on transmission, and what we can or cannot learn on relationships between samples.

      We are grateful to the reviewer for their positive comments.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      We thank the Referees for their evaluation and their useful comments.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The MS from Bonaventure and colleagues used a CRISPR screen to identify novel IFN-induced antiviral effectors targeting HIV-1. One hit, the DEAD Box helicase DDX42, while not itself part of the IFN response, exerts a substantial inhibitory effect on HIV-1 replication when overexpressed, and gives a several-fold boost to viral replication when knocked down in cells. The effect of DDX42 KO or O/E is manifest at reverse transcription, and PLA analysis suggests an interaction with incoming virions. Moreover, DDX42 appears to exert an inhibitory effect generally against retroviruses and retroelements, with evidence that it associates with viral/transposon RNA. The authors further show that DDX42 has antiviral activity against a range of (but not all) RNA viruses, with very striking phenotypes seen especially with Zika, CHIKV and SARS CoV2, with DDX42 associating with dsRNA in infected cells. These data suggest DDX42 is a constitutively expressed broad-spectrum inhibitor of a range of mammalian RNA viruses. The manuscript is very well written, the data are of good quality and clearly DDX42 is having a general effect on viral replication. The results are novel, important and potentially of wide interest. Where the MS is somewhat lacking is in understanding whether DDX42 has direct antiviral activity or is globally affecting cellular RNA metabolism. Some important areas for the authors to consider are:

      • DDX42 has a potential role in splicing and/or RNA metabolism so I think it would be important to see whether there is any clear global change in gene expression in knockout or knockdown cells vs control that might be suggestive of a generalized effect.

      Responses

      We thank the reviewer for this important question. Indeed, DDX42 didn’t impact the replication of 2 negative strand RNA viruses and this suggested that DDX42 didn’t have a global impact on the target cells, but we could not formally exclude a generalized effect. Therefore, we have performed RNA-seq analysis in order to evaluate the impact of DDX42 depletion (using 3 different siRNAs targeting DDX42 in comparison to a CTRL siRNA in U87-MG cells, and 2 different siRNA in comparison to a CTRL siRNA in A549-ACE2 cells, in samples obtained in 3 independent silencing experiments). The RNA-seq data (See Supplemental File 1 and Figure S5) showed that only 63 genes are commonly differentially expressed by the 3 siRNAs targeting DDX42 in U87-MG cells and only 23 of these genes were also found differentially expressed in A549-ACE2 cells depleted for DDX42. Importantly, the identity of these genes could not explain the observed antiviral phenotypes. These data are in favor of the absence of generalized effect on the target cells, which could have explained the antiviral phenotypes of the sensitive viruses.

      • The HIV experiments in primary cells are only one round at present. Does the DDX42 knockdown enhance viral replication in multiround? Does it lead to more viral PAMPs for PRRs to induce IFN?

      Responses

      We agree with the reviewer that it would have been very informative to measure the impact of DDX42 knockdown in multiround infections in primary T cells. However, we tried several times to do this experiment (with primary T cells from several donors) and we were not successful: indeed, DDX42 KO appeared to slow down cell division, which could be taken into account for a short, one-cycle experiment (i.e. 24 h) 3 days post-Cas9/sgRNA electroporation by adjusting the number of cells at the time of infection. However, DDX42 KO appeared quite toxic in longer experiments, with cells ceasing to grow.

      The question regarding the generation of more viral PAMPs for PRRs to induce IFN is also very interesting. We know from published work (including ours) that primary T cells don’t normally produce IFN following HIV-1 infection (see for instance Bauby and Ward et al, mBio 2021). However, one can indeed hypothesize that as more viral DNAs are produced in the absence of DDX42, perhaps the primary T cells could detect them and produce IFN. To address this question in primary T cells, we would have needed to be able to perform multiround infections, which was not possible, as mentioned above. Moreover, we could not test this hypothesis in the cell lines that we used, such as U87-MG/CD4/CXCR4 cells, as they are unable to produce IFN following HIV-1 infection.

      • More could be made mechanistically of the lack of sensitivity of Flu and VSV to DDX42. In particular showing whether or not DDX42 interacts with the RNA of the insensitive virus, or whether DDX42/virus or dsRNA interactions by PLA occur with Flu would highlight the relevance of these observations to the antiviral mechanism.

      Responses

      This is an excellent remark. We have now performed RNA immunoprecipitation experiments using 2 viruses targeted by DDX42 (CHIKV and SARS-CoV-2) and 1 virus that is insensitive to DDX42 (IAV) (See New Figure 4J-L): whereas CHIKV and SARS-CoV-2 RNAs could be specifically pulled-down with DDX42 immunoprecipitation, this was not the case for IAV RNA. This strongly argues for a direct mechanism of action of DDX42 helicase on viral RNAs.

      Reviewer #1 (Significance (Required)):


      The role of helicases in host defence is of wide interest and importance. This has the potential to be a very important study that deserves a wide audience. However, in my opinion it needs some further mechanistic insight along the lines I have suggested.

      Responses

      As mentioned above, we have now added important data: First, DDX42 is able to interact with RNAs from targeted viruses (and not from an insensitive virus); Second, we have checked that DDX42 didn’t have a substantial impact on the cell transcriptome. Taken together, these data are clearly in favour of a direct mode of action of DDX42.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this brief report, the authors use a CRISPR screening approach to identify cellular proteins that limit HIV infection. The screen itself is elegantly designed and most of the top hits are components of the interferon signaling pathway that would be expected to emerge from such a screen, thus providing confidence in the results. The authors followed up on DDX42 as a new hit identified in their screen and confirmed that targeting DDX42 with distinct guide RNAs resulted in increased HIV infection in at least 3 cell lines. Conversely, DDX42 overexpression inhibited infection. They also confirmed a role for DDX42 in inhibiting HIV infection in primary macrophages and CD4 T cells using siRNA and CRISPR KO strategies, respectively. They also demonstrate that DDX42 inhibits several other divergent lentiviruses as well as Chikungunya virus and SARS-CoV-2, but not influenza virus. These data convincingly show that DDX42 plays a role in inhibiting many lentivirus and positive sense RNA virus infections. Using PCR assays for reverse transcription products they conclude that DDX42 inhibits an early process in the HIV life cycle occurring after virus entry, though the statistical significance of these differences is not clear. They further use proximity ligation assays to suggest that DDX42 is in proximity to HIV-1 and SARS-CoV-2 replication complexes. Mechanistically, these data are largely unsatisfying as they do not provide specific insight into how DDX42 so broadly inhibits virus replication. Although the manuscript presents a significant advance overall, it also has some weaknesses as listed below.

      1. Statistical analysis is not included in any of the figures.

      Response

      Statistical analyses have now been included.

      Many of the figure legends do not state how many independent biological replicates the figures are based on.

      Response

      The number of biological replicates for each panel is stated at the very end of each figure legend.

      Detailed mechanistic understanding of DDX42 effects on virus replication is not provided by the manuscript.


      Response

      As mentioned in response to Reviewer 1, we have now added data showing that DDX42 could interact with RNAs from targeted viruses but not from an insensitive virus, arguing for a direct antiviral mode of action of this DEAD-box helicase.

      Reviewer #2 (Significance (Required)):

      DDX42 is a new antiviral protein identified and confirmed in this manuscript. It was also identified as one of many hits in a genome wide CRISPR screen for cellular proteins that regulate SARS-CoV-2 infections, but was not followed up. Thus, the identification and confirmation of DDX42 antiviral activity is highly significant for both the HIV and SARS-CoV-2 fields. This high significance may compensate to some extent for the lack of mechanistic insight contained in this initial report.

      **Referees Cross-commenting**

      I find the comments of the other reviewers to be fair and reasonable, and I concur that the work is overall important and novel. It seems that reviewers generally agreed that some additional mechanistic insights would be desirable for publication in a high impact journal. Reviewer 1 makes some good suggestions in this regard. As for mouse experiments, I would reserve these for a follow up manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):


      In this manuscript, Bonaventure et al report the results of a screen to identify cellular inhibitors of HIV-1 infection in IFN-treated cells. They identify DDX42 as such a factor though, unexpectedly, DDX42 did not turn out to be an ISG. Strikingly, DDX42 turns out to inhibit a wide range of retroviruses as well as retrotransposons and + sense, but not - sense, RNA viruses among which SARS-CoV2 turns out to be especially sensitive to DDX42, with siRNAs specific for DDX42 increasing SARS-CoV2 viral RNA expression by a startling 3 orders of magnitude, compared to only a 2-5 fold positive effect with HIV-1.

      Response

      We agree with the reviewer that DDX42’s impact on HIV-1 may appear as somewhat modest, however, it is highly reproducible across cell lines and primary cells and, more importantly, it is observed upon depletion of the endogenous protein (either by KO or silencing) in target cells that are highly permissive to viral replication, such as activated primary CD4+ T cells. We therefore believe that these findings, combined with the findings that other positive-strand RNA viruses are targeted, are of high interest.

      Reviewer #3 (Significance (Required)):


      I found this paper generally convincing and technically sound though the emphasis was odd and clearly driven more by the history of how this work was done than by the actual results obtained. Specifically, the emphasis is on HIV-1 yet the most interesting data are the dramatic effects seen with Chikungunya and SARS2. If I was writing this paper, I would delete figure 4 and focus this paper entirely on retroviruses and retrotransposons. In that form, I think it would be competitive at PLoS Pathogens or perhaps EMBO Journal. The RNA virus work shown in figure 4 could then be figure 1 of a new, high impact, paper looking at the mechanism of action of DDX42 as an inhibitor of + sense, but not - sense, viral gene expression. Though Wei et al do mention DDX42 in their SARS-CoV2 screening paper this is certainly not a major theme of that paper so I don't think that would be a problem.

      Responses

      We thank the reviewer for this comment. We had hesitated to present the manuscript as suggested by the reviewer (i.e. focusing only on HIV-1, retroviruses and retroelements) and prepare a second manuscript with the remaining data. We’ve finally decided against it, as we believe that showing a broad antiviral effect of DDX42 on +strand RNA viruses increases the impact of our findings.

      On another note, a conditional DDX42 KO mouse has been generated by the Wellcome Trust Sanger Institute and it would greatly improve this manuscript if they could show an in vivo result similar to figure 3F using MLV.

      Responses

      We thank the reviewer for this information. We completely agree that in vivo work would be a massive plus and we plan to explore this in the future, but not at this stage as it would require specific funding and resources.

    1. Author Response:

      Reviewer #1:

      This study focuses on how the vmPFC supports delay discounting. The authors tested patients with vmPFC lesions (N=12) and healthy controls (N=41) on a delay discounting (DD) task with two additional conditions: (1) reward magnitude and (2) cues that should evoke episodic future thinking (EFT).

      The authors replicate their previous finding that patients with vmPFC lesions show steeper DD, and report two novel findings: (1) DD in patients is insensitive to reward magnitude, suggesting that vmPFC is critical for reward magnitude to modulate DD; (2) vmPFC patients show normal effects of EFT cues on DD, such that all subjects discounted less in the presence of cues that promote episodic future thinking. These findings have important implications for how vmPFC contributes to delay discounting, as they suggest that vmPFC is not necessary for prospective thinking to affect the evaluation of future rewards.

      1) A potential issue with the EFT finding is that it rests on accepting the null hypothesis of no group differences. However, there are reasons to assume this is not a trivial null result due to a lack of statistical power. Specifically, there is a significant effect of EFT within the vmPFC patient group and there is a significant group difference for the effect of reward magnitude. Assuming comparable power to detect effects of EFT and reward magnitude, it seems unlikely that the non-significant EFT effect is simply a lack of power. In any case, this caveat has to be considered when interpreting the effect.

      We have added a discussion of this caveat on p. 10, which reads: “Before discussing this finding further, we note that it rests on accepting the null hypothesis of no group differences in the EFT effect on DD between vmPFC patients and controls. It is unlikely, however, that this null finding simply reflects a lack of statistical power, for example due to a small sample size. First, the null effect on group differences indeed reflects a significant within-participant effect, with greater regard for future amounts in the EFT compared to the Standard condition in vmPFC patients. Second, together with the preservation of the EFT effect, we found a significant reduction of the magnitude effect in the same vmPFC patient sample. Bayesian analyses confirmed greater evidence in favour of the null compared to the alternative hypothesis regarding group differences in the EFT effect on DD.”

      2) It is somewhat surprising that the authors had such a strong prediction about the absence of group differences for the EFT effect. Based on previous work (Bertossi et al., 2016a, b), one could expect a smaller EFT effect in the vmPFC group. The authors appear to put much weight on the results by Ghosh et al. 2014, which suggest that vmPFC is critical for schema reinstatement. The rationale for this strong prediction is not very clear from the introduction.

      We have now reframed our hypotheses as suggested by the reviewers and the editors. In the Introduction, we now make only the hypothesis of a reduced EFT effect on DD in vmPFC patients, which is based on previous evidence of an EFT impairment in vmPFC patients. We present the hypothesis that vmPFC is critical for schema instantiation only in the Discussion, as an explanation of the null finding on group differences on the EFT effect.

      Thus, p. 5 now reads: “Concerning prospection, previous studies have observed an EFT effect on DD, such that people discount future rewards less steeply if cued to imagine personal future events during intertemporal choice (Peters and Büchel, 2010; Benoit et al., 2011). Considering that vmPFC is implicated in prospection (Schacter et al., 2012) and that vmPFC patients are impaired in EFT (Bertossi et al., 2016a,b; Bertossi et al., 2017), vmPFC patients' DD should remain steep even when EFT cues are provided, because patients may nevertheless fail to construct the vivid future events that might be needed to counteract DD. Thus, we predict a reduced EFT effect on DD in vmPFC patients compared to healthy controls.”

      Reviewer #2:

      Ciaramelli et al. address a timely and theoretically important issue with respect to the functional role of the vmPFC in decision-making more generally, and temporal discounting in particular. Strong points of the paper include 1) a theoretically important research question and 2) much-needed lesion data on two important behavioral effects in temporal discounting: the magnitude effect, and a modulation of discounting via episodic future thinking. Weaker points of the paper include 1) lack of clarity for a number of methodological issues (group comparisons & control group for the AI data, inconsistency analysis) and 2) many remaining open questions with respect to how vmPFC patients might have utilized the EFT cues, and whether different processes were at work compared to controls.

      We thank the reviewer for this positive evaluation of the paper and address the reviewer’s comments below.

      Major points:

      1) The authors note that their interpretation of the preserved EFT effects in the vmPFC patients in terms of e.g. semantic processing remains speculative, but is supported by the finding of intact external details production following vmPFC damage in earlier studies. But was this also the case in the present data set? This remains unclear, because for the AI data, only z-scores relative to some earlier control group (Kwan et al. 2015) are reported (Table 1 and Supplement p. 30). Was this control group matched to the patients? And since the referenced Kwan et al. (2015) paper reports only on six patients (presumably the patients from the Canada site?) - what about the patients from the Italian site, which control group were their AI data compared to?

      The Crovitz data of the Canadian patients are unpublished (the Kwan et al., 2015 paper is not about vmPFC patients, but about 6 MTL patients). We compared them to a sample of 18 age-matched healthy controls, a subset of those included in Kwan et al. (2015). The 4 Italian patients were part of the vmPFC sample tested on EFT (and episodic memory) in Bertossi et al. (2016). We compared their performance with that of the 11 healthy controls from the same study who were age-matched to the patients.

      This is clarified on p. 17, which reads: “The results of the Italian patients (a subset of those included in Bertossi et al. 2016b) were contrasted with those of the 11 healthy controls from the same study (all males; Bertossi et al., 2016b) who were age-matched to the patients (vmPFC patients: M = 47.75, SD = 5.25; healthy controls: M = 41.63, SD = 11.89, t13 = -0.97, p = 0.34). The results of the Canadian patients (unpublished) were contrasted with those of 18 healthy controls (10 males; a subset of those included in Kwan et al., 2015) age-matched to the patients (vmPFC patients: M = 61.00, SD = 9.83; healthy controls: M = 67.94, SD = 13.57, t22 = 1.15, p = 0.26).”
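
      The age-matching statistics quoted above can be cross-checked from the reported means and standard deviations alone. The sketch below assumes a pooled-variance (Student's) two-sample t-test computed as controls minus patients, which matches the signs of the reported values. The Canadian patient sample size of 6 is not stated directly; it is inferred here from the reported 22 degrees of freedom and the 18 controls. The function name `pooled_t` is illustrative.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (equal variances) from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Italian site: 11 controls vs. 4 patients (both sample sizes are stated in the response).
t_it, df_it = pooled_t(41.63, 11.89, 11, 47.75, 5.25, 4)

# Canadian site: 18 controls vs. 6 patients (patient n is an inference from df = 22).
t_ca, df_ca = pooled_t(67.94, 13.57, 18, 61.00, 9.83, 6)

print(f"Italian: t({df_it}) = {t_it:.2f}; Canadian: t({df_ca}) = {t_ca:.2f}")
```

      Under these assumptions, both statistics should land close to the reported values, with small differences attributable to rounding of the published means and standard deviations.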

      2) Directly related to my previous point: The methods section states that external details were in the normal range in the vmPFC group (mean z-score for EFT = -.73) but from Table 1 we can see that 8/10 patients in fact exhibit a negative z-score. This suggests that a direct group comparison of the external details scores would very likely reveal a significant group difference. Generally, it would help to report the actual control data here, not just the z-scores, and report the respective group comparisons.

      We now report the Crovitz data in Table 2 and have run two ANOVAs on internal and external details separately in vmPFC patients and controls tested in Italy and in Canada. As the two ANOVAs show, we confirm that both patient groups produced fewer internal (episodic) details but a similar number of external details during EFT (as well as episodic remembering) than healthy controls. Therefore, the previously reported EFT problems for internal (but not external) details in vmPFC patients also apply to the patients tested here.

      P. 17 now reads: “As for the Italian sample, an ANOVA on the details produced with Group (vmPFC patients, healthy controls), Time (Past, Future), and Detail (internal, external) as factors showed a significant effect of Time (F1,13 = 14.66, p = 0.002, partial η2 = 0.53), such that all participants produced more details for past than future events (18.19 vs. 15.37). There were also significant effects of Group (F1,13 = 6.16, p = 0.02, partial η2 = 0.32) and Detail (F1,13 = 9.14, p = 0.009, partial η2 = 0.41), qualified by a Group x Detail interaction (F1,13 = 8.99, p = 0.01, partial η2 = 0.40). Post hoc Fisher tests showed that vmPFC patients produced fewer internal details (11.45 vs. 25.51; p = 0.004) but a similar number of external details than controls (11.39 vs. 11.96; p = 0.89). No other effect was significant (p > 0.31 in all cases). The same ANOVA on the Canadian sample revealed an effect of Group (F1,22 = 17.76, p = 0.0003, partial η2 = 0.44), qualified by a significant Group x Detail interaction (F1,22 = 4.72, p = 0.04, partial η2 = 0.18), again indicating that vmPFC patients produced fewer internal details (10.63 vs. 31.78; p = 0.0003) but a similar number of external details than controls (16.79 vs. 25.65; p = 0.09). No other effect was significant (p > 0.32 in all cases).”
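
      For readers who want to sanity-check the effect sizes in the passage above, partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity η²p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the F values quoted in the text; the function name and the list layout are illustrative, and small discrepancies from the published values are expected because the F statistics are themselves rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# (label, F, df_effect, df_error) as quoted in the response text.
quoted_effects = [
    ("Italian sample, Time", 14.66, 1, 13),
    ("Italian sample, Group", 6.16, 1, 13),
    ("Italian sample, Detail", 9.14, 1, 13),
    ("Italian sample, Group x Detail", 8.99, 1, 13),
    ("Canadian sample, Group", 17.76, 1, 22),
    ("Canadian sample, Group x Detail", 4.72, 1, 22),
]

for label, f_value, df_effect, df_error in quoted_effects:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared = {eta:.3f}")
```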

      3) The description of the inconsistency analysis was somewhat unclear. The authors use the procedure suggested by Johnson & Bickel (2008), which makes sense, given the overall analytical approach that focuses on the analysis of indifference points. However, this procedure is based on a comparison of adjacent indifference points. In contrast, the authors are referring to the number of inconsistent choices - this is either a typo, or a different procedure. I think the former, because the reported absolute numbers (e.g. means around 1) and the single subject plots in the supplement appear to reflect the number of inconsistent ID points rather than choices. If this is the case, I disagree with the statement that the "mean number of inconsistent choices was very low" (p. 10) - as this probably reflects the mean number of inconsistent indifference points and not choices, about 1 out of 6 ID points was inconsistent in the vmPFC group, which is a lot.

      We apologize for the lack of clarity. Yes, we are referring to indifference points (as in our previous study; Sellitto et al., 2010), not single choices. Inconsistent preferences are defined as “data points in which the subjective value of a future outcome (amount = R) at a given delay (R2) was greater than that at the preceding delay (R1) by more than 10% of the amount of the future outcome (i.e., R2 > R1 + R/10, as in Sellitto et al., 2010).” To avoid confusion, we have now corrected the expression ‘inconsistent choice’ to ‘inconsistent preference’ throughout the paper, and have eliminated the claim about the low number of inconsistent choices in vmPFC patients.
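
      The criterion quoted above (R2 > R1 + R/10, applied to adjacent indifference points) can be illustrated with a short sketch. The function name and the example values are hypothetical and are not taken from the study's data; the only element drawn from the text is the rule itself.

```python
def count_inconsistent_preferences(subjective_values, amount):
    """Count indifference points that exceed the preceding one by more than amount / 10.

    subjective_values: indifference points ordered by increasing delay.
    amount: the nominal amount R of the future outcome.
    """
    threshold = amount / 10
    count = 0
    # Compare each indifference point (R2) with the one at the preceding delay (R1).
    for previous, current in zip(subjective_values, subjective_values[1:]):
        if current > previous + threshold:
            count += 1
    return count

# Hypothetical indifference points for a future reward of 100 at six increasing delays.
example_values = [90, 70, 85, 60, 65, 40]
n_inconsistent = count_inconsistent_preferences(example_values, amount=100)
print(n_inconsistent)
```

      In this made-up series only the rise from 70 to 85 exceeds the 10-unit tolerance; the rise from 60 to 65 does not, so it is not counted as an inconsistent preference.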

      4) The EFT cues are suggested to help vmPFC patients to "circumvent their initiation problems" (p. 12) but I am not sure I follow this logic. First, the AI procedure typically entails external cues as well, and here vmPFC patients showed impairments (Table 1, but see my point 1 above). Second, some of the cited papers (e.g. Verfaellie et al., 2019) also used specific event cues, and still observed reduced internal details production in vmPFC patients.

      The AI (Crovitz) procedure uses external cues but typically these are words that are not particularly meaningful to the participants (indeed, they are the same for all participants). Some studies have used specific event cues (e.g., “Imagine attending a Fourth of July cookout a few years from now”; Verfaellie et al., 2019) but, again, these cues are the same for all participants. We used personalized cues, which were events that participants (1) had selected themselves, and (2) had already planned or found plausible in their future, and which therefore were presumably the most self-relevant and familiar to the participants, including patients. We think that these events may have been effective in activating self- and event-relevant schemata. We clarify this point on p. 11, which reads: “We propose, therefore, that subject-specific event cues, which were self-relevant and familiar to the participants because they had been selected by participants themselves, and were already planned or were plausible in their future, acted as external triggers of self- and situation-relevant schemata, helping to circumvent vmPFC patients’ EFT initiation problems. Their intact MTLs allowed them to construct episodic future events, which were then integrated into intertemporal choice, reducing DD.” As we note on p. 14, indeed, vmPFC patients are capable of imagining detailed experiences if they are guided to choose for themselves a specific moment from an extended future event to narrate in detail (Kurczek et al., 2015). Of course, we agree with the Reviewer’s point below that this interpretation is speculative at this point.

      5) One shortcoming with the paper is that no data are available that could inform how vmPFC patients might have utilized the EFT cues, and whether the processes at work might have differed from those in controls. Many points mentioned in the discussion (self-referential processing, semantic processing, activation of schemata, self-initiation vs. external cueing etc.) thus necessarily remain conjecture.

      We agree with the Reviewer, and we admit in several parts of the Discussion that this interpretation is speculative at this point. However, the interpretation that we offer seems the most plausible to us at this time, considering what we know about the role of the vmPFC (vs. the MTL) in event construction and the absence of the EFT effect on DD in MTL patients. We also propose an alternative interpretation, but the pattern of findings on the EFT effect on DD makes it less likely to us. On p. 12, we state, “An alternative interpretation of the DD modulation is that EFT cues simply shifted attention towards the future, or conferred a positive valence to it, as we encouraged positively valenced EFT. If so, however, one should consistently observe an EFT-induced benefit on DD also in patients with MTL lesions, but this is not the case (Kwan et al., 2015; Palombo et al., 2015).”

      Reviewer #3:

      In this manuscript, Ciaramelli et al. examined the decision-making behavior of 12 patients with vmPFC damage in a delay discounting task. The authors carried out two manipulations in this task: 1. They presented participants with small and large offers for both the immediate and delayed reward (magnitude manipulation), 2. They prefaced decisions with a cue prompting participants to vividly imagine an event in their future that was expected to occur at the same delay as the proposed larger offer (episodic future thinking (EFT) manipulation). Compared to age and education matched healthy controls, patients with vmPFC damage showed steeper discounting of delayed rewards, particularly when the amounts offered were large (reduced effect of magnitude). However, like controls, vmPFC damaged patients displayed shallower discounting of delayed rewards following the EFT manipulation.

      The manuscript is clear and concise in its presentation of the results, while still providing a detailed description of the behavior of these patients. This paper is also a good example of how pooling participants from multiple institutions can increase statistical power in a study of patients with focal brain damage targeting a fairly specific cognitive question. The positive results of the study mostly replicate previous findings. While the null result for the EFT manipulation is novel, the finding is hard to interpret. The authors state that they predicted that the EFT manipulation would not change discounting behavior in vmPFC damaged patients a priori despite the deficits of these patients in EFT in previous papers, which are also replicated here. However, I do not know why the authors would design their task in such a way to test for a null result. It is also not clear if this null result is observed for the reason proposed by the authors (that the EFT cues externally activate this process), or if this result is null for some other reason that is not accounted for here. As the authors do not provide a direct test for their hypothesized rationale for predicting this null result, the findings are hard to interpret.

      We agree with the reviewer’s and editor’s point that this paradigm does not allow testing whether subject-specific, personally relevant cues, such as those we used, are indeed effective in externally initiating EFT in vmPFC patients. Therefore, we concur that, for the sake of clarity, this is best presented only as speculative discussion of the preserved EFT effect on DD in vmPFC patients. In the Introduction, therefore, we now formulate only the hypothesis based on previous evidence of impaired EFT in vmPFC patients (e.g., Bertossi et al., 2016a,b, Verfaellie et al., 2019), which would lead to the prediction of a reduced EFT effect in vmPFC patients. We present the hypothesis that vmPFC is critical for schema instantiation only in the Discussion, as an explanation of the null finding on group differences on the EFT effect.

      P. 5 now reads: “Concerning prospection, previous studies have observed an EFT effect on DD, such that people discount future rewards less steeply if cued to imagine personal future events during intertemporal choice (Peters and Büchel, 2010; Benoit et al., 2011). Considering that vmPFC is implicated in prospection (Schacter et al., 2012) and that vmPFC patients are impaired in EFT (Bertossi et al., 2016a,b; Bertossi et al., 2017), vmPFC patients' DD should remain steep even when EFT cues are provided, because patients may nevertheless fail to construct the vivid future events that might be needed to counteract DD. Thus, we predict a reduced EFT effect on DD in vmPFC patients compared to healthy controls.”

      Overall, this manuscript makes a relatively modest contribution to our knowledge about the function of vmPFC during inter-temporal choice. It bolsters previous claims about how vmPFC damage impacts delay discounting and EFT, while not revealing new information about how vmPFC specifically contributes to the processes involved in these behaviors and why damage to this region impacts intertemporal choice in this way.

      We concur with the reviewer that our findings confirm previous evidence that vmPFC is necessary for balanced DD and for EFT. However, we think that our finding of a complete abolishment of the magnitude effect together with a complete preservation of the EFT effect on DD in vmPFC patients constitutes a remarkable theoretical advancement on the role of vmPFC in intertemporal choice. Indeed, it shows that during intertemporal choice vmPFC is more prominently implicated in reward valuation than in prospection. This finding is important for current theories of intertemporal choice, and is surprising considering previous demonstrations of impaired EFT in vmPFC patients (a finding that was replicated in the current study), and therefore has important implications also for theories relating to the role of vmPFC in EFT. Finally, we note that the paper focuses on one important facet of impulsivity following damage to the vmPFC in humans: steep DD. Our findings, therefore, may inform the clinical management of impulsivity in patients with vmPFC damage or dysfunction, delineating the contextual manipulations that are or are not expected to push the reach of patients' choice into the future.

    1. Author Response:

      Reviewer #1:

      This manuscript shows cell-to-cell variability in the relative levels of Sox2 and Brachyury (Bra) expression by individual cells within the region of the epiblast containing axial progenitors (the progenitor zone, PZ). Accordingly, some cells express high Bra and low Sox2 levels, others high Sox2 and low Bra, and a third group expresses equivalent levels of both transcription factors. They then show that by experimentally promoting high Sox2 expression cells enter neural tube (NT) fates, whereas high Bra brings cells in the progenitor zone to enter the presomitic mesoderm (PSM). The authors then complement these experiments with evaluation of cell movements within the PZ, NT and PSM to show that cells in the NT are much less motile than those in the PZ and PSM. These data led the authors to propose a fundamental role for Sox2/Bra heterogeneity to maintain a pool of resident progenitors and that it is the high cell motility promoted by high Bra levels that pushes cells to join the PSM, whereas high Sox2 levels inhibit cell movement forcing cells to take NT fates. To validate their hypothesis, the authors generated a mathematical model to show that those expression and motility characteristics can indeed lead to axial extension generating NT and PSM derivatives in the proper positions, while keeping a PZ at the posterior end.

      Some specific comments on the manuscript are specified below.

      1) Although the description of cells within the PZ containing different Sox2 and Bra expression ratios is more explicit and quantitative in the present manuscript, this has already been previously reported by different methods including immunofluorescence (e.g., Wymeersch et al, 2016). Similarly, that breaking the Sox2/Bra balance towards high Sox2 or Bra is an essential step to bring the progenitors towards NT or PSM fates has also been previously shown in different ways. These observations are, therefore, not totally new. The novel contribution of this paper is the authors' interpretation that "heterogeneity among a population of progenitor cells is fundamental to maintain a pool of resident progenitors". In this work, however, this conclusion is only supported by their mathematical simulation, as the experiments described in this manuscript are not aimed at homogenizing Sox2/Bra expression levels in the progenitor cells (meaning keeping the double positive feature) but, instead, forcing the progenitors to express Sox2 or Bra alone, which permits evaluation of differentiation routes rather than how to maintain the resident progenitor pool. Interestingly, their alternative mathematical model in which the relative Sox2/Bra levels follow an anterior-posterior gradient (which is actually a feature observed in the embryo) was also successful in producing an extending embryo. This model was not favored by the authors (but see my comment below). According to this model, the progenitor zone could be maintained by a cell pool containing equivalent Sox2/Bra levels; when this balance is broken cells eventually enter NT or PSM routes. Therefore, while expression heterogeneity can be observed in the PZ, I am not sure that the work shown in this manuscript is conclusive enough to claim an essential role of such heterogeneity to maintain the progenitor pool.

      We acknowledge that regional heterogeneity of Sox2 and Bra has been described in the PZ, and we have made sure to cite the relevant literature, including Wymeersch et al., 2016 and Kawachi, 2020. Although these papers described different levels of Sox2 and Bra in the PZ, they did not clearly report and quantify the fact that directly neighboring cells have very different levels of Sox2 and Bra; we therefore believe that our description of a “random-like” pattern of heterogeneity constitutes a real novelty. Along the same lines, we are aware of the several studies independently showing that gain- or loss-of-function of Sox2 or Bra can act on the progenitor decision to join either the NT or the PSM (these references are cited l.70, l.72). However, we believe that our study is the first to test systematically both overexpression and downregulation of Sox2 and Bra on progenitor distribution in the same biological system and to link Sox2/Bra functions to cellular motility.

      Testing the requirement for spatial cell-to-cell heterogeneity in maintaining a pool of progenitors is experimentally challenging: even if we were able to homogenize Sox2 and Bra expression, we would have to do it in all progenitors, which is not, so far, technically possible using the bird embryo as a model system. We are well aware of these limitations and have toned down claims on the essential role of heterogeneity in maintaining the progenitor pool. In particular, we have changed the abstract (we removed the last sentence stating that heterogeneity is fundamental to maintain a pool of resident progenitors), as well as the end of the introduction (we removed “while progenitors expressing intermediate/equivalent levels of the two proteins tend to remain resident”). We have qualified our model in the Discussion by saying that cells with comparable levels of Sox2 and Bra “could” remain resident (L.370).

      To better understand the role of cell-to-cell spatial heterogeneity, we have developed a new mathematical model (Figure 5) which integrates both gradient and random heterogeneity in Sox2/Bra values within the PZ and thus fits our biological results better. In the new version of the manuscript, we compared this model with a model in which the PZ is fully gradient-like and a second one in which it is completely random. These comparisons allow us to describe better what properties random and patterned heterogeneities could bring to the system (Figure 6).
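
      As a purely illustrative toy, and not the authors' model (whose details are given in their Appendix 1), the difference between a gradient-like, a random, and a mixed starting configuration can be sketched by blending a positional gradient with uniform noise. Every name below (`initial_ratios`, `mix`, the 0-to-1 "ratio" scale, the one-dimensional axis) is an assumption introduced for illustration only.

```python
import random

def initial_ratios(n_cells, mix, seed=0):
    """Toy initial Sox2/Bra 'ratio' values in [0, 1] along a hypothetical 1-D axis.

    mix = 0.0 gives a pure gradient, mix = 1.0 gives purely random values,
    and intermediate values blend the two. Requires n_cells >= 2.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    ratios = []
    for i in range(n_cells):
        gradient = i / (n_cells - 1)  # position-dependent component
        noise = rng.random()          # position-independent component
        ratios.append((1 - mix) * gradient + mix * noise)
    return ratios

gradient_only = initial_ratios(5, mix=0.0)
random_only = initial_ratios(5, mix=1.0)
mixed = initial_ratios(5, mix=0.5)
print(gradient_only)
print(random_only)
print(mixed)
```

      In this toy, neighboring cells have similar values when `mix` is near 0 and can have very different values when `mix` is near 1, which is the qualitative contrast the three compared models are meant to capture.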

      2) The other main novelty of this manuscript is the idea that differences in cell motility derived from their Sox2 or Bra contents are a major force driving the generation of NT and PSM from the progenitors in the PZ. While there are clear differences between cell motility in the NT and the other two regions, the differences between what is observed in the PSM and PZ are not that large (actually, from the data presented it is not clear that such differences actually exist). However, independently of motility differences, there is no experimental evidence demonstrating that the essential driver of the cell fate choices is motility itself. Differences in cell motility could be just one of the results of more fundamental (and causal) changes in cell characteristics triggered by Sox2 or Bra activity. Indeed, NT and PSM cells are different in many different ways, including adhesion properties, which are normally a major determinant of tissue morphogenesis. Cell motility could, therefore, be one of the factors but it is not clear that it plays the essential role proposed by the authors (see also next comment).

      Cell motility distributions in the PZ are slightly different from those of the PSM, since slower cells were found in the PZ. We agree with the reviewer that this difference might be difficult to see because the average motilities of the two tissues are very similar (Figure 3 and Figure 3-figure Supplement 1). To reveal this difference more clearly, we have used a reporter gene for Sox2 and analyzed progenitor motility by time-lapse imaging. We have specifically tracked GFP-positive cells (reporter gene for Sox2) in the PZ and compared them to cells that do not express GFP. The result is that Sox2-high progenitors are globally slower than other progenitors, clearly revealing heterogeneity in cell movements within the PZ and its relation to Sox2 expression (L.225-232, Figure 3-figure Supplement 1B, video 2).

      We agree that there is no experimental evidence that motility itself is the driver of the cell fate choices. To test whether the effect on cell motility takes place downstream of differentiation events, we have analyzed the expression of markers for mesodermal and neural fate (Msgn1 and Pax6) 7 hrs after overexpression of Sox2 and Bra. While Sox2 or Bra overexpression triggers changes in cell motility in this short time window, we did not observe any changes in Msgn1 and Pax6 expression (L.267-274, Figure 4-figure Supplement 2), arguing that the effect on motility is an early consequence of Bra and Sox2 misexpression. Nevertheless, we are aware that this is not a strict demonstration that the effects on fate come from the differential motility only. We have therefore toned down our arguments and changed the title of the manuscript (“...guides destiny by controlling their motility” has been replaced by “...guides motility and destiny”).

      The effects on cell motility we observe could be a consequence of the effects of Sox2 and Bra on adhesion, as suggested by the reviewer; this is an interesting possibility that we cannot and do not want to rule out. The effect on cell adhesion is taken into account in our model and we discuss this hypothesis in the new version of the manuscript (L. 456-459). Identifying the mechanisms underlying the effects of Sox2 and Bra on cell motility is an extremely interesting project that we want to pursue, but we consider that this aspect goes beyond the scope of the current manuscript.

      3) The authors developed a mathematical model to confirm their hypothesis that Sox2/Bra expression diversity combined with different motility of cells with high, low or intermediate relative levels of Sox2 and Bra expression are the key to guarantee proper axial elongation from the PZ. I am, however, not sure that the model, the way it was designed, actually proves their point. In particular, because it introduces an additional variable that might actually be the essential parameter for the success of the mathematical model: physical boundaries between NT and PSM cells, meaning that cells with high Sox2 or high Bra are unable to mix. As I commented above, this variable reflects a key biological property of the two tissues involved, one epithelial and the other mesenchymal in nature, which might be more relevant than the motility of the cells themselves (e.g. by different cell adhesion properties). How would a model that does not include such physical barriers work? Conversely, how would a model work in which only physical barriers are applied, using similar starting conditions: a prefigured central neural tube (Sox2 high), flanked at both sides by PSM (Brachyury high) and with the PZ (variable Sox2/Bra levels) just posterior to the neural tube?

      We agree that adhesion and non-mixing properties are essential to our models. Because this was not clear in the previous version, we have explained them in more detail in the new version of the manuscript (l.295-300 and Appendix 1). To assess their roles, we have run two new simulations: one without the regulation of non-mixing/adhesion properties and one without motility control by Sox2/Bra. Both simulations show strong defects in morphogenesis, arguing that motility on its own is a key component of the system and that the non-mixing and adhesion properties are also important but not sufficient to drive morphogenesis (Figure 5F). Having the same non-mixing/adhesion and motility properties downstream of Sox2 and Bra in all our models allows us to isolate the phenomenon we wish to study: the role of the distribution of cell-to-cell heterogeneity in the PZ (Figure 6).

      4) The authors generate two mathematical models, differing in whether they start with a random distribution of Sox2 and Bra expression throughout the PZ or with prefigured opposing Sox2 and Bra expression gradients, somehow resembling the image observed in the embryo. The two models generated structures resembling the elongating embryo, although with small differences in the extension process and the extension rate. After analyzing the behavior of those models, they concluded that the random model fits better with the expectations from the in vivo characteristics in the embryo. I am however not sure that I agree with the authors' interpretation. First, because the gradient model includes a natural characteristic observed in the embryo, which the random model does not. Second, because one of the deciding characteristics, namely the slower extension rate observed in the gradient model, does not necessarily make it worse than the random model, as it is not possible to properly determine which extension rate actually resembles more accurately axial extension in the embryo. Third, because the observation that in the gradient model the PZ undergoes fewer transient deformations and self-corrective behaviour is in my view an argument to favor, instead of to disfavor the gradient model, both because the final result is at least as good as the one obtained with the random model and it is actually not clear that in the embryo the PZ undergoes such clearly visible deformations and self-corrections during axial extension. In addition, the gradient model generates a "pure" PZ (just yellow cells) in the posterior end of the structure, while in the random model the PZ contains some islands of NT cells, which is not what is observed in the embryo. According to the last features, the gradient model seems better than the random model.

      To answer the reviewer’s concern about similarity to the embryo, we have developed a new model that is clearly closer to the biological system because it integrates both the gradient and the random ratio distributions (new Figure 5). Interestingly, by comparing it to the two extreme models (random and gradient), we found that this more “natural” model combines the stability and fluidity brought by the gradient model and the random model, respectively. As pointed out by the reviewer, we found that a graded distribution brings more stability to the system, with a “purer” PZ. Conversely, a random distribution allows more tissue fluidity and cell rearrangements as well as tissue shape conservation (Figure 6). We want to thank the reviewer for his or her input; we think that the new model and the comparison with the two extreme cases allowed us to reveal more clearly the properties that are specific to the two types of spatial distributions and therefore to point out what general morphogenetic properties could emerge from random-like heterogeneity in the embryo.

      Reviewer #2:

      In this manuscript, Romanos et al show firstly that there is extensive cell-to-cell heterogeneity in the relative levels of Sox2 and Bra in the region containing progenitors for neural and paraxial mesoderm, gradually resolving towards high Bra/low Sox2 in the mesoderm or high Sox2/low Bra in emerging neurectoderm. They then show that overexpression of Sox2/morpholino-based inhibition of Bra or vice versa lead cells to favour neurectoderm or mesoderm respectively. Next they show that cells expressing high Bra are more motile than those expressing Sox2, and show using mathematical modelling that these behaviours can explain many aspects of the eventual segregation of Sox2-high neurectoderm and Bra-high mesoderm.

      This interesting and well-presented work leads to the elegant and novel hypothesis that random cell motility induced by Bra and inhibited by Sox2 are sufficient to explain the segregation of NMps towards mesoderm and neurectoderm respectively. The work will be of broad interest to developmental and mathematical biologists interested in the cell biological basis of self-organising cell behaviours. Nevertheless there are some concerns to address in order to solidify the claims in the manuscript.

      1) The section where Sox2 and Bra levels are manipulated (line 152 onwards) is somewhat under-analysed. Results are presented as supporting a model where the two proteins mutually repress each other and lead to segregation of neural (high Sox2) and mesodermal (high Bra) cells. However the data presented does not unequivocally support the claims in the manuscript and would require further clarification.

      In the new version of our manuscript, we give more details on the analysis of the manipulations of Sox2 and Bra levels. In particular, we provide data showing the tissue localization of manipulated cells on transverse sections (L. 192, Figure 2-figure supplement 3). We have also studied the effects of Sox2 and Bra overexpression on cell fate maturation in the PZ and provide some evidence that progenitors do not yet express differentiation markers as they acquire specific motile properties in response to Sox2 or Bra overexpression (L. 267-273, Figure 4-figure supplement 1). In line with our results and the literature, we revised the text by removing mentions of Sox2 and Bra mutual repression (L. 171, L. 386, L. 389).

      2) The mathematical model may be an oversimplification of the role of these two genes in organising a balanced production of neurectoderm and mesoderm.

      In the new version of our manuscript, we have made significant efforts to better explain how non-mixing properties are taken into consideration in our models and thus, hopefully, to avoid an impression of oversimplification. We would like to point out that simulations performed to evaluate the impact of non-mixing properties on the elongation process indicate that adhesion and non-mixing properties alone cannot account for the morphogenetic events we modelled (new Figure 5F), thus reinforcing the view that regulation of cell motility is a key element in the system. Furthermore, we have designed a new mathematical model, which is closer to the biological system because it integrates both graded and random distribution of Sox2/Bra values (as observed in vivo) (new Figure 5). As explained above in response to reviewer 1, comparison of this model with our previous models, based on either graded or random distribution of the Sox2/Bra values, points out the importance of random-like cell-to-cell heterogeneity in this morphogenetic process.

      Reviewer #3:

      The manuscript by Romanos and colleagues examines how Sox2 and Brachyury control the behavior and cell fate of neuro-mesodermal progenitors (NMPs) in avian embryos. Using immunohistochemistry, the authors showed that the cells residing in the progenitor zone (PZ) display high variability in Sox2/Bra expression. Manipulation on the levels of the two transcription factors affected NMPs' choice to stay or exit the PZ and their future tissue contributions. This motivated the authors to employ an agent-based computational model and additional functional experiments to explore the importance of Sox2/Bra for cellular motility. The results led the authors to propose that (i) heterogeneity in Sox2/Bra ratio is important for the spatial organization of the PZ and its derivatives and that (ii) Sox2/Bra determine the fate of progenitor cells by controlling cellular movements.

      This is a technically sound report that combines single-cell analysis, in vivo functional experiments, and mathematical modeling to explore the link between cell motility and cell identity. While the model proposed by the authors is intriguing, I found that the study should provide evidence placing Sox2/Bra as primary regulators of cell motility in the context of the PZ. Given the extensively-studied role of these transcription factors in NMPs, it is challenging to decouple cellular behavior from cellular identity during tissue formation. The study would benefit from further demonstration that cell fate commitment is regulated by - and not a regulator of - cell migration of NMPs.

      We have now tested the effect of Sox2 and Bra overexpression on cell identity. We show that, 7 hrs after electroporation (a time at which we observe an effect on cell movement), there is no modification of the expression of the neural (Pax6) and mesodermal (Msgn1) maturation markers. These data thus indicate that the effect on cell motility happens without a major acceleration of the maturation program (Figure 4 figure supplement 2). However, as mentioned in response to Reviewer 1, these experiments are correlative and do not demonstrate that the effects of Sox2 and Bra on neural and mesodermal differentiation programs are mediated only through cell motility; we have therefore toned down our arguments accordingly in the new version of our manuscript.

      Strengths and Weaknesses:

      • The idea that heterogeneity in cellular behaviors within a progenitor field may act as a driver of morphogenesis is interesting and nicely supported by the agent-based model.

      We want to thank the reviewer for this comment. We believe that in the new version of the manuscript we go even further by developing a new model (Figure 5) which is closer to reality and by testing the influence of random versus gradient Sox2/Bra distribution on morphogenesis (Figure 6).

      • One of the premises of the model (Fig 4) is that Sox2/Bra ratio determines how much cells move, but this is not clear from the in vivo experiments and seems speculative. A clear demonstration of correlation between Sox2/Bra ratio and cellular motility is necessary for proper support of the model.

      The role of the Sox2 to Bra ratio in PZ cell motility is demonstrated in Figure 4. In the new version of the manuscript, these results are presented before the modelling section; we hope this will dispel any doubt the reader may have that we clearly demonstrate a role of Sox2 and Bra in controlling PZ cell motility in vivo.

      • The authors found that manipulation of the levels of the TFs results in changes in NMP motility, but it is not clear if this is the cause or a consequence of commitment to a neural or mesodermal fate. Could Bra-High cells be moving more because they have been specified to a mesodermal fate? Conversely, Sox2-High cells might migrate less since they get incorporated into the neural tube. Establishing the timing of cell fate commitment is necessary to resolve this issue.

      We agree with the reviewer that this is an interesting issue; we have checked for expression of specification markers 7 hrs after electroporation of Sox2 and Bra expression vectors, a time point at which electroporated cells have not yet left the PZ but have already changed their motility. In these conditions, overexpression of Sox2 and Bra had no discernible effect on expression of the neural marker Pax6 and of the PSM marker Msgn1, respectively (Figure 4 figure supplement 2).

      • The study's impact and novelty depend on the demonstration that the primary function of Sox2/Bra in NMPs is to drive cell movement. This is not sufficiently explored in the study, and there are no proposed mechanisms for how Sox2/Bra modulate cellular behavior.

      We have shown that Sox2 and Bra act on progenitor motility in vivo (Figure 4). As a mechanism, we propose that Sox2 and Bra could act directly on motility or indirectly by regulating differential adhesion. Cell adhesion control by Sox2/Bra is part of our modeling assumptions and is therefore a hypothesis that will be the subject of future investigations in the lab. This hypothesis is part of the discussion in the new version of the manuscript (L. 457).

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Strengths:

      1. The loss of ciliary GPR161 has a more robust phenotype in specific tissues (i.e., the limbs and face). As a result, the limb data (in Figure 6) and craniofacial data (in Figure 7) are well presented and clear. In these figures, the authors directly compare and highlight differences between primarily two genotypes (wt and Gpr161mut1/mut1 embryos) and quantify the changes (digit number and distance between nasal pits). Overall, these two figures support the existing GPR161 model, showcasing that a loss of ciliary GPR161 results in a tissue-specific loss of GLI3R (Figure 6D) and consequently the development of additional digits (Figure 6E) and craniofacial defects (Figure 7D and 7E).

      Thank you.

      Weaknesses:

      1. There is no data in the paper showing that Gli3 repressor function is affected preferentially compared to Gli Activator function. In Figure 4C, Gli3 FL/R ratios are not different between wt/wt and mut/mut embryos. The data can be explained by the fact that the mutant Gpr161 is a partial loss of function allele and the resultant weaker phenotypes (compared to the full KO) show some tissue specificity. Linking this allele to a specific biochemical mechanism is not justified by the data.

      We have now revised the title of the paper and the discussion to emphasize these limitations. We have also added a new section in the discussion on the limitations of our methods and of other optogenetic/chemogenetic methods for generating cAMP in cilia. These limitations arise from the cilioplasm not being strictly separated from the cytoplasm. Therefore, the second messengers cAMP and Ca2+ are freely diffusible between ciliary and extraciliary compartments (Delling et al., 2016; Truong et al., 2021). A paper published in Cell during revision of this study used optogenetic tools to show that ciliary, but not cytoplasmic, production of cAMP functions through PKA localized in cilia (Truong et al., 2021) to repress sonic hedgehog-mediated somite patterning in zebrafish (Wolff et al., 2003). We have also compared and discussed these results with our study. Our study highlights that the effects of ciliary loss of Gpr161 pools are tissue specific and dependent on the requirements of the tissues for GliR vs GliA in the morpho-phenotypic spectrum. Overall, our results using the Gpr161mut1 allele are complementary to the optogenetic study in showing that lack of ciliary Gpr161 pools results in Hh hyperactivation phenotypes arising mainly from lack of GliR in the limb buds, mid-face and intermediate neural tube.

      2. The authors use an endpoint assay based on overexpression in 293T cells to claim that cAMP production is unaffected by the Gpr161mut allele. However, weak effects (very likely given the weak phenotypes) may not be evident in this assay. We also do not know if the mutant allele is defective in some other biochemical function or in localization to other places in the cell. One way to address this is to measure ciliary and extraciliary cAMP in their knock-in cells. In Gpr161mut1/mut1 cells, is ciliary cAMP reduced to levels comparable to Gpr161ko/ko cells? Is extraciliary cAMP unchanged compared to WT cells? Or, is cAMP able to diffuse into the cilia from GPR161mut1 localized to vesicles at the ciliary base (Figure 1B)? Many of the conclusions made in the paper equate a loss of ciliary GPR161 to a loss of ciliary cAMP, but this loss of ciliary cAMP is not definitively shown in the paper.

      As physiological ligands for Gpr161 are currently not known, we are unable to test extraciliary vs ciliary contribution of Gpr161 in cAMP production in a physiological context. Therefore, we resort to overexpression assays for constitutive cAMP production by Gpr161 and Gpr161mut1. Using these assays, we do not find a difference in constitutive activity among these variants.

      As the cilioplasm is not strictly compartmentalized from the cytoplasm, the second messengers cAMP and Ca2+ are freely diffusible between ciliary and extraciliary compartments (Delling et al., 2016; Truong et al., 2021). Thus, in any approach for generating subcellular pools of cAMP, be it genetic, optogenetic or chemogenetic (Guo et al., 2019; Hansen et al., 2020; Truong et al., 2021), extraciliary cAMP could diffuse into ciliary compartments. A recent paper using optogenetic and chemogenetic tools for cAMP production inside cilia or in the cytoplasm shows that cytoplasmic cAMP has free access to intraciliary compartments but is unable to reach the critical thresholds for activating PKA (Truong et al., 2021). Thus, we would assume that the extraciliary cAMP produced by extra copies of Gpr161mut1 could diffuse to cilia but is likely to be less effective in activating downstream effectors. In addition, the PKA regulatory subunit-AKAP complexes are fundamentally important in organizing and sustaining PKA catalytic subunit activation to organize localized substrate phosphorylation in restrictive nanodomains (Bock et al., 2020; Zhang et al., 2020). The dual functions of Gpr161 in Gs coupling and as an atypical AKAP (Bachmann et al., 2016) are likely to further restrict cAMP signaling in ciliary or extraciliary microdomains.

      3. Compared to Figures 6 and 7, the data presented in Figures 3 and 5 are very confusing and difficult to interpret. On the one hand, this is understandable: the Gpr161mut/mut phenotypes are complex, and some tissues (like the developing spinal cord) are more resistant to change due to a loss of GliR. On the other hand, the data collected from the numerous genotypes analyzed could be easier to interpret by (i) providing a penetrance of the phenotypes and (ii) quantifying the phenotypes.

      Thank you for all the suggestions. We have now carried out these quantifications or tabulations, which have considerably improved the presentation of the datasets (Table 2 and Figure 5-figure supplement 1). Some of these experiments required additional experimental animals (Table 1), and we have updated the text accordingly.

      Below are a few examples of data that could be improved with quantifications:

      — In Figure 3, the authors are trying to convey that the Gpr161mut allele is partially functional and produces a milder phenotype than the Gpr161ko allele. However, the Gpr161ko/ko, Gpr161mut/ko, and Gpr161mut/mut phenotypes showcased in the figure all look quite severe, and it is difficult to appreciate the differences in the defects fully. An accompanying table summarizing the phenotypes and their penetrance in the affected genotypes would help to convey this point.

      We have added an accompanying Table 2 summarizing the phenotypes and penetrance for the respective genotypes, when present. Please note that rostral malformations such as exencephaly are similar between Gpr161 ko/ko and Gpr161 ko/mut1, whereas Gpr161 mut1/mut1 embryos have mid-face widening. Similarly, Gpr161 ko/ko embryos have no forelimbs and Gpr161 ko/mut1 embryos have smaller forelimb buds, whereas Gpr161 mut1/mut1 embryos have polydactyly.

      — In Table 1, the authors note that the Gpr161mut1/mut1 mouse is embryonic lethal by e14.5, but the analysis in Table 1 appears to be incomplete. In the table titled "breeding between Gpr161 mut1/+ parents," the authors indicate that they only assessed one litter of e14.5 and e15.5 embryos. Oddly, the authors note that additional litters were collected, but the embryos were not genotyped because the embryos exhibited no phenotypes. The absence of phenotypes could be due to an absence of viable Gpr161mut1/mut1 embryos; however, the embryos need to be genotyped and a chi-square analysis conducted to verify this. Death can be a measure of phenotype severity, but I think it is important to surmise why the embryos are dying. It is unclear whether the embryos are dying due to the heart defects mentioned in the discussion. If the embryos are dying due to the heart defect, then it would be important to know whether the heart defects are more severe in the Gpr161ko/ko embryos.

      Our apologies for the oversight. We have now analyzed additional timed pregnancies at E14.5, E14.75 and E15.5. We find that the embryonic lethality is fully seen by E14.75. Heart defects in Gpr161 ko/ko embryos are not apparent as they are E10.5 lethal. We do see apparent heart defect phenotypes in Gpr161 ko/mut1 vs Gpr161 mut1/mut1. These defects include pericardial effusion, outflow tract defects, A-V cushion abnormalities and smaller ventricles. These phenotypic descriptions are beyond the scope of the current paper. However, we have mentioned pericardial effusion in the text and Table 2.
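      The chi-square analysis requested by the reviewer can be sketched as follows. This is only an illustration and not the authors' analysis: the genotype counts are hypothetical placeholders, the function names are made up for this sketch, and the expected 1:2:1 ratio assumes a standard cross between heterozygous (mut1/+) parents. With three genotype classes there are 2 degrees of freedom, for which the chi-square survival function has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def chi_square_stat(observed, ratio):
    """Goodness-of-fit statistic of observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df2(stat):
    """P-value for 2 degrees of freedom (survival function is exp(-x/2))."""
    return math.exp(-stat / 2)


# HYPOTHETICAL counts of +/+, mut1/+ and mut1/mut1 embryos (not data from the paper).
observed = [12, 25, 3]
mendelian_ratio = (1, 2, 1)  # assumed expectation for a mut1/+ x mut1/+ cross

stat = chi_square_stat(observed, mendelian_ratio)
p = p_value_df2(stat)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

      A small p-value would indicate that the recovered genotypes deviate from Mendelian expectation, which is the kind of evidence for loss of homozygous embryos that the reviewer asks for.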

      — In Figure 5, quantifying the progenitor domains would greatly assist in discerning differences between the various genotypes. For example, a quantification would help readers assess differences in NKX6.1 across the various genotypes.

      We have now quantified the differences in Nkx6.1 across genotypes. The data is presented in Figure 5-figure supplement 1.

      On an unrelated note, the PAX7 staining of the Gpr161mut1/ko spinal cord looks very strange because the line adjacent to the image does not accurately represent the dorsal-ventral patterning of PAX7 seen in the image. This image would need to be replaced.

      Our apologies for the oversight. We have now revised this image.

      Reviewer #2 (Public Review):

      The premise of the entire study is predicated on GPR161mut1 failing to target to cilia and being WT in every other aspect. The Gs coupling of GPR161mut1 is examined. The ciliary localization of GPR161mut1 is carefully assessed by conducting staining not just in WT cells but also in INPP5E cells where GPR161 ciliary levels are known to be elevated. Another prediction is that GPR161mut1 is found in an intermediate biosynthetic compartment. Some insights into the compartment where GPR161mut1 is found would help interpret the phenotype of the GPR161mut1 animals. It would be important to know whether the GPR161mut1 mimics a pre-cilia targeted GPR161 (say at the plasma membrane) or whether it mimics a post-ciliary exit state (say recycling endosomes). In the past few years, work from the von Zastrow lab and others has shown that GPCRs keep activating their downstream partners after endocytosis from the plasma membrane. If GPR161mut1 were to mimic the post-ciliary exit state of GPR161, it may assume some of the signaling functions of ciliary GPR161.

      Thank you for all the suggestions. We have now examined and extensively discussed the plausible source of extraciliary Gpr161 in mediating Hh repression. We already showed that, in addition to cilia, Gpr161 localizes to the periciliary recycling endosomal compartment (Mukhopadhyay et al., 2013), where it could activate ACs and PKA in proximity to the centrosome. We now show that Gpr161mut1 also localizes to similar compartments (Figure 1-figure supplement 3). We propose that this compartment could promote Gpr161 activity outside cilia in GliR formation in vivo (please see model in Figure 8D).

      We also compare our results with a recently published paper showing that ciliary, but not cytoplasmic, production of cAMP functions through PKA localized in cilia to repress sonic hedgehog-mediated somite patterning in zebrafish (Truong et al., 2021). While this paper is an elegant demonstration of ciliary pools of cAMP in repressing Hh activity despite having no strict compartmentalization exclusively in cilia, it does not capture the roles of ciliary and extraciliary pools of Gpr161-mediated cAMP signaling in different tissues that we show are dependent on the requirements of the tissues for GliR vs GliA in the morpho-phenotypic spectrum.

      A second point that the authors may wish to address is whether GPR161mut1 may fail to enrich in cilia because it is hyperactive and undergoes constitutive exit from cilia. The hypothesis here is that GPR161mut1 couples to beta arrestin better than WT GPR161. Blocking GPR161mut1 exit via depletion of beta arrestin or BBSome is a simple way to test this hypothesis.

      As advised by the reviewer, we have tested for Gpr161/Gpr161mut1 levels in cilia upon arrestin1/2 or BBSome loss. These experiments show that Gpr161mut1 is not present in cilia in arrestin1/2 (Arrb1/2) double ko MEFs (Figure 1-figure supplement 1) or upon RNAi of BBS4 (Figure 5-figure supplement 2). We also showed previously that knockdown of the 5'-phosphatase INPP5E, which causes accumulation of Gpr161 in cilia, does not cause any accumulation of Gpr161mut1 in cilia. Based on all these experiments, we surmise that Gpr161mut1 does not transit through cilia.

      Finally, it would be good to learn about the levels of expression of GPR161mut1 compared to WT GPR161 using immunoblotting. If GPR161mut1 were to be expressed at much higher levels than WT GPR161, it may compensate for its lack of ciliary localization by elevated total cellular activity.

      We were unable to determine protein stability of the mutant receptor in the Gpr161mut1 embryos due to technical constraints in immunoblotting for endogenous levels. However, we note Gpr161mut1 in vesicles surrounding the base of cilia (Figure 1B) and constitutive cAMP signaling activity (Figure 1G, Figure supplements 1-3) in stable cell lines, suggesting that protein levels and activity of the mutant were comparable with wild type Gpr161. As suggested by the reviewer, we also tested LAP-tagged Gpr161mut1 protein levels by tandem affinity purification and immunoblotting, with respect to LAP-tagged Gpr161wt in MEFs stably overexpressing these variants. We noted a similar immunoblotting pattern from receptor glycosylation in both variants (Figure 2-figure supplement 2).

    2. Reviewer #1 (Public Review):

      The authors created a new GPR161 mutant mouse (Gpr161mut/mut) in which GPR161 does not localize to the primary cilium but is still cAMP signaling competent based on an over-expression assay in 293T cells. Through a detailed analysis of the Gpr161mut/mut mouse and its comparison to a previously generated Gpr161 knockout mouse (Gpr161ko/ko), the authors try to discriminate the ciliary and non-ciliary roles of GPR161. The current prevailing model is that GPR161 (localized to the primary cilium in the absence of Hh pathway activation) is constitutively active and elevates cAMP levels within the primary cilium. Elevated ciliary cAMP then activates ciliary (or ciliary adjacent) PKA, driving the processing of bifunctional GLI proteins into transcriptional repressors (GLIR). According to this model, the ciliary pool of GPR161 is critical for suppressing Hh signaling activity, and one would predict that the Gpr161mut/mut embryos would look identical to the Gpr161ko/ko embryos. However, this was not the case. Across multiple developmental tissues, the Gpr161mut/mut phenotype is less severe than the complete knockout, suggesting a role for non-ciliary GPR161 in suppressing Hh signaling activity. The observations made in this paper are interesting, but the data fails to make a clear distinction between the ciliary and non-ciliary roles of GPR161.

      Strengths:

      1. The loss of ciliary GPR161 has a more robust phenotype in specific tissues (i.e., the limbs and face). As a result, the limb data (in Figure 6) and craniofacial data (in Figure 7) are well presented and clear. In these figures, the authors directly compare and highlight differences between primarily two genotypes (wt and Gpr161mut1/mut1 embryos) and quantify the changes (digit number and distance between nasal pits). Overall, these two figures support the existing GPR161 model, showcasing that a loss of ciliary GPR161 results in a tissue-specific loss of GLI3R (Figure 6D) and consequently the development of additional digits (Figure 6E) and craniofacial defects (Figure 7D and 7E).

      Weaknesses:

      1. There is no data in the paper showing that Gli3 repressor function is affected preferentially compared to Gli Activator function. In Figure 4C, Gli3 FL/R ratios are not different between wt/wt and mut/mut embryos. The data can be explained by the fact that the mutant Gpr161 is a partial loss of function allele and the resultant weaker phenotypes (compared to the full KO) show some tissue specificity. Linking this allele to a specific biochemical mechanism is not justified by the data.

      2. The authors use an endpoint assay based on overexpression in 293T cells to claim that cAMP production is unaffected by the Gpr161mut allele. However, weak effects (very likely given the weak phenotypes) may not be evident in this assay. We also do not know if the mutant allele is defective in some other biochemical function or in localization to other places in the cell. One way to address this is to measure ciliary and extraciliary cAMP in their knock-in cells. In Gpr161mut1/mut1 cells, is ciliary cAMP reduced to levels comparable to Gpr161ko/ko cells? Is extraciliary cAMP unchanged compared to WT cells? Or, is cAMP able to diffuse into the cilia from GPR161mut1 localized to vesicles at the ciliary base (Figure 1B)? Many of the conclusions made in the paper equate a loss of ciliary GPR161 to a loss of ciliary cAMP, but this loss of ciliary cAMP is not definitively shown in the paper.

      3. Compared to Figures 6 and 7, the data presented in Figures 3 and 5 are very confusing and difficult to interpret. On the one hand, this is understandable, the Gpr161mut/mut phenotypes are complex, and some tissues (like the developing spinal cord) are more resistant to change due to a loss of GliR. On the other hand, the data collected from the numerous genotypes analyzed could be easier to interpret by (i) providing a penetrance of the phenotypes and (ii) quantifying the phenotypes. Below are a few examples of data that could be improved with quantifications:

      — In Figure 3, the authors are trying to convey that the Gpr161mut allele is partially functional and produces a milder phenotype than the Gpr161ko allele. However, the Gpr161ko/ko, Gpr161mut/ko, and Gpr161mut/mut phenotypes showcased in the figure all look quite severe, and it is difficult to appreciate the differences in the defects fully. An accompanying table summarizing the phenotypes and their penetrance in the affected genotypes would help to convey this point.

      — In Table 1, the authors note that the Gpr161mut1/mut1 mouse is embryonic lethal by e14.5, but the analysis in Table 1 appears to be incomplete. In the table titled "breeding between Gpr161 mut1/+ parents," the authors indicate that they only assessed one litter of e14.5 and e15.5 embryos. Oddly, the authors note that additional litters were collected, but the embryos were not genotyped because the embryos exhibited no phenotypes. The absence of phenotypes could be due to an absence of viable Gpr161mut1/mut1 embryos; however, the embryos need to be genotyped and a chi-square analysis conducted to verify this. Death can be a measure of phenotype severity, but I think it is important to surmise why the embryos are dying. It is unclear whether the embryos are dying due to the heart defects mentioned in the discussion. If the embryos are dying due to the heart defect, then it would be important to know whether the heart defects are more severe in the Gpr161ko/ko embryos.

      — In Figure 5, quantifying the progenitor domains would greatly assist in discerning differences between the various genotypes. For example, a quantification would help readers assess differences in NKX6.1 across the various genotypes. On an unrelated note, the PAX7 staining of the Gpr161mut1/ko spinal cord looks very strange because the line adjacent to the image does not accurately represent the dorsal-ventral patterning of PAX7 seen in the image. This image would need to be replaced.

    1. If we should simply found a few professorships, of such a nature as to attract attention on account of a special degree of distinction attached to them, it would go far to remove the prejudice which now exists against the idea of college professorships held by women. The plan that I have in mind is this: Instead of waiting for the colleges to offer professorships to our young doctors of philosophy, I would suggest that we offer our young doctors of philosophy as professors to the colleges -- and not in the way of founding fixed professorships in any given college, but rather of establishing what may be called peripatetic professorships, to be held, in any particular case, by our most available young woman and at the college or the university which shall best fulfil certain requirements of ours which I shall state in a moment.

      While Franklin has a great suggestion for how the professorship should be set up, I think she makes a great point that, like men, women should be sought after to fill these positions. So, instead of waiting for someone to stop in to claim the position, the colleges should seek out the brilliant minds to fill it.

    1. Reviewer #1 (Public Review):

      1) The user manual and tutorial are well documented, although the actual code could do with more explicit documentation and comments throughout. The overall organisation of the code is also a bit messy.

      2) My understanding is that this toolbox can take maps from BigBrain to MRI space and vice versa, but the maps that go in the direction BigBrain->MRI seem to be confined to those provided in the toolbox (essentially the density profiles). What if someone wants to do some different analysis on the BigBrain data (e.g. looking at cellular morphology) and wants that mapped onto MRI spaces? Does this tool allow for analyses that involve the raw BigBrain data? If so, then at what resolution and with what scripts? I think this tool will have much more impact if that was possible. Currently, it looks as though the 3 tutorial examples are basically the only thing that can be done (although I may be lacking imagination here).

      3) An obvious caveat to BigBrain is that it is a single brain and we know there are sometimes substantial individual variations in e.g. areal definition. This is only slightly touched upon in the discussion. Might be worth commenting on this more. As I see it, there are multiple considerations. For example (i) Surface-to-Surface registration in the presence of morphological idiosyncrasies: what parts of the brain can we "trust" and what parts are uncertain? (ii) MRI parcellations mapped onto BigBrain will vary in how accurately they may reflect the BigBrain areal boundaries: if histo boundaries do not correspond with MRI-derived ones, is that because BigBrain is slightly different or is it a genuine divergence between modalities? Of course addressing these questions is out of scope of this manuscript, but some discussion could be useful; I also think this toolbox may be useful for addressing these very concerns!

    1. One of the first material scientists I spoke to about making things that last for thousands of years offered a compelling insight: “Everything is burning, just at different rates.” What he means is that what we perceive as aging is actually oxidisation, like rusting. When we imagine materials that may last for thousands of years, most people think of stone or precious metals like gold – because they don't oxidise readily. But even bodies can be preserved for millennia if stored in the right chemical environment, as the mummies of Egypt demonstrate.

      A fascinating take on "everyone is dying"

    1. Anne: What was family life like with you and your brother and your mother and father? Did you guys speak English at home? Did you do American things, activities? Do they work a lot? Tell me a little bit about family life.
       Juan: Right now, my dad, he's always been the boss of the family. He's always worked, he works in construction, and as you know, Utah, with the climate change, it snows, it rains, all of the climates. Since he works in construction, he does work outside all the time, so even if it snows or even if it rains, even if it's minus five degrees outside, he still goes out and works because nobody's going to give him the money to provide for his family.
       Juan: In a way, my dad, you can say he's one of those hard working men who doesn't look out for himself, but rather looks out for his family. In my house we spoke Spanish all the time because of my mom. To this day, she doesn't want to learn English even though we tell her to learn English. My little sister, she doesn't speak Spanish, she speaks more English and with her it's different. We tell her, "You have to learn Spanish because it's going to help you," but she doesn't want to learn.
       Anne: Is she a citizen?
       Juan: Yes, she was born in the US. So my parents didn't really adapt to the American culture. They always wanted to follow Mexican traditions, even when it's Mother's Day over there … I think here it's May 10th but over there, when is Mother's Day?
       Anne: I think it's the second Sunday of May, so it could be different days.
       Juan: We could take that as an example. They'd rather follow Mother's Day here in Mexico than over there. Also Christmas, I guess the one thing they did adapt to was Thanksgiving. We don't celebrate that here in Mexico, but they do celebrate there, and they did adapt that. Another thing, Easter day. You go out with your family, you hide the eggs as a tradition, no? They adapted to that, but here in Mexico they don't do that. They don't even know about that.
       In a way they wanted to keep their Mexican culture alive even though they were in the US, but they also wanted to adapt to the things that they did there.

      Time in the US, Homelife, Mexican traditions, Holidays, Spanish language, US traditions, Holidays

    1. Anne: I see. Ben: I mean it's a nice house. It's up in the mountains and I had a lot of family members, including my wife go, “Why are you leaving? Why are you going to Mexico City? You don't need to.” I go, “Well one I'm going, I want to be involved in helping these people. I gotta go out and do something, I know I can still do something, I need a job. I need a job, I need a real job.” Raising goats and sheep is fine and it was common people and stuff, but I'm a busy body and I need to do something. Ben: And then I became aware of New Comienzos and when I seen that, that's what I want to do. I want to go down there, I want to be involved in that. I want to be involved in that because that's something that I know I can help and contribute to. And at the same time, I can get me a job down there and I'll stay put. I'll come back and visit every now and then, but I'm a city person [Laughs]. Anne: Yeah. So, did you fight the detention or no? Ben: No. When my first, I was detained when I was 19—well no, I got in trouble when I was 19, detained at 27. That time, I signed away, I didn't fight it. So, this time, I had no rights. I could not fight anymore because I'd already signed away. This time around, I probably would've fought it, because I had the money this time. Even if I knew I was going to lose, at least I knew I had the money for the bond and I could put it off two, three, four years. But, the first time I didn't have the money. So, I said, “Sit here two years and wait and then probably get deported? No.” Unfortunately, this time, I just, there was no rights that I could— Anne: And have your kids or your wife been to visit you? Ben: Yes, they have up there. Hopefully once I get settled here. My wife was supposed to come here in May, like around my birthday, which was the week before last. 
But when my son got this scholarship, well he said, “We gotta go,” so her and my daughter both drove him down to Orlando and they went to Disney, like we used to always go to Disney World. We would go at least twice a year. There was one year that I had two projects that ran over a year down there and I bought them season passes, because it was easier for them to fly down on the weekend and come see me. And when they come down, if you buy three individual park tickets, it's more expensive than the season pass. Anne: Yeah. Ben: But they're still keeping up the traditions [Laughs]. They're still going to Disney. Anne: And you spent a lot of time volunteering while you were in the states. Ben: Yes. Anne: So, it seems like, does that make it a good fit to try it here? Ben: Oh yes. Yes, it's voluntary here, it's a different theme here. It's a stronger, I feel it's a stronger theme. Not that my volunteer work back over there wasn't, but my volunteer… Like helping out at the school whenever I was in town, I would let them know that I would be in town and I was available to substitute if one of the teachers needed a break or was going to be missing. And I was qualified to take the classes on. Ben: But I also was a volunteer English teacher when they started, they started a Spanish church. When that Spanish church started, it was actually my father that was the preacher. My father was at another church, but when they wanted to do that, I talked to my father to see if he would, because they asked me to, but I was honest, I go, “You know I'm not that knowledgeable of the Bible, to be able to. I don't want to stumble over myself.” And you know when people are barely getting into a church and you say one thing but then you contradict yourself, you're going to destroy their faith. Anne: Don't want to do that. Ben: No. And I did a lot of volunteer work there at the church and the school. It was great. 
And they've been right by my family's side, they're still going to church there and anytime that they need anything, they're right there. But good thing …. they've been fine. My wife, she's got a pretty good job. She worked for a mortgage company, so she does pretty well. And my daughter helps out too now that she's making money. It's been a long ride. [Laughs]. Anne: So, we hear a lot of stories about young men who come over as babies or toddlers and then for some reason get caught up in gangs or crime. What was different for you? Why do you think that never happened? Ben: Well, I can tell you that I think, probably the single most important thing, the most important thing in a person's life is environment. Parenting is important, but you can have the best parents in the world, but if you have them in a bad environment, your parenting is not going to supersede the environment. And that's one of the things that I focus with my wife is that—well my parents, they provided a good environment. And when I got married from my life experiences, I stepped that up a bit. I told a lot of other relatives, this is one thing I've told a lot of other relatives, this happens a lot in America—not just with Mexicans or Central Americans, Blacks or whatever—is a lot of people yell out racism or discrimination. Ben: And I sincerely believe that sometimes we discriminate ourselves, that we put it on ourselves, because we teach that to our children, because weekends we all want to go get together with other relatives, other friends of our own ethnicity. And that's not really what America's about and that's not what I taught my children because that's not how I lived my life. I was out with everybody, congregating with everybody, and that's the environment that we brought our children up in. We brought them up in their church—I was talking to you earlier, our church and the school that they went to was part of the church. 
We were the only Hispanics. Ben: But that doesn't mean that we didn't allow them or try to get them to forget who they were. We didn't, because we brought them around our relatives, but we let them see that environment and so that they felt comfortable. So, when they got out into the world, they're comfortable around anybody and they're not looking at colors or whatever. And they don't feel like they're different and they don't feel different. I honestly, I think I felt more different when I got back here [Laughs]. Anne: Right. Ben: Because it was really kind of weird. But over there I didn't, but I think environment is one of the most important things. If you put a good person in a bad situation, in a bad environment, sooner or later he'll break. If you get a bad person that's never known what life is really supposed to be about, guide him a little bit and give him a little time, and if he's willing— Anne: It might work out. Ben: Yeah, it might work out. Anne: Interesting. So, you achieved your dreams in America. Ben: Oh yeah. Anne: Do you have dreams now for yourself here? Ben: Yeah. My dream here is, one, to help here and I can't say it's a goal that's going to be met. And the other is I'm going to have here what I had over there and I'm confident that I can make that happen. Anne: And will you make it through construction business, or will you make it through…? Ben: Right now, I think that there's other areas here that I could probably succeed in without jumping into the construction business. We have land back here (in the family home) and buy a little bit of cattle, make some money here. There’s just several different ideas. But I know that I can excel in a job here, because there's several people here that are making some pretty high incomes and just, some pretty much as telemarketers, but just there's some call centers with some good bonuses. You're not going to get rich there, but you can make a good living. Anne: Right. Ben: But there's some opportunities right now.

      Return to Mexico, Jobs, Community, Opportunity, Family Relationships, Feelings, Dreams; Reflections, Mexico, The United States

    1. psychology may be defined rigidly so as to include only a scientific description of mind, of mental activity, or of mental products

      This does not seem to be such a rigid definition to me. I think we use psychology in combination with closely related subjects, such as sociology, and it can become easy to mix the two. I think "a scientific description of the mind, of mental activity, or of mental products" seems like a reasonable definition for psychology.

    1. Peer Reviewed and recommended by Peer Community in Evolutionary Biology

      Recommendation
      Separating adaptation from drift: A cautionary tale from a self-fertilizing plant
      by Christoph Haag based on reviews by Jon Agren, Pierre Olivier Cheptou and Stefan Laurent.

      In recent years many studies have documented shifts in phenology in response to climate change, be it in arrival times in migrating birds, budset in trees, adult emergence in butterflies, or flowering time in annual plants (Cohen et al. 2018; Piao et al. 2019). While these changes are, in part, explained by phenotypic plasticity, more and more studies find that they also involve genetic changes, that is, they involve evolutionary change (e.g., Metz et al. 2020). Yet, evolutionary change may occur through genetic drift as well as selection. Therefore, in order to demonstrate adaptive evolutionary change in response to climate change, drift has to be excluded as an alternative explanation (Hansen et al. 2012). A new study by Gay et al. (2021) shows just how difficult this can be.

      The authors investigated a recent evolutionary shift in flowering time in a population of an annual plant that reproduces predominantly by self-fertilization. The population has recently been subjected to increased temperatures and reduced rainfall, both of which are believed to select for earlier flowering times. They used a “resurrection” approach (Orsini et al. 2013; Weider et al. 2018): Genotypes from the past (resurrected from seeds) were compared alongside more recent genotypes (from more recently collected seeds) under identical conditions in the greenhouse. Using an experimental design that replicated genotypes, eliminated maternal effects, and controlled for microenvironmental variation, they found a genetic change in flowering times: Genotypes obtained from recently collected seeds flowered significantly (about 2 days) earlier than those obtained 22 generations before. However, neutral markers (microsatellites) also showed strong changes in allele frequencies across the 22 generations, suggesting that effective population size, Ne, was low (i.e., genetic drift was strong), which is typical for highly self-fertilizing populations. In addition, several multilocus genotypes were present at high frequencies and persisted over the 22 generations, almost as in clonal populations (e.g., Schaffner et al. 2019). The challenge was thus to evaluate whether the observed evolutionary change was the result of an adaptive response to selection or may be explained by drift alone.

      Here, Gay et al. (2021) took a particularly careful and thorough approach. First, they carried out a selection gradient analysis, finding that earlier-flowering plants produced more seeds than later-flowering plants. This suggests that, under greenhouse conditions, there was indeed selection for earlier flowering times. Second, investigating other populations from the same region (all populations are located on the Mediterranean island of Corsica, France), they found that a concurrent shift to earlier flowering times occurred also in these populations. Under the hypothesis that the populations can be regarded as independent replicates of the evolutionary process, the observation of concurrent shifts rules out genetic drift (under drift, the direction of change is expected to be random).
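
      To make the first step concrete, a directional selection gradient is commonly estimated as the regression slope of relative fitness on the standardized trait. The sketch below is only an illustration of that general idea, not the analysis in the preprint: the function name and the flowering-time and seed-count values are made up for the example.

```python
from statistics import mean

def selection_gradient(trait, fitness):
    """Slope of relative fitness on the standardized trait (ordinary least squares)."""
    n = len(trait)
    w_bar = mean(fitness)
    rel_w = [w / w_bar for w in fitness]           # relative fitness, mean 1
    z_bar = mean(trait)
    sd = (sum((z - z_bar) ** 2 for z in trait) / (n - 1)) ** 0.5
    z_std = [(z - z_bar) / sd for z in trait]      # standardized trait, mean 0, sd 1
    cov = sum(z * (w - 1.0) for z, w in zip(z_std, rel_w)) / (n - 1)
    var = sum(z * z for z in z_std) / (n - 1)
    return cov / var

# Hypothetical values, for illustration only (not data from the study).
flowering_days = [60, 62, 64, 66, 68, 70]
seed_counts = [120, 115, 100, 95, 80, 70]
beta = selection_gradient(flowering_days, seed_counts)
print(f"selection gradient on flowering time: {beta:.3f}")
```

      A negative slope in such an analysis means that earlier-flowering plants leave more seeds, which is the direction of selection reported for the greenhouse.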

      The study may well have stopped here, concluding that there is good evidence for an adaptive response to selection for earlier flowering times in these self-fertilizing plants, at least under the hypothesis that selection gradients estimated in the greenhouse are relevant to field conditions. However, the authors went one step further. They used the change in the frequencies of the multilocus genotypes across the 22 generations as an estimate of realized fitness in the field and compared them to the phenotypic assays from the greenhouse. The results showed a tendency for high-fitness genotypes (positive frequency changes) to flower earlier and to produce more seeds than low-fitness genotypes. However, a simulation model showed that the observed correlations could be explained by drift alone, as long as Ne is lower than ca. 150 individuals. The findings were thus consistent with an adaptive evolutionary change in response to selection, but drift could only be excluded as the sole explanation if the effective population size was large enough.
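
      The dependence on Ne can be illustrated with a toy simulation. The sketch below is not the authors' simulation model; it assumes a handful of non-recombining multilocus genotypes with fixed, made-up flowering times, lets their frequencies change by pure Wright-Fisher resampling for 22 generations, and asks how often drift alone shifts the population mean by at least an observed amount. All names and numbers are hypothetical.

```python
import random

def drift_shift_in_mean(freqs, trait_values, ne, generations, rng):
    """Change in mean trait after pure drift among non-recombining genotypes."""
    labels = list(range(len(freqs)))
    current = list(freqs)
    for _ in range(generations):
        # Resample ne individuals in proportion to current genotype frequencies.
        draws = rng.choices(labels, weights=current, k=ne)
        current = [draws.count(g) / ne for g in labels]
    start_mean = sum(p * z for p, z in zip(freqs, trait_values))
    end_mean = sum(p * z for p, z in zip(current, trait_values))
    return end_mean - start_mean

def null_shifts(freqs, trait_values, ne, generations=22, n_sims=100, seed=1):
    """Distribution of shifts in the mean trait expected under drift alone."""
    rng = random.Random(seed)
    return [drift_shift_in_mean(freqs, trait_values, ne, generations, rng)
            for _ in range(n_sims)]

def drift_p_value(observed_shift, shifts):
    """Fraction of simulated shifts at least as large (in absolute value) as observed."""
    return sum(abs(s) >= abs(observed_shift) for s in shifts) / len(shifts)

# Hypothetical genotypes: equal starting frequencies, made-up flowering times (days).
start_freqs = [0.2, 0.2, 0.2, 0.2, 0.2]
flowering_days = [60, 62, 64, 66, 68]

for ne in (20, 500):
    shifts = null_shifts(start_freqs, flowering_days, ne)
    print(ne, drift_p_value(-2.0, shifts))
```

      With a small Ne, a two-day shift is unremarkable under drift; with a large Ne it becomes rare, which is why the conclusion hinges on how well Ne is known.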

      The study did provide two estimates of Ne (19 and 136 individuals, based on individual microsatellite loci or multilocus genotypes, respectively), but both are problematic. First, frequency changes over time may be influenced by the presence of a seed bank or by immigration from a genetically dissimilar population, which may lead to an underestimation of Ne (Wang and Whitlock 2003). Indeed, the low effective size inferred from the allele frequency changes at microsatellite loci appears to be inconsistent with levels of genetic diversity found in the population. Moreover, high self-fertilization reduces effective recombination and therefore leads to non-independence among loci. This lowers the precision of the Ne estimates (due to a higher sampling variance) and may also violate the assumption of neutrality due to the possibility of selection (e.g., due to inbreeding depression) at linked loci, which may be anywhere in the genome in case of high degrees of self-fertilization.
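
      For orientation, a widely used moment-based "temporal" estimator converts the standardized variance in allele frequency change between two samples into an estimate of Ne. The sketch below follows that classic form (the standardized variance Fc with a correction for sampling of diploid individuals); it is an assumption that this resembles the approach used in the preprint, and the function names and frequencies are illustrative. It also assumes independent neutral loci and a closed population, which are the assumptions questioned above.

```python
def standardized_variance_fc(freq_start, freq_end):
    """Mean standardized variance in allele frequency change across alleles."""
    terms = []
    for x, y in zip(freq_start, freq_end):
        denom = (x + y) / 2 - x * y
        if denom > 0:  # skip alleles absent (or fixed) in both samples
            terms.append((x - y) ** 2 / denom)
    return sum(terms) / len(terms)

def temporal_ne(freq_start, freq_end, generations, n_start, n_end):
    """Temporal-method Ne; n_start and n_end are numbers of individuals sampled."""
    fc = standardized_variance_fc(freq_start, freq_end)
    # Remove the part of Fc expected from sampling error alone.
    corrected = fc - 1 / (2 * n_start) - 1 / (2 * n_end)
    if corrected <= 0:
        return float("inf")  # change no larger than sampling noise
    return generations / (2 * corrected)

# Hypothetical allele frequencies at one locus, 22 generations apart.
ne_hat = temporal_ne([0.5, 0.5], [0.7, 0.3], generations=22, n_start=50, n_end=50)
print(round(ne_hat, 1))
```

      Because any extra frequency change (for example from immigrants or linked selection) inflates Fc, such an estimator returns a smaller Ne than the true value when its assumptions are violated.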

      There is thus no definite answer to the question of whether or not the observed changes in flowering time in this population were driven by selection. The study sets high standards for other, similar ones, in terms of thoroughness of the analyses and care in interpreting the findings. It also serves as a very instructive reminder to carefully check the assumptions when estimating neutral expectations, especially when working on species with complicated demographies or non-standard life cycles. Indeed, the issues encountered here, in particular the difficulty of establishing neutral expectations in species with low effective recombination, may apply to many other species, including partially or fully asexual ones (Hartfield 2016). Furthermore, they may not be limited to estimating Ne but may also apply, for instance, to the establishment of neutral baselines for outlier analyses in genome scans (see, e.g., Orsini et al. 2012).

      References

      Cohen JM, Lajeunesse MJ, Rohr JR (2018) A global synthesis of animal phenological responses to climate change. Nature Climate Change, 8, 224–228. https://doi.org/10.1038/s41558-018-0067-3

      Gay L, Dhinaut J, Jullien M, Vitalis R, Navascués M, Ranwez V, Ronfort J (2021) Evolution of flowering time in a selfing annual plant: Roles of adaptation and genetic drift. bioRxiv, 2020.08.21.261230, ver. 4 recommended and peer-reviewed by Peer Community in Evolutionary Biology. https://doi.org/10.1101/2020.08.21.261230

      Hansen MM, Olivieri I, Waller DM, Nielsen EE (2012) Monitoring adaptive genetic responses to environmental change. Molecular Ecology, 21, 1311–1329. https://doi.org/10.1111/j.1365-294X.2011.05463.x

      Hartfield M (2016) Evolutionary genetic consequences of facultative sex and outcrossing. Journal of Evolutionary Biology, 29, 5–22. https://doi.org/10.1111/jeb.12770

      Metz J, Lampei C, Bäumler L, Bocherens H, Dittberner H, Henneberg L, Meaux J de, Tielbörger K (2020) Rapid adaptive evolution to drought in a subset of plant traits in a large-scale climate change experiment. Ecology Letters, 23, 1643–1653. https://doi.org/10.1111/ele.13596

      Orsini L, Schwenk K, De Meester L, Colbourne JK, Pfrender ME, Weider LJ (2013) The evolutionary time machine: using dormant propagules to forecast how populations can adapt to changing environments. Trends in Ecology & Evolution, 28, 274–282. https://doi.org/10.1016/j.tree.2013.01.009

      Orsini L, Spanier KI, De Meester L (2012) Genomic signature of natural and anthropogenic stress in wild populations of the waterflea Daphnia magna: validation in space, time and experimental evolution. Molecular Ecology, 21, 2160–2175. https://doi.org/10.1111/j.1365-294X.2011.05429.x

      Piao S, Liu Q, Chen A, Janssens IA, Fu Y, Dai J, Liu L, Lian X, Shen M, Zhu X (2019) Plant phenology and global climate change: Current progresses and challenges. Global Change Biology, 25, 1922–1940. https://doi.org/10.1111/gcb.14619

      Schaffner LR, Govaert L, De Meester L, Ellner SP, Fairchild E, Miner BE, Rudstam LG, Spaak P, Hairston NG (2019) Consumer-resource dynamics is an eco-evolutionary process in a natural plankton community. Nature Ecology & Evolution, 3, 1351–1358. https://doi.org/10.1038/s41559-019-0960-9

      Wang J, Whitlock MC (2003) Estimating Effective Population Size and Migration Rates From Genetic Samples Over Space and Time. Genetics, 163, 429–446. PMID: 12586728

      Weider LJ, Jeyasingh PD, Frisch D (2018) Evolutionary aspects of resurrection ecology: Progress, scope, and applications—An overview. Evolutionary Applications, 11, 3–10. https://doi.org/10.1111/eva.12563

      Reviews.

      Revision round #2.
      2021-04-19.
      Author's Reply.

      Dear Dr Haag,

      Thanks for handling the review of our manuscript. We agree that the comments of Jon Agren have further improved the quality of this manuscript and we tried to answer all of them (see the point-by-point reply below). We provide a track-changes version where the changes in the main text and supplementary files are highlighted in bold. The new version is also available online on bioRxiv: https://www.biorxiv.org/content/10.1101/2020.08.21.261230v3.

      We hope that you will find this updated version of our manuscript suitable for recommendation by PCIEvolBiol and would be happy to take any further comments if you judge it would improve the manuscript.

      Laurène Gay, on behalf of all the coauthors

      Decision round #2.

      Dear Dr Gay,

      Your revised preprint "Evolution of flowering time in a selfing annual plant: Roles of adaptation and genetic drift" has now been reconsidered by two of the original reviewers. As you will see, while one of them is satisfied with the new version, the other is positive but recommends an additional round of minor revision. From my own reading, I agree that the suggestions by the reviewer will likely further strengthen the manuscript. Therefore, before reaching a final decision, I would like to ask you to consider these suggestions, and to revise the manuscript accordingly. When you submit the revised version, please include a letter in which you describe how you have responded to each of the referees' comments.

      Best wishes, and many thanks for submitting to PCI Evol Biol,

      Christoph Haag

      Preprint DOI: https://www.biorxiv.org/content/10.1101/2020.08.21.261230v2

      Reviewed by Jon Agren, 2021-04-16 11:54.

      I think the presentation has benefitted from the revisions made by the authors. Below is a list of comments on details regarding terminology and presentation that the authors may want to consider.

      p. 1, Abstract first sentence. Resurrection experiments can detect correlations between trait modifications and changes in the environment, but this is not really a test of a causation, is it? Or is the argument here that simultaneous parallel changes in many populations indicate a change in the environment acting over a large area? This could be indicated with a slight rewording.

      p. 1, Abstract first sentence. Change “traits modifications” to “trait modifications”.

      p. 2, first paragraph. Not fully clear what the important difference is between experimental and natural populations. In both cases, an estimate of effective population size is required.

      p. 2, right column, line 3. What does “>0.5” refer to? A broad-sense heritability estimate?

      p. 2, right column, line 27. Insert “selfing” after “predominantly”.

      p. 2, right column, line 36, “across 22 generations”. Does this species have any seed bank that may affect “effective generation time”?

      p. 2, right column, line 47, “taking into account the multilocus genotypic composition…”. Unclear how this should be understood. Reword?

      p. 2, right column, line 50, “for neutrality”. I suggest the authors indicate how this is achieved. – By using estimates of genotypic values for flowering time and assuming flowering time is a neutral trait?

      p. 3, first paragraph. I still find the procedure for building “families of full sibs” unclear: I suggest the authors state explicitly whether the families multiplied in 2011 each originated from a different pod collected in the field, or whether the families originated from seeds that had been randomly selected from pooled samples of seeds from 1987 and 2009, respectively.

      p. 3, right column, paragraph “Temporal changes in sensitivity to vernalization”, “measured as the slope…” This needs some more explanation. Are differences calculated between all possible pairs of plants in the two treatments?

      p. 4, third paragraph, “good approximation of the additive genetic covariance”. What about maternal environmental and genetic effects?

      p. 4, right column, first paragraph. State explicitly that the individuals analysed represented 145 different families?

      p. 5, second paragraph, “As a preliminary step,…”. To me the argument would make more sense in the reverse order, as the changes in flowering time and MLG frequency between 1987 and 2009 are the most direct estimates of evolutionary change. In other words, starting from the observation of the changes in flowering time and MLG frequency, one can examine the strength of the association between flowering time and MLG in the greenhouse, and whether the change is consistent with selection observed in the greenhouse. I see no a priori reason why selection on flowering time in the greenhouse should mirror that at the site of the focal population. To make this order of logic clear, the authors may want to move the description of the selection gradient analyses to after this argument has been formulated.

      p. 5, second paragraph, “whether selection in quantified in the greenhouse is likely to mirror selection in the field at present and 22 years ago”. To be strict, it would only need to mirror the predominant selection between 1987 and 2009 to be correlated with the change observed, right? Current selection in the field should matter little?

      p. 5, second paragraph, “We then measured…”. I like this approach! The authors should indicate which measure of flowering time was used in this analysis. The legend of Fig. 3 speaks about “average flowering time”. The sensitivity to vernalization treatment varied among genotypes. Are the results of this analysis essentially the same if the analysis is conducted separately for treatment 1 or 2, or separately for estimates of flowering time obtained based on the seed sample from 1987 and from 2009, respectively?

      p. 6, first paragraph; Table 3. Since a single line was sampled in each population, it is a bit misleading to call the examined effect a “population effect”. Change to “line effect”?

      p. 7, first paragraph, “predict an evolution of towards earlier flowering”. Since estimates of selection and heritabilities are specific to a given environment, this prediction is valid for the greenhouse and not necessarily for other environments.

      p. 7. Was there an effect of year of sampling on estimates of flowering time for MLGs sampled in both 1987 and 2009?

      p. 7, right column, second paragraph, “were persistent through time”. Change to “were observed in both years” to make the fact that altogether 5 lines were observed in both the 1987 and 2009 sampling more obvious?

      p. 7, right column, second paragraph, “Fig. 3A, regression only significant…”. Add sample size (i.e., number of family means included in this regression).

      p. 11, second paragraph, “Munguia-Rosas et al.”. Note that the selection estimates considered in this meta-analysis largely ignore the effect of variation in number of flowers and plant size, suggesting that many of them rather reflect a correlation between plant condition and fitness.

      Finally, I suggest the authors somewhere add a caveat regarding possible G x E interactions for flowering time (greenhouse vs. field), when discussing the possible association between flowering time as expressed in the greenhouse and fitness and evolutionary change in the field.

      Reviewed by Stefan Laurent, 2021-03-20 17:00.

      I am satisfied with the answers to my comments and with the modifications to the main text. The qqplots should be added to the supplementary figures linked to main figure 3.

      Revision round #1.
      2020-10-26.

      Author's Reply.

      Dear Dr Haag,

      Please find enclosed a revised version of our manuscript. We are very grateful to you and the reviewers for the comments and suggestions that have improved the manuscript substantially. We tried to answer all of them (see the point-by-point reply below). We provide a track-changes version with line numbers, where the changes in the main text and supplementary files are highlighted in bold. We also added a revised version that you can find after the track-changes (starting page 19). We hope that you will find this updated version of our manuscript suitable for recommendation by PCIEvolBiol and would be happy to take any further comments if you judge it would improve the manuscript.

      Laurène Gay, on behalf of all the coauthors.

      Decision round #1.

      Dear Dr Gay,

      Thank you for submitting your preprint "Evolution of flowering time in a selfing annual plant: Roles of adaptation and genetic drift" to PCI Evol Biol. Your work has now been considered by three reviewers, whose comments are enclosed. As you will see, the reviews are largely positive, and, based on these reviews as well as my own reading, I am happy to further consider your preprint for recommendation. However, before reaching a final decision, I would like you to revise your manuscript according to the recommendations by the reviewers. Besides the more minor points (which also should be considered carefully), I think there are two main issues that need particular attention:

      • First, the introduction (and perhaps also some other sections) would profit from some streamlining. In my opinion, this does not mean that you should entirely drop the discussion of the effects of selfing on the efficacy of selection. But this section should be reduced in length and care should be taken to clearly state the objective of the study early on without raising issues (e.g., comparison between selfers and outcrossers) that are not subsequently addressed. Incidentally, from my own reading, I also think that the last part of page 1 (where you give some more detail on the different possible approaches to investigate the influence of selection on phenotypic change) would profit from some reformulation: I found this part difficult to follow and its purpose is not entirely clear to me: Do you want to provide details on some of the approaches or do you want to explain why you used only some but not others in your study? Moreover, the statement that natural populations cannot be replicated may also need to be nuanced (replication might in principle be possible across different populations or using independent samples from the same population).
      • Second, the analysis of the frequency changes of the multilocus genotypes needs some clarification, both in terms of potential effects of excluding rare genotypes and in terms of confidence intervals given (likely) non-normal distribution of residuals.

      If you submit a revised version, please include a letter in which you describe how you have responded to each of the referees’ comments.

      Best wishes, and apologies again for the delayed decision,

      Christoph Haag

      Additional requirements of the managing board:

      As indicated in the 'How does it work?’ section and in the code of conduct, please make sure that:
      - Data are available to readers, either in the text or through an open data repository such as Zenodo (free), Dryad or some other institutional repository. Data must be reusable, thus metadata or accompanying text must carefully describe the data.
      - Details on quantitative analyses (e.g., data treatment and statistical scripts in R, bioinformatic pipeline scripts, etc.) and details concerning simulations (scripts, codes) are available to readers in the text, as appendices, or through an open data repository, such as Zenodo, Dryad or some other institutional repository. The scripts or codes must be carefully described so that they can be reused.
      - Details on experimental procedures are available to readers in the text or as appendices.
      - Authors have no financial conflict of interest relating to the article. The article must contain a "Conflict of interest disclosure" paragraph before the reference section containing this sentence: "The authors of this preprint declare that they have no financial conflict of interest with the content of this article." If appropriate, this disclosure may be completed by a sentence indicating that some of the authors are PCI recommenders: “XXX is one of the PCI XXX recommenders.”

      Preprint DOI: 10.1101/2020.08.21.261230

      Reviewed by Pierre Olivier Cheptou, 2020-10-20 11:18.

      The study by Gay et al. reports empirical data on the evolution of flowering time in a highly selfing species: Medicago truncatula. The authors used several approaches to investigate the question. In particular, they used a resurrection approach with seeds from 1987 and 2009. The aim of the study is to disentangle the roles of drift and selection in the observed shift and to estimate the selection gradient on flowering time. The study is interesting and the different experiments (population-centered, regional) are consistent with a shift in flowering time. My comments are below:

      1-The introduction discusses the question of adaptation in the face of environmental change. While the text is rich and well referenced, I found the introduction a bit long. There is a long discussion on whether outcrossing/selfing traits influence adaptation. The logical consequence would be to compare outcrossing and selfing populations. Since the study does not compare outcrossing and selfing populations, I think this part should be greatly reduced. Also, the statement that bottlenecks are more frequent in selfers (if true!) would be more striking if the references reported empirical data. To my knowledge, Schoen and Brown (1991) and Ingvarsson (2002) hypothesize that this is the case but did not demonstrate that selfers suffer from more frequent bottlenecks. In the following paragraph, I found it confusing to assert that “self-fertilization may have facilitated adaptations to agricultural practices” when discussing the role of mating system in adaptation. Is it because the traits were preadapted or because the genetic architecture of selfers facilitates adaptation? In short, the introduction should focus more tightly on introducing the question of short-term adaptation of flowering time in the face of warming.

      2-Sum of temperature. The individual flowering time is converted into a sum of temperatures. The base temperature (Tb) is assumed to be 5°C, based on Moreau et al. (2007). Is it possible that Tb has evolved during the two decades? Would the conclusions be different if flowering time were measured as the number of days? At least, the possibility of a shift in Tb should be discussed, as I find it contradictory to evaluate adaptation to warming while keeping Tb constant.
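
      The sensitivity to Tb can be made explicit with a small calculation. The sketch below assumes the usual degree-day form of a temperature sum (daily mean temperature above Tb, accumulated until flowering, with days below Tb contributing zero); the exact formula used in the manuscript is not given here, and the temperatures are invented for illustration.

```python
def thermal_time(daily_mean_temps, base_temp=5.0):
    """Accumulated degree-days above base_temp (assumed degree-day definition)."""
    return sum(max(0.0, t - base_temp) for t in daily_mean_temps)

# Hypothetical daily mean temperatures (°C) from sowing to flowering.
temps = [4, 8, 12, 15, 10]

print(thermal_time(temps, base_temp=5.0))  # 25.0
print(thermal_time(temps, base_temp=7.0))  # 17.0
```

      The same flowering date therefore maps to different temperature sums depending on the Tb assumed, so a comparison between 1987 and 2009 genotypes made on this scale implicitly assumes that both share the same Tb.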

      3-Maternal effects. If I understood correctly, the results for the studied populations are corrected for maternal effects (one generation to refresh the seed stock) but the results of the regional analysis are based on the F1 generation (without correcting for maternal effects). I was interested in the amplitude of the shift: two days in the Cape Corsica populations but five days in the regional analysis. This may be a “true result” or an effect of correcting for maternal effects. Did the authors measure the flowering date in the F1 of the Cape Corsica populations? I would suggest mentioning this result in the discussion. Is it possible that the reported difference in flowering date changed in the Cape Corsica population because of the F1 generation in the greenhouse? My feeling is that these results are, as such, interesting. We often see this pattern of a lower amplitude after one generation. If it were only noise, the first generation should exhibit either a lower or a higher difference than the F2. Epigenetic components of flowering could have played a role in adaptation to warming, and these effects cannot be distinguished from true quantitative genetic effects if part of them lasts more than one generation. Do the same MLGs (from 1987 and 2009) have the same fitness? Because the authors have the chance to have the same MLGs, it would be interesting to look at this relationship to investigate maternal effects.

      4-Genetic analysis. If I understood correctly, the test for selection versus drift is based only on conserved multilocus genotypes, i.e. a fraction of the population. Why make this choice? Why not use a Qst/Fst approach that would take into account all the individuals? (The design allows Qst to be estimated, doesn’t it?) In addition, I see a potential bias because the test assumes that the population behaves as a fully selfing population, which is not the case. While the authors point out the potential differential selective response of outcrossers versus selfers, the results reported are based only on the fully selfing fraction of the population, which I find contradictory.

      Overall, I found the ms interesting, and such a long-term dataset is rare. However, the ms would benefit from being more focused (particularly the introduction) in order to highlight the results and their biological interpretation.

      Reviewed by Jon Agren, 2020-10-19 15:12.<br> This study uses a resurrection experiment and simulations to explore the possible causes of changes in flowering time and genetic composition of a Medicago truncatula population across 22 generations. In the resurrection experiment, plants grown from seeds collected 22 years apart were raised in the greenhouse to produce selfed lines. These lines were then used to document possible changes in flowering time and to quantify selection on flowering time in the greenhouse. Changes in genetic composition were characterized by scoring 20 microsatellite loci (16 kept after filtering) and documenting changes in the frequencies of multilocus genotypes. The paper is well written and addresses interesting problems of wide general interest. However, I think the authors need to (a) motivate their approach to use estimates of selection obtained in the greenhouse to infer selection in the field, (b) provide more detail on the distribution of multi-locus genotypes and the power of their analysis of change in genetic composition, and (c) clarify a few details when it comes to sampling procedure (see below).

      Main comments:

      The authors appear to assume that selection quantified in the greenhouse is likely to mirror selection in the field at present and 22 years ago. This needs to be motivated.

      I suggest the authors provide more detail on the distribution of multilocus genotype (MLG) frequencies, and that this information is given already at the start of the third paragraph on p. 7. They report that 60 different MLGs were detected in their sample of 145 individuals. Two MLGs were common, and 12 MLGs were shared between the two sampling years. This suggests that most MLGs were rare and perhaps only represented by a single plant? The authors may want to discuss whether their sample sizes are sufficient to characterize changes in genetic composition of a population with such skewed distributions of MLGs.

      I suggest the authors clarify a few details regarding sampling:

      (a) For the resurrection experiment, “100 seeds per sampling were replicated” (p. 3, second paragraph). Were these seeds from 100 different pods and thus sampled from 100 plants, or were they a random sample of 100 seeds from a pooled seed sample from each year?

      (b) For the genetic analysis, leaves were sampled from “the multiplication generation in the greenhouse” (p. 4, fifth paragraph), and after filtering 145 individuals remained in the data set to be analysed. Please, state explicitly that the “multiplication generation” refers to the plants derived from the 200 field-collected seeds (presumably representing seeds from 200 plants(?); see previous comment). Were seeds from the two sampling occasions equally represented among the 145 individuals included in the analysis?

      Minor corrections:

      Abstract, line 11 from bottom. Change “population” to “populations”

      p. 7, first paragraph, second line from bottom, “in both years”. From this wording, you easily get the impression that selection was quantified in two years. I suggest you add a few words to indicate that this rather refers to a similar negative relationship being observed among lines derived from each of the two years.

      To make text in graphs readable, font size should be increased in Figures (in particular in Figs. 3-5).

      Reviewed by Stefan Laurent, 2020-10-16 11:05.<br> In this study, the authors test whether flowering time evolved in an experimental population of Medicago truncatula and whether this change could represent an adaptation to varying environmental conditions. For this, they measure changes in flowering time in a natural population over 22 generations (2 timepoints), they quantify the association between flowering time and fitness (as approximated by the number of seeds produced), they track changes in haplotype frequencies characterized by different approximated fitness values, and finally they also measure changes in flowering time in 17 populations from the same geographical region that have been sampled twice over a comparable time range.

      The authors report a significant reduction in flowering time in the main population and in the regional analysis that appears to be consistent with the specific effects of climate change in the Mediterranean region (i.e. limiting summer drought occurs earlier in the year). They also report a significant association between flowering time and seed production. However, the evidence for the effect of positive selection obtained by analyzing the changes in haplotypes is at best marginal; even if the authors do a good job in describing some of the uncertainty associated with this analysis, I think that one more aspect should be exposed.

      Besides my major comment, I find the manuscript clearly written, the analyses carefully conducted and presented, and the intro and discussion very well written and informative, at least for the non-expert.

      Major comment

      My only major criticism refers to the results presented in Figure 3. The selection gradients measured here seem to be heavily influenced by two outlier points with low seed production and early flowering. As a result, the linear models (especially the one for MLG found in 1987) appear to be a poor fit to the data, as can probably be seen by inspecting the residuals, which are unlikely to be normally distributed. I think that the authors should report the uncertainty around the slopes and that this uncertainty should be further considered in the analyses presented in figure 4, which will likely cause the observed selection gradient to be non-significant under a larger range of Ne values. I am not sure about the best way to obtain confidence intervals for the selection gradients but I imagine that a bootstrap approach should be applicable.
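      One way the bootstrap suggested above could be set up is a percentile bootstrap over (flowering time, seed production) pairs. The sketch below is a minimal illustration under stated assumptions: the function names, the number of resamples, and the data are hypothetical, and a plain least-squares slope stands in for whatever selection-gradient estimator the authors used.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the slope."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # skip degenerate resamples with no variance in x
        ys = [y[i] for i in idx]
        slopes.append(slope(xs, ys))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical data: flowering time (degree-days) and seed production.
flowering = [900.0, 950.0, 1000.0, 1050.0, 1100.0, 1150.0]
seeds = [210.0, 60.0, 180.0, 150.0, 140.0, 110.0]

print(slope(flowering, seeds))
print(bootstrap_slope_ci(flowering, seeds))
```

      Resampling whole pairs lets influential points drop in and out of the resamples, so the width of the interval reflects how much the slope depends on a few outliers.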

      Minor

      I agree with the authors that the N_e value estimated from the temporal Fst is very likely underestimated. Comparing the expected heterozygosity under Ne=19 with the observed He would further support the idea that larger Ne values are indeed realistic. How does the observed heterozygosity in the population compare to the theoretical expectations given by Nordborg and Donnelly (1997)? Rescaling the census number (>2000) by 1/(1+F) would lead to a less conservative Ne value for the test for selection and may allow a putative selection signal to be detected even after considering the uncertainty around the observed selection gradient.
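      The rescaling mentioned above can be sketched as follows. The equilibrium relation F = s/(2 - s) between the selfing rate s and the inbreeding coefficient is the standard result for partial selfing; the selfing rate used here is a hypothetical value, not an estimate from the manuscript, and the census size of 2000 is simply the lower bound quoted in the comment.

```python
def inbreeding_from_selfing(selfing_rate):
    """Equilibrium inbreeding coefficient F = s / (2 - s) under partial selfing."""
    return selfing_rate / (2.0 - selfing_rate)


def effective_size(census_n, inbreeding_f):
    """Census size rescaled by 1 / (1 + F)."""
    return census_n / (1.0 + inbreeding_f)


census_n = 2000      # lower bound on census size quoted in the review
selfing_rate = 0.95  # hypothetical value for a highly selfing species

f = inbreeding_from_selfing(selfing_rate)
print(f, effective_size(census_n, f))
```

      Even under complete selfing (F = 1) the rescaled value is half the census size, far above the Ne of 19 estimated from temporal Fst.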

    1. Author Response

      We are grateful for the thorough and thoughtful comments provided by the reviewers, and we appreciate their support for the design and implications of this study. We have addressed the major points raised by the reviewers as follows.

      Major Concerns:

      1) Limitations of extrapolation to human health and disease.

      From Reviewer 2: Though I found the work largely beyond critique technically, I would have appreciated additional discussion of the limitations of the use of a captive non-human primate to model human dietary response.

      From Reviewer 3: However, my major concern is the suitability of these results to explain human relevance and how far they can address the actual evolutionary significance. I think they should tone down a little. For example, is there really any strong reason to assume that macaques will mimic dietary responses in humans? I appreciate the fundamental importance of macaque-specific responses, but I am unclear how captive primates can model human effects─ how do authors factor their (obvious?) fundamental differences between different immune response profiles activated against similar cues and standing microbiome, warranting divergent interactions with the said dietary manipulations. I think these are caveats that need to be carefully discussed to avoid building over expectations among readers.

      From Reviewer 3: Could there be more discussion on the relevance of differentially expressed macaque genes in humans?

      We appreciate the concern regarding possible overinterpretation of results. There is an extensive body of literature demonstrating the utility of the cynomolgus macaque model to explore influences of diet on numerous phenotypes including atherosclerosis and cardiovascular disease, bone metabolism, breast and uterine biology, and other phenotypes (Adams et al., 1997; Clarkson et al., 2004, 2013; Cline et al., 2001; Cline & Wood, 2006; Haberthur et al., 2010; Lees et al., 1998; Mikkola et al., 2004; Mikkola & Clarkson, 2006; Naftolin et al., 2004; Nagpal, Shively, et al., 2018; Nagpal, Wang, et al., 2018; Register, 2009; Register et al., 2003; Shively & Clarkson, 2009; Sophonsritsuk et al., 2013; Walker et al., 2008; Wood et al., 2007). The cynomolgus model was remarkably accurate in predicting effects of hormone therapies on both cardiovascular disease and breast cancer later demonstrated in the very large Women’s Health Initiative (Adams et al., 1997; Clarkson et al., 2013; Naftolin et al., 2004; Shively & Clarkson, 2009; Wood et al., 2007). Cynomolgus macaque responses to other therapies (tamoxifen, selective estrogen receptor modulators, blood pressure medications, etc.) also have shown great similarities to those in humans (Cline et al., 2001). We have added additional text to the Abstract (lines 51-52), Introduction (lines 136-141), and Discussion (lines 531-542) to situate the current work in the extensive literature that uses cynomolgus macaques as a model to understand human health. We have also included discussion regarding the limitations of extrapolating these results to humans in lines 543-545 of the Discussion.

      We also tested the overlap of differential gene expression induced by the Western diet with genes implicated in human complex traits (Zhang et al., 2020). Genes implicated in numerous traits associated with cardiometabolic health were enriched in Western genes, while no traits were enriched in Mediterranean genes. We describe these findings in lines 206-215 of the Results section and in Figure 1—figure supplement 1, which depicts traits relevant to human health and disease identified by previous groups where gene expression profiles overlapped with the “Western genes” in the current study. Lines 668-672 of the Materials and Methods detail the statistical approach used.

      2) Limitations of this experimental design to test the evolutionary mismatch hypothesis.

      From Reviewer 2: My worry is that macaques are so ill-adapted to the Western human diet that the behavioral and inflammation differences seen are explained by this macaque-Western diet mismatch, which dwarfs the human-Western diet mismatch that likely nonetheless exists. This concern can be partially mitigated by careful discussion of this study limitation.

      From Reviewer 2: One critique of dietary interventions that attempt to correct the evolutionary mismatch (which would be useful to address when discussing human-macaque differences) is that human evolution continuing to the present day has been marked by putative selection regime changes associated with multiple major dietary shifts, including meat eating and those arising from cooking and domestication of plants and animals. Such selection may have differentiated humans from macaques in key ways that influence macaque suitability as a dietary model.

      From Reviewer 2: My recommendations for strengthening the work are minor, besides those outlined above to include caveats concerning the differences between macaques and humans that will hopefully prevent lay readers from over-interpreting the results. Specifically, species-level differences which warrant mention include gross differences in "natural" diet between the species, as well as known recent selection on diet-related genes in humans (reviewed in, e.g., Luca et al. 2010; doi:10.1146/annurev-nutr-080508-141048) and gut microbiome differences between the species (e.g., Chen et al. 2018; doi:10.1038/s41598-018-33950-6).

      From Reviewer 2: A simple analysis that begins to address this point analytically would be to compare what results exist for humans (e.g., Camargo et al, 2012; doi:10.1017/S0007114511005812) to those of your study.

      From Reviewer 2: Additionally, one could check whether the DE genes you identify are known to be selected in humans.

      We appreciate the suggestion to strengthen our discussion of the macaque model of human health. As with early hunter-gatherer humans, macaques are omnivorous in the wild, eating a variety of plants and animals. In addition, the cynomolgus macaque often co-exists with human populations, and in that respect may have co-evolved in many ways. Furthermore, cynomolgus macaques have been used in studies of dietary influences on chronic prevalent human disease for 50 years (Malinow et al., 1972), and nearly 700 papers in a PubMed literature search support the idea that cynomolgus responses to diet are remarkably similar to those of humans in all systems studied. Some of these studies are identified above. With respect to the microbiome, previous work by others has demonstrated that the gut microbiome of omnivorous nonhuman primates is similar to that of humans living a modern lifestyle (Ley et al., 2008), and we previously reported similarities in patterns of microbiome responses to Mediterranean vs. Western diets between humans and NHPs in the present study (Nagpal, Shively, et al., 2018). We have added discussion of the above and note limitations of extrapolation to humans due to species-level differences in natural diets and the role that selection may play in responses of humans to Western or Mediterranean dietary patterns (lines 543-545). Similarities between humans in DE genes are noted in responses above. In addition, we already had noted that our studies complement and extend the findings of Camargo (line 399), and we added more detail that we found similar effects of diet on expression of IL6 and NF-kB pathway members (line 397).

      3) Lack of control group maintained on a standard chow diet.

      From Reviewer 2: In future studies, it would be useful to have samples from proper control monkeys fed a standard primate diet.

      From Reviewer 3: Also, this is slightly unfortunate because there is no full control treatment where macaques are maintained in their regular diet (i.e., standard monkey chow) and then compared with groups switched to the Mediterranean vs western diet to estimate the relative deviations from their expected physiological processes and behavioural traits.

      We appreciate the concern regarding the lack of a standard monkey chow diet control group. All monkeys ate chow during the baseline phase and were thoroughly phenotyped, exhibiting minimal differences in monocyte gene expression profiles between groups subsequently assigned to the two diets, which involved stratified randomization based on key baseline characteristics while consuming the same diet. Importantly, monkey chow is unlike any historic or current human or nonhuman primate diet as is apparent in Table 1. It is quite low in fat, and rich in soy protein and isoflavones, which are known to alter physiology and immune system function. Therefore, parallel assessments of health measures in monkeys consuming chow long term do not provide data relevant to diet effects on human health. We have added discussion of the strengths of the study (lines 136-141, 531-542), which was designed in order to be able to draw causal inference about the diet manipulation, and we acknowledge limitations to assess directionality of changes (i.e. which experimental diet is driving a particular observed difference) in lines 545-553.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): In this manuscript, using in vivo infection of Zebrafish embryos with Mycobacterium marinum and THP1-derived macrophages infected with Mycobacterium tuberculosis, the authors show that these pathogenic mycobacteria trigger an increase of K+ concentration through the expression of OXSR1. The ESX1 secretion system that is essential for the virulence of M. marinum is required for the expression of OXSR1 and SPAK. OXSR1 and SPAK are involved in the WNK signaling pathway and are cytoplasmic serine/threonine protein kinases that regulate the function of a series of sodium, potassium and chloride co-transporters via phosphorylation. Given that K+ efflux is now accepted as the main inducer of NLRP3 inflammasome, the authors report that this infection-induced OXSR1 expression restrains the protective NLRP3 inflammasome response leading to IL-1b maturation and secretion. IL-1b, as a very potent pro-inflammatory cytokine, triggers TNF-a production and the authors demonstrate that infection-induced OXSR1 expression suppressed host protective TNF-a and cell death early in infection. It appears therefore that virulent mycobacteria induce OXSR1 expression to reduce inflammasome activation by maintaining high intracellular K+. The results presented by the authors are convincing and the conclusions raised by the authors are well supported by the data. In zebrafish embryos, OXSR1 knockdown nicely reduces mycobacteria burden. Based on their conclusions that infection-induced OXSR1 expression reduces NLRP3 inflammasome activation, NLRP3 inflammasome activation has therefore a protective effect against bacterial infection. My main concern is that surprisingly, nlrp3 or il1b knockdown has no effect on bacterial burden in comparison to control embryos. In line 256, as an explanation, the authors wrote "This may have been because we were using mosaic F0 CRISPR knockout, which is not a complete removal". The removal using mosaic F0 CRISPR knockout is nevertheless sufficient to observe a decrease in bacterial burden following OXSR1 knockdown. Would it be possible that OXSR1 also regulates immunity independently of NLRP3 inflammasome?

      Yes, we will add text to the discussion to address potential NLRP3-independent mechanisms that connect OXSR1 to immunity against mycobacterial infection.

      The lack of effect of il1b knockdown on M. marinum burden has been corroborated by independent laboratories including a publication from the Elks lab in Journal of Immunology: Ogryzko et al 2019. The Ogryzko study found no effect of il1b knockout on M. marinum burden.

      **Other comments:** OXSR1 WB in extended Data 3 is really poor quality so that it is hard to see the increased expression of OXSR1 following infection.

      The western blot will be repeated for cleaner images.

      Figure 2C. It is not shown but I guess that similar results should be obtained using M. tuberculosis.

      Material leaving our BSL3 facility must be decontaminated which makes this suggested analysis impossible in our facility.

      Figures 5D and 5E. To confirm the involvement of NLRP3, in addition to using MCC950, NLRP3 knockdown using siRNA should also be performed. NLRP3-deficient THP-1 cells are also commercially available if the siRNA-mediated knockdown of NLRP3 is not convincing enough.

      We will purchase NLRP3 deficient THP-1 cells and use our existing shRNA vector to create NLRP3 and OXSR1 deficient cells. We will repeat the experiments in 5D and 5E in these cells to confirm NLRP3 involvement.

      **Minor comments:** How do the authors think that mycobacterium induces OXSR1 expression following infection? It has not been investigated and it is not discussed.

      In Fig1A we showed upregulation of oxsr1a transcription and in Fig2A we showed upregulation of OXSR1 protein. In line 204 of the discussion we described our hypothesis that oxsr1a transcription is responsive to the mycobacterial ESX1 secretion system.

      Reviewer #1 (Significance (Required)): The observations reported in this manuscript are interesting since for the first time, it is described that virulent mycobacteria induce OXSR1 expression to reduce NLRP3 inflammasome activation by maintaining high intracellular K+. This is quite a significant advance in the field. To escape immune control, many successful intracellular pathogens have evolved methods to limit inflammasome activation. While it is known that potassium efflux is a trigger for inflammasome activation, the interaction between mycobacterial infection, potassium efflux and inflammasome activation was not explored. My field of expertise is the regulation of inflammasome activation. As far as I remember, I've never reviewed a paper using zebrafish embryos but here, the explanations and data are clear so that it was easy to understand and to evaluate. Likewise, I did not know the WNK signaling pathway but the literature clearly shows that it is involved in intracellular ionic balance.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Hortle et al., in this study, evaluated the role of the WNK pathway kinases SPAK and OXSR1 during M. marinum and M. tuberculosis infection. These two kinases inhibit the KCC channels, which tend to export potassium out of the cell. Since potassium efflux is a known stimulator of NLRP3 inflammasome activation, this raises the possibility of a role for these kinases in inflammation and infection. The authors showed that inhibiting OXSR1 genetically and chemically reduced mycobacterial survival in cells and in the zebrafish model, thus proposing OXSR1 as a host-directed therapeutic candidate. They showed that knockdown of OXSR1a leads to NLRP3 inflammasome-mediated IL1B induction, which results in an increase in TNFa and suppression of mycobacterial growth. Furthermore, the reduction in mycobacterial growth in OXSR1a KD zebrafish embryos was found to be dependent on the ESX1 machinery of Mycobacterium. The role of potassium in regulating the host response to Mycobacterium is novel. However, there are a few things which are missing from this interesting work.

      **Main comments**

      1. Since OXSR1 is known to inhibit KCC channels, which should increase intracellular potassium, why is there no increase in potassium in infected control cells (Fig 2C)? What would be the role of potassium in OXSR1-mediated control of Mtb growth?

      We will perform more experiments with altered levels of extracellular potassium to determine if infected control cells have increased intracellular potassium compared to OXSR1 knockdown cells.

      Does the addition of extracellular potassium restrict mycobacterial growth in OXSR1-KD cells?

      We will perform additional experiments with the addition of potassium to the cell culture medium to address this concern.

      Since OXSR1 is known to inhibit KCC channels, what happens to the activity of these channels in OXSR1 KD cells? This is important, because the authors could not find any difference in intracellular potassium between uninfected control and uninfected OXSR1 KD cells (Fig 2C). It would be good to add the flow cytometric histograms or dot plots of potassium staining to the main figure or to the extended figures.

      We have data showing that although there is minimal difference in basal K+ level in OXSR1 KD cells, there is a significantly lower K+ level when the cells are placed in high-K+ media or subjected to osmotic shock. We will include this data in the revised manuscript. We will amend the figures to include flow plots.

      Acquisition of potassium stained cells - In the methodology it is mentioned that ion K+ Green stained undifferentiated THP1 cells were acquired using the PE channel while differentiated THP1 cells were acquired using the FITC channel. Furthermore, in the methods it is mentioned that a Leica Sp8 microscope was used to acquire images; however, I do not see any of this data in the manuscript.

      Ion K+ green emits into both the PE and FITC channels. Our choice to use the FITC or PE channel depended on whether the cells were also infected with red fluorescent bacteria, which “contaminate” the PE channel.

      Fig 2E and 3D - Meaning of "Normalized CFU/ml"? Each dot represents what? How many times this experiment was performed, please add in the legend.

      Normalized CFU/ml means that the CFU at 3 days post infection were normalized to the intracellular bacterial burden at 0 days post infection, to adjust for any differences in phagocytosis of bacteria. Each dot represents the CFU from an infected well in a single representative experiment, and the experiment was repeated 3 times. This information will be added to the figure legend.
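      As an illustration of the normalization described above, the sketch below divides each day-3 well count by the mean day-0 count for the same condition. Dividing by the mean day-0 burden (rather than pairing individual wells) is an assumption, and the function name and CFU values are hypothetical.

```python
def normalized_cfu(cfu_day3, cfu_day0):
    """Divide each day-3 CFU/ml value by the mean day-0 CFU/ml.

    Assumption: one baseline per condition, taken as the mean of the
    day-0 wells; the response does not specify the exact pairing.
    """
    if not cfu_day0:
        raise ValueError("at least one day-0 measurement is required")
    baseline = sum(cfu_day0) / len(cfu_day0)
    if baseline <= 0:
        raise ValueError("day-0 baseline must be positive")
    return [value / baseline for value in cfu_day3]


# Hypothetical CFU/ml counts for one condition.
day0 = [1.0e5, 1.0e5]
day3 = [2.0e5, 4.0e5]

print(normalized_cfu(day3, day0))  # [2.0, 4.0]
```

      A value above 1 then indicates net intracellular growth since uptake, independent of how many bacteria each condition phagocytosed.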


      Fig 1D - What could be the reason for the lack of a statistically significant difference between wild type and homozygous oxsr1a-KO fish?

      This data is from two experimental replicates. We are currently growing more breeding fish to generate embryos for experimental replicates.

      It would be good to have a schematic model showing the findings of the study.

      We will add a schematic model to the manuscript.

      TNFa is a double-edged sword and can lead to pathology. Hence treatment of chronically infected animals (say mice) with Compound B will be needed to confirm the HDT activity of OXSR1.

      Yes, we will add discussion of this point as a caveat to our future direction of using OXSR1 inhibition as a HDT.

      Reviewer #2 (Significance (Required)): This study showed a role for kinases that regulate potassium trafficking in the mycobacterium-host interaction. Since kinases are druggable, this opens a new area for developing host-directed therapies for TB.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): In this study, the authors claim to have evidence that OXSR1 inhibits NLRP3 inflammasome activation by limiting potassium efflux during mycobacterial infection. In my opinion, the study lacks important results supporting its main conclusions. In many instances, the authors have over-interpreted their data and I therefore do not support publication of this study.

      **Main comments:** Activation of the NLRP3 inflammasome upon OXSR1 knockdown was not convincingly demonstrated.

      We will address the activation state of the NLRP3 inflammasome with NLRP3 KO and OXSR1 KD cells as also suggested by reviewer 1: We will purchase NLRP3 deficient THP-1 cells and use our existing shRNA vector to create NLRP3 and OXSR1 deficient cells. We will repeat the experiments in 5D and 5E in these cells to confirm NLRP3 involvement.

      Clearance of bacteria in an organism, herein zebrafish, involves mechanisms in different cell types including downstream of inflammasome activation. Thus, bacterial clearance experiments in THP-1 cells might not necessarily be related to in vivo experiments in an organismal context. Finally, a mechanism as to how mycobacteria enhance OXSR1 expression to block a NLRP3-mediated response has not been addressed.

      We are not able to perform in depth analysis of the bacterial side of this host-pathogen interaction as my lab will close in the next 4 months. We have shown that transcriptional upregulation of oxsr1a is ESX1-dependent. We will include data on OXSR1 protein expression with WT and ESX1 mutant bacteria when we repeat the western blots in Extended data 3.

      **Specific comments:**

      1. The authors showed that the M. marinum ESX1 secretion system induced OXSR1 expression to inhibit NLRP3 inflammasome activation. This contradicts another recent study (PMID: 18852239), which showed that the ESX1 secretion system activated the NLRP3 inflammasome.

      These effects are not mutually exclusive. The ESX1 secretion system has a “deliberate” purpose in exporting mycobacterial effector proteins to subvert cellular immunity while also having an “accidental” role in exposing the host cell cytosol to vacuolar contents that can activate cellular immunity. We do not assert that mycobacteria completely inhibit all NLRP3 activation; rather, we propose that they attempt to stop full activation by inducing the expression of host OXSR1. This can be seen in the IL-1b data in figure 3E, where infected WT cells release more IL-1b than MCC950 treated cells, but less than OXSR1 KD cells.

      In line 102, based on Data shown in Fig 1D, the authors concluded that homozygous, but not heterozygous, oxsr1asyd5 embryos showed reduced bacterial burden. However, in Fig 1D, the difference among the genotypes is not significant.

      This concern will be addressed with additional replicates.

      In line 196, the authors stated that "We present evidence that pathogenic mycobacteria increase macrophage K+ concentration by inducing expression of OXSR1." However, the authors did not provide evidence for this.

      We will soften this phrase in the discussion to replace “by inducing” with “and induce”.

      Based on Extended data 3, the authors concluded that infection increases the expression of OXSR1. However, this is not evidenced in the Western Blot. In addition, in panel B, the OXSR1 blot showed many non-specific bands with decreased intensity in OXSR1 knockdown conditions suggesting that there is unequal protein loading making it impossible to interpret these results.

      We will repeat the western blots as per Reviewer 1’s comment as well.

      The authors concluded that infection-induced OXSR1 expression suppressed inflammasome activity to aid mycobacterial infection. Experiments with Compound B, that inhibits OXSR1 phosphorylation, are used in support of the above conclusion. I do not really see a connection between OXSR1 expression and the inhibitor experiment.

      We will reword “expression” to “activity” with regard to the inhibitor experiment.

      In line 187, "Knockdown of tnfa reduced the amount of infection-induced tnfa promoter-driven GFP produced around sites of infection ....". How can a knockdown of tnfa affect the GFP expression driven by the tnfa promoter?

      The promoter fragment used in the TgBAC construct contains target sites for two of our guide RNAs. We will also include qPCR validation of the knockdown.

      Reviewer #3 (Significance (Required)): Mechanism underlying decreased intracellular potassium level is of great interest in the inflammasome field. However, their observation is not in line with published studies. Audience in the pathogen-host interaction field will be interested. Expertise: dissection of signalling pathway regulation, molecular and cellular mechanism underlying NLRP3 inflammasome activation. We are not using zebrafish model.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, using in vivo infection of zebrafish embryos with Mycobacterium marinum and THP1-derived macrophages infected with Mycobacterium tuberculosis, the authors show that these pathogenic mycobacteria trigger an increase of K+ concentration through the expression of OXSR1. The ESX1 secretion system that is essential for the virulence of M. marinum is required for the expression of OXSR1 and SPAK. OXSR1 and SPAK are involved in the WNK signaling pathway and are cytoplasmic serine/threonine protein kinases that regulate the function of a series of sodium, potassium and chloride co-transporters via phosphorylation. Given that K+ efflux is now accepted as the main inducer of the NLRP3 inflammasome, the authors report that this infection-induced OXSR1 expression restrains the protective NLRP3 inflammasome response leading to IL-1b maturation and secretion. IL-1b, as a very potent pro-inflammatory cytokine, triggers TNF-a production, and the authors demonstrate that infection-induced OXSR1 expression suppressed host-protective TNF-a and cell death early in infection. It appears therefore that virulent mycobacteria induce OXSR1 expression to reduce inflammasome activation by maintaining high intracellular K+.

      The results presented by the authors are convincing and the conclusions raised by the authors are well supported by the data.

      In zebrafish embryos, OXSR1 knockdown nicely reduces mycobacterial burden. Based on their conclusions that infection-induced OXSR1 expression reduces NLRP3 inflammasome activation, NLRP3 inflammasome activation therefore has a protective effect against bacterial infection. My main concern is that, surprisingly, nlrp3 or il1b knockdown has no effect on bacterial burden in comparison to control embryos. In line 256, as an explanation, the authors wrote "This may have been because we were using mosaic F0 CRISPR knockout, which is not a complete removal". The removal using mosaic F0 CRISPR knockout is nevertheless sufficient to observe a decrease in bacterial burden following OXSR1 knockdown. Would it be possible that OXSR1 also regulates immunity independently of the NLRP3 inflammasome?

      Other comments:

      The OXSR1 WB in extended Data 3 is of really poor quality, so it is hard to see the increased expression of OXSR1 following infection.

      Figure 2C. It is not shown, but I guess that similar results should be obtained using M. tuberculosis.

      Figures 5D and 5E. To confirm the involvement of NLRP3, in addition to using MCC950, NLRP3 knockdown using siRNA should also be performed. NLRP3-deficient THP-1 cells are also commercially available if the siRNA-mediated knockdown of NLRP3 is not convincing enough.

      Minor comments:

      How do the authors think that mycobacterium induces OXSR1 expression following infection? It has not been investigated and it is not discussed.

      Significance

      The observations reported in this manuscript are interesting since for the first time, it is described that virulent mycobacteria induce OXSR1 expression to reduce NLRP3 inflammasome activation by maintaining high intracellular K+. This is quite a significant advance in the field. To escape immune control, many successful intracellular pathogens have evolved methods to limit inflammasome activation. While it is known that potassium efflux is a trigger for inflammasome activation, the interaction between mycobacterial infection, potassium efflux and inflammasome activation was not explored.

      My field of expertise is the regulation of inflammasome activation. As far as I remember, I've never reviewed a paper using zebrafish embryos but here, the explanations and data are clear so that it was easy to understand and to evaluate. Likewise, I did not know the WNK signaling pathway but the literature clearly shows that it is involved in intracellular ionic balance.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this project, authors develop a colorimetric and luminescence assay for the detection of SARS-CoV-2 RNA in vitro. They design an RNA based sensor that will be triggered by target RNA then release the ribosome binding site and a translation start site followed by a reporter gene. The released sequence will then trigger the production of reporter protein by transcription-translation coupled assay. Authors also introduce an RNA amplification step in order to increase the sensitivity of this assay.

      **Strengths:**

      This assay provides a simple, rapid way to detect SARS-CoV-2, and it is an elegant way to incorporate a transcription-translation coupled assay for SARS-CoV-2 RNA detection and to identify SARS-CoV-2 patient samples. It is a nice assay and the performance is comparable with the existing method.

      **Weaknesses:**

      However, the positioning of this assay is not very clear. The readout of this assay could be recorded by camera whereas it includes several steps such as RNA extraction, amplification, transcription-translation coupled assay and reporter reaction. The limitations of the existing methods (RT-PCR, paper strip) and the advantages of this assay haven't been demonstrated by the experiments. The stability of RNA may also restrict the application of the proposed assay on site.

      **Major comments:**

      Authors are suggested to design an experiment to show the advantage of this assay compared with the existing method.

      Response: We thank the reviewer for pointing this out. In Fig 5, we show a comparison of our assay with the benchmark in COVID-19 diagnostics, which is the RT-qPCR assay. We specifically correlate the Ct values obtained for RT-qPCRs with the amount of color or luminescence obtained through our assay. From these experiments we note that the sensitivity of our assay is a little lower than that of RT-qPCR: our assay does not detect samples with Ct values in the 36 to 38 range (very low viral loads). This comparative experiment highlights that our assay bears clear advantages over RT-qPCR in terms of ease of assay set-up, ease of color detection, amenability to cell-phone imaging, and no requirement for sophisticated equipment or technical training to interpret results. The full details of these comparisons are discussed in the manuscript.

      This is consistent with the literature on COVID-19 diagnostics where new assays are routinely benchmarked against the “gold-standard” RT-qPCR assay (Corman et al., 2020; Pearson et al., 2021).

      What is the limit of detection of this assay using LacZ and Luciferase reporter respectively?

      Response: The limit of detection of the assay as shown in Fig 4B and Fig 4C-D, was found to be 100 copies of RNA, which translates to a concentration of 8 attomolar RNA. In this case, we find the limit of detection to be the same for both LacZ (Fig 4B) and Luciferase (Fig 4C-D) reporter.

      The calculations of copy number and sensitivity were made using a commercial source of synthetic CoV-2 RNA (Twist Biosciences) that is used in several studies about COVID-19 diagnostics (Joung et al., 2020; Rabe & Cepko, 2020; Wu et al., 2021). The RNA copy numbers are taken from the product details provided by the manufacturer. These details are now clearly stated in the manuscript. The commercial RNA is provided at 10^6 copies per µl. From this we take as low as 100 copies per 20 µl of NASBA reaction, which we are able to detect using our assay. Hence our sensitivity comes to 8 attomolar. We have clarified this in the manuscript. We noticed a typo in the original submission where we refer to a sensitivity of 80 attomolar in the Discussion. This is corrected to 8 attomolar. With this sensitivity we are within the range to detect RNA in patient samples, as confirmed by our patient data.
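      The conversion from copy number to molar concentration described in the response can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic, not code from the manuscript; the function names `copies_to_molar` and `to_attomolar` are hypothetical and introduced here for illustration.

```python
# Sketch of the copies -> molar conversion (illustrative names, not from the manuscript).
AVOGADRO = 6.02214076e23  # entities per mole (exact SI value)


def copies_to_molar(copies: float, volume_litres: float) -> float:
    """Return the molar concentration of `copies` molecules in `volume_litres`."""
    return copies / AVOGADRO / volume_litres


def to_attomolar(molar: float) -> float:
    """Convert a concentration in mol/L to attomolar (1 aM = 1e-18 M)."""
    return molar * 1e18


# 100 RNA copies in a 20 µl NASBA reaction, as stated in the response.
limit_of_detection = to_attomolar(copies_to_molar(100, 20e-6))
print(f"{limit_of_detection:.1f} aM")  # prints "8.3 aM"
```

      The result, about 8.3 attomolar, agrees with the 8 attomolar figure quoted in the response.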

      Authors have not examined the selectivity of this assay. What is the specificity, selectivity for each of these variants? Does altering target RNA change the specificity?

      Response: We thank the reviewer for raising this point. As recommended by the reviewer, we have now examined the selectivity of this assay through new data (See new Fig S3, new Fig S4 and new Fig S8, also shown below).

      We have examined selectivity in 3 different ways.

      1. Is our sensor selective to the said region of the SARS-CoV-2 genome? To address this, we generated 19 different Target (Trigger) RNAs spread across the SARS-CoV-2 genome. These were tested against Sensor 12 to examine for their ability to trigger the sensor. We find that our sensor is highly selective for its target RNA and does not show any detectable response to the other regions of SARS-CoV-2 (see new Fig S3).

      2. Next, we asked if our assay is selective to SARS-CoV-2 versus other related human coronaviruses. For this, we first examined the sequence of the target RNA (Amplicon RNA 12) that is sensed by Sensor 12. We selected equivalent regions of RNA from a different human coronavirus, HKU1. We generated these RNA sequences in vitro and performed IVTT. These new data are shown in new Fig S4 and below. We find that the human coronavirus (HKU1) RNAs are not able to turn on our sensor, whereas the cognate SARS-CoV-2 RNA is able to.

      3. We then asked if our assay can detect a current prominent variant of SARS-CoV-2. A major cause of concern is the ability of SARS-CoV-2 to accumulate mutations in its genome, resulting in different variant strains of SARS-CoV-2. Of these variants, the Delta variant (B.1.617.2) is not only highly contagious but has been noted as a possible vaccine breakthrough mutant of SARS-CoV-2. For this, we obtained RNA from the patient nasopharyngeal swab samples from the NCBS-inStem Covid-19 testing Center, Bangalore, India. RNA was isolated in the BSL-3 facility at the testing center. RNA samples were sequenced and confirmed to be the Delta variant, B.1.617.2 (sequences deposited in GISAID). RNA extracted from these patient samples was tested against Sensor 12 using NASBA followed by IVTT. We find that our assay can efficiently detect the Delta variant SARS-CoV-2 RNA from patient samples with a build-up of color, but no color was observed from control samples. These new data are shown below and in new Fig 5F and new Fig S8. The ability to detect the Delta variant of SARS-CoV-2 is an important feature of our sensor since this variant is now of global concern and extensively found in the population, even becoming the dominant variant in several countries (Callaway, 2021; O’Dowd, 2021; Torjesen, 2021).

      In Figure 2C-F, sensor 17 showed higher fold change and sensitivity. Why was sensor 12 selected for further study in Figure 3?

      Response: The reviewer rightly notes that sensor 17 responds to 10^12 copies of RNA and hence appears to be inherently more sensitive than sensor 12, which responds to 10^13 copies of RNA. However, neither of these sensitivities is good enough to detect the levels of viral RNA found in patient samples. Hence we coupled these sensors with a NASBA amplification step. The screen to identify pairs of NASBA primers immediately gave us strong hits for sensor 12, with which we could detect down to 100 copies of RNA. Hence we moved forward with sensor 12 for further experiments. This has now been clarified in the manuscript.

      Authors should show the error bar in all plots. Authors should also indicate what the error bar means (SD, S.E.M. etc.) throughout the manuscript.

      Response: This is an important point. We have added the error bars and statistical analyses to all relevant plots. We have included the description of these statistical parameters in the figure legends throughout the manuscript, where relevant. Alternatively, experimental replicates are indicated and shown in the revised manuscript. Specifically in Figures 2 and 3 and 4D we have performed statistical analysis to include p-values to show significance of the data. For the data in Figure 4 B-C we include the experimental replicates as a new Supplementary Figure (see new Fig S5). Data in Figure S5 is now updated to include the experimental replicates. For the patient data in Figure 5, we have included details of specificity and sensitivity analysis for clinical samples (see new Fig 5C).

      **Minor comments:**

      "This method is relatively faster but may generate false positives due to non-specific amplification and primer interactions." Reference is needed.

      Response: We have now added the following references in support of this statement. (Gadkar, Goldfarb, Gantt, & Tilley, 2018; Sahoo, Sethy, Mohapatra, & Panda, 2016)

      "using the softwares Primer 3 and NUPACK." Reference is needed.

      Response: We have now added the following references (Untergasser et al., 2012; Zadeh et al., 2011)

      Reference 15 belongs to CRISPR-CAS based assay but it was cited under RT-LAMP assay.

      Response: This has now been corrected. We thank the reviewer for this.

      Reviewer #1 (Significance (Required)):

      This paper will be of interest to scientists interested in developing diagnostic tools for the detection of SARS-CoV-2 in viral and host pathogenic sequences, genetic disorders and the development of precision medicine.

      Reviewer works in the field of Chemical Biology and Nanotechnology including sensor development and the application in diagnosis, cell physiological studies.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Charkravarthy et al. report a new method for detecting SARS-CoV-2 RNA both in vitro and in human saliva and nasal samples. The new detection method, PHANTOM, is capable of detecting as few as 100 copies of the SARS-CoV-2 genome. The method is demonstrated to be reproducible over a large range of viral titers and results in a binary report on CoV-2 infection. From my perspective the results are strong and fairly convincing (please see comments below). There is a clear, logical flow to the experiments and engineering of the PHANTOM system. The collaborative work is well organized and logical. The work is clearly of high significance and certainly merits expedited review and publication. I would like to unambiguously state that I support publication of this manuscript in its current form in the non-peer reviewed context of this journal, would be more than happy to provide further peer review of this manuscript upon submission to another journal, and would be more than happy to provide further comments if requested by the authors.

      My personal background is broad in range; however, I have a long track record of research in RNA folding, structural biology, biosensor development, and bioinformatics. Given this knowledge base, I found the manuscript rather easy to read and digest. The manuscript is well written and clear. In order to expedite the process of review I will not give a detailed review which would include grammatical errors (there are very few). Rather, I will touch on the most pressing issues I see.

      **Major concerns:**

      1) There are a number of figures that do not show a statistical measure of significance (e.g. error bars, ANOVA, etc.). It is essential that these be included in the final peer reviewed publication. (See Figure 2A, Figure 3D, Figure 4B, Figure 4C, Figure 5A, Figure 5C, Figure 5D).

      Response: This is an important point. We have added the error bars and statistical analyses to all relevant plots. We have included the description of these statistical parameters in the figure legends throughout the manuscript, where relevant. Alternatively, experimental replicates are indicated.

      Specifically in Figures 2 and 3 and 4D we have performed statistical analysis to include p-values to show significance of the data. For the data in Figure 4 B-C we include the experimental replicates as a new Supplementary Figure (see new Fig S5). Data in Figure S5 is now updated to include the experimental replicates. For the data in Figure 5, we have included details of specificity and sensitivity analysis for clinical samples (see new Fig 5C).

      2) There are some important points that do not include references within the manuscript. I believe that the authors should reference Abdolahzadeh et al. RNA 2019 in the introduction. This manuscript describes another NASBA viral detection system using fluorescent RNA reporters (also see Trachman et al. Q. Rev. Biophys 2019, for reference on fluorescent aptamers). Also see the ROSALIND method (Jung et al. 2020 Nature Biotechnology) for detecting water contaminants using visual identification by fluorescent aptamers.

      Response: We have added the above mentioned references to the manuscript as suggested by the reviewer.

      3) The discussion states that "The overall sensitivity in the attomolar range ensures detection of infection in the majority of Covid-positive patients in a population". Please provide a reference to support this and explicitly state the concentration of viral RNA in patient samples. There are a number of times that the copy number of viral genomes and sensitivity of the measurement is stated throughout the manuscript. There should also be a reference and statement about concentration.

      Response: The reviewer has raised multiple connected points here, which we address in the revised manuscript.

      1. Concentration of RNA in patient samples: We have added the references (Pujadas et al., 2020; Wyllie et al., 2020) where the authors report that the typical concentration of viral RNA in patient nasopharyngeal swab samples lies in the range of 10^4 to 10^5 copies of RNA per ml. This translates to a concentration range of 10 to 100 attomolar. This reference is now added to the manuscript. For the patient samples used in our study, we refer to the Ct values obtained from the RT-PCR tests and correlate Ct values to the readout from our assay, consistent with other reports on COVID-19 diagnostics (Joung et al., 2020; Vogels et al. 2020; Wu et al., 2021).

      2. Copy number and sensitivity: As the reviewer notes, we refer to viral genome copy number and sensitivity of our assay in the manuscript. These calculations of copy number and sensitivity were made using a commercial source of synthetic CoV-2 RNA (Twist Biosciences) that is used in several studies about COVID-19 diagnostics (Joung et al., 2020; Rabe & Cepko, 2020; Wu et al., 2021). The RNA copy numbers are taken from the product details provided by the manufacturer. These details are now clearly stated in the manuscript. The commercial RNA is provided at 10^6 copies per µl. From this, we take as low as 100 copies per 20 µl of NASBA reaction, which we are able to detect using our assay. Hence our sensitivity comes to 8 attomolar. We have clarified this in the manuscript. We noticed a typo in the original submission where we refer to a sensitivity of 80 attomolar in the Discussion. This is corrected to 8 attomolar. With this sensitivity we are within the range to detect RNA in patient samples, as confirmed by our patient data.
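      The patient-sample range quoted in this response is given in copies per ml, so the same unit conversion applies per millilitre. The snippet below is a rough sketch of that arithmetic only; the helper name `copies_per_ml_to_attomolar` is hypothetical and does not come from the manuscript.

```python
# Sketch: convert a viral load in copies per ml to attomolar (illustrative helper).
AVOGADRO = 6.02214076e23  # entities per mole (exact SI value)


def copies_per_ml_to_attomolar(copies_per_ml: float) -> float:
    """Convert copies/ml to attomolar: copies/ml -> copies/L -> mol/L -> aM."""
    copies_per_litre = copies_per_ml * 1e3
    return copies_per_litre / AVOGADRO * 1e18


# Range of viral RNA in nasopharyngeal swabs cited in the response.
for load in (1e4, 1e5):
    print(f"{load:.0e} copies/ml is about {copies_per_ml_to_attomolar(load):.0f} aM")
```

      Under these assumptions the range works out to roughly 17 to 166 attomolar, the same order of magnitude as the 10 to 100 attomolar range stated in the response.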

      Reviewer #3 (Significance (Required)):

      I think this is a significant advancement in the field. The introduction of smartphone technology to this robust diagnostic is very attractive. The work is of high significance since the researchers demonstrated robust responses against SARS-CoV-2 variants. As we all now know, these are on the rise, and cheap, robust detection methods are essential for containing this virus.

      Response: We thank the reviewers for the positive comments.

  4. Jun 2021
    1. very few traditional humanists would call their source material “data.” You may have seen this piece in the LA Review of Books in October 2012. While the language is pretty hyperbolic, I do think it helps to convey how uncongenial many humanists feel the notion of data is to the work that they actually do.

      This point about where to draw the line between data and artifacts is interesting considering that digital humanities is built upon the very concept of turning artifacts into data. This connects to the concepts in the article by Trevor Owens about the various properties of data. If we view data as an artifact which can also serve as a source of evidence, then we can preserve the integrity and multifaceted nature of the dataset while still using it to serve the purpose of providing a specific source of numerical evidence. It seems to me that this idea is very important to the digital humanities considering the susceptibility to losing the integrity and humanist nature of original data sources when viewing them as sources for discrete data sets.

    1. I worry that social justice may become simply a “topic du jour” in music education, a phrase easily cited and repeated without careful examination of the assumptions and actions it implicates.

      I completely agree with this statement, and I think that it's become a buzzword (like Alex said) in schools in general, not even just in the field of music education. Our district hired an Equity Officer about 2 years ago, and I was really hoping that they would have a strong presence in our district, at curriculum review meetings, providing PD, etc....I think I have seen them once since I was hired and it was at New Teacher Orientation. We have someone there that could be helping us to fully understand some of these terms/topics instead of assuming we know what it is, its implications, its assumptions, etc. but it feels as if they're not being fully utilized.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Responses to reviewers’ comments

      We thank the reviewers for their encouraging comments and helpful suggestions.

      Reviewer #1

      (Evidence, reproducibility and clarity (Required)):

      Sanchez et al report several new findings about the adhesive protrusions on Plasmodium falciparum infected erythrocytes. Using super resolution microscopy and correlation analysis, they tracked associations between the knob protein KAHRP and erythrocyte membrane cytoskeleton proteins. They have expanded on and improved previous work on the unusual spiral structure of the knobs, which appears to be a spiral ribbon or blade and have shown a developmental pathway for the association of KAHRP with the cytoskeleton. They have localised KAHRP close to the spiral and determined its abundance in the knobs. They have also used cryo electron tomography and subtomogram averaging to get an improved 3D view of the knob structure.

      The work appears to be carefully and thoroughly done, and the paper is clearly written, though non specialists in the optical methods may find it challenging to navigate through the many super resolution images and correlation plots.

      Comment 1: The writing needs minor editing to fix a variety of small linguistic errors and typos. For example, line 97 "sideway positions" (they presumably mean lateral location), line 980 typo overlay, line 366 "then could reorganizes", line 435, "a predict volume".

      We apologize for the linguistic errors and typos. These have been corrected in the revised manuscript.

      (Significance (Required)):

      Comment 2: The study provides a distinct advance on the previous state of knowledge of the structure and biochemistry of the knobs. The knobs play a key role in virulence of P. falciparum and they are quite poorly understood. Although this paper does not represent a major breakthrough in determining the molecular structure or mechanistic role of the knobs, e.g. the biochemical identity of the spiral remains unknown, the new information is valuable and likely to be important in understanding the pathogenic actions of P falciparum.

      We thank the reviewer for appreciating the importance of our study. We believe that our first-time observations on the dynamics of KAHRP are a very important advance in the field and that revealing the mechanistic basis is a great challenge that at the current stage has to be left to future work.

      Comment 3: The interpretation shown in Figure 7 seems fine, except for the proposal that the actin cytoskeleton is reorganised. There is no evidence for that. The cryo tomograms of the cytoskeleton in Watermeyer et al addressed this point and did not find any evidence for reorganisation of the cytoskeleton other than the insertion of the knobs.

      In two previous studies we showed that actin is indeed reorganized by the parasite. It is mined from the protofilaments to generate long actin filaments that connect the knobs with the Maurer’s clefts and which are used for trafficking of cargo vesicles from the Maurer’s clefts to the erythrocyte plasma membrane (Cyrklaff et al. Hemoglobins S and C interfere with actin remodeling in Plasmodium falciparum-infected erythrocytes. Science. 2011 334:1283-1286; Cyrklaff et al. Oxidative insult can induce malaria-protective trait of sickle and fetal erythrocytes. Nat Commun. 2016 7:13401). Moreover, a life-cycle resolved AFM study of the cytoplasmic side of iRBCs by the group of CT Lim has demonstrated dramatic coarsening of the spectrin network, which must be accompanied by changes to the actin component of the skeleton (Shi, Hui, et al. "Life cycle-dependent cytoskeletal modifications in Plasmodium falciparum infected erythrocytes." PLoS One 8.4 (2013): e61170). Coarsening of the actin-spectrin network would imply a decrease of the amount of actin in the network, which is consistent with its use in the parasite-derived long actin filaments.

      **Referee Cross-commenting**

      I also agree with the other 2.

      Reviewer #2

      (Evidence, reproducibility and clarity (Required)):

      Malaria parasites replicate within circulating red blood cells (RBC). During parasite maturation, the parasite coordinates extensive modification of the host cell, including structural modifications of the RBC cytoskeleton and surface membrane. These host cell alterations play crucial roles in the pathology of malaria, including vascular adhesion by parasitised cells and avoidance of splenic clearance, and so are of great interest. This interesting manuscript describes a detailed examination of the role in these RBC modifications of a well-described parasite protein called KAHRP. Using a combination of cutting-edge super-resolution microscopy, cryo-electron tomography, immuno-EM, SEM and parasite mutagenesis, the authors provide evidence that KAHRP localisation alters during parasite maturation but eventually becomes closely associated with the previously-described spiral structures that underlie infected RBC surface membrane protrusions called knobs. The authors provide improved resolution of the spiral formations, generate a quantitative estimate of the number of KAHRP molecules per knob, and provide a model for the role of KAHRP in attaching other proteins to the spirals based on their observations.

      In general, this study is thorough and well-performed, and the conclusions drawn are well-supported by the data. Although the work does not advance understanding of knob function or the parasite components that form the bulk of the spirals, it provides an interesting and useful contribution to understanding of the manner in which this important pathogen manipulates its host cell.

      We thank the reviewer for appreciating the importance of our study and for acknowledging that it is an important intermediate step towards a complete understanding of skeleton remodelling by the parasite.

      I have just a few minor suggestions that should improve the manuscript.

      Comment 1: Line 91 (Page 2 paragraph 2). It would be greatly helpful here if the authors could provide a more detailed background on the makeup of the RBC cytoskeleton, and in particular the interactions between beta-spectrin and the actin protofilaments of the junctional complexes. The authors should make it clear that the actin-binding domain of beta-spectrin comprises 2 calponin like domains, and that these are attached to the end of the tandem spectrin repeat domains that make up the bulk of the molecule.

      We thank the reviewer for this helpful suggestion and have added a new paragraph to the results section providing detailed background information on the makeup of the RBC membrane skeleton. The new text reads as follows:

      “Major components of the red blood cell membrane skeleton are spectrin and actin filaments (Fig. 1B). The spectrin filaments consist of α- and ß-spectrin, which form α2ß2 heterotetramers by head-to-head association of two αß dimers (Lux, 2016; Machnicka et al., 2014). The N-termini of the ß-spectrin subunits are positioned at the tail ends of the heterotetramer and contain two calponin homology (CH) domains for binding to actin protofilaments consisting of 6 to 8 actin monomers in each of the two strands (Lux, 2016; Machnicka et al., 2014). Protein 4.1R strengthens the spectrin-actin interaction (Lux, 2016; Machnicka et al., 2014). Groups of up to six spectrin heterotetramers can attach to an actin protofilament, resulting in a pseudohexagonal meshwork (Lux, 2016). Ankyrin binds to the C-terminal domain of ß-spectrin and connects integral membrane proteins with the actin-spectrin network in an ankyrin complex (Lux, 2016; Machnicka et al., 2014).”

      Comment 2: Line 97 "These values are slightly larger than the reported physical dimension of the protofilament...". Please provide these reported dimensions here, as well as relevant references.

      The requested information is now provided. The sentence now reads as follows:

      “These values are slightly larger than the reported physical dimension of the protofilaments of ~37 nm (Lux, 2016) and might be explained by the lateral localization of the spectrin binding sites and the additional sizes of the primary and secondary antibody trees used to detect the two targets.”

      Comment 3: Line 366 "reorganize"

      The spelling mistake has been corrected.

      (Significance (Required)):

      Comment 4: This is a useful technical advance in understanding of the structure of the P. falciparum-infected red blood cell, and builds on the work of Watermeyer et al. (2016). The study should certainly be of interest to most malaria researchers, particularly those interested in the pathobiology of the organism.

      We thank the reviewer for supporting our study.

      **Referee Cross-commenting**

      I fully agree with and endorse the comments of the other 2 reviewers.

      Reviewer #3

      (Evidence, reproducibility and clarity (Required)):

      The binding of P. falciparum infected erythrocyte (iRBCs) to the endothelium is mediated by protuberances (knobs). These knobs are assembled by a multi-protein complex at the iRBC surface. It acts as a scaffold for the presentation of the major virulence antigen, P. falciparum Erythrocyte Membrane Protein-1 (PfEMP1). The knob-associated histidine-rich protein (KAHRP) is an essential component of the knobs and therefore essential for the binding of iRBC to the endothelium under physiological conditions. This manuscript focusses on the knob architecture and KAHRP localization.

      Comment 1: It is, at least for this reviewer, hard to assess i) how the "preparation of exposed membranes by hypotonic shock" and the analysis of the "inverted erythrocyte membrane ghosts" are reflective of the physiological architecture within the iRBC and ii) how the authors exclude remnants of Maurer's clefts (MCs) in their preparation. The latter appears especially important for the interpretation of dynamic KAHRP repositioning, as MCs are mobile in early stages and non-mobile later on (e.g. McMillan et al. 2013, Grüring et al. 2011) and the authors observed at least some MAHRP1 signal (Figure S8), which is hard to interpret from the single representative image provided.

      We understand the reviewer’s concerns, but are convinced that we have done the necessary controls to evaluate our approaches. For example, we evaluated the exposed membrane approach by investigating uninfected erythrocytes and comparing the findings with literature reports (see Figure 1). A high degree of agreement was observed. We further would like to point out that the exposed membrane approach has been successfully used by several other studies referenced in the manuscript (Dearnley et al., 2016; Looker et al., 2019; Shi et al., 2013). Please also allow us to explain why we have used exposed membranes instead of whole cells. The reason is that the hemozoin produced by the parasite interferes with STED microscopy, resulting in a quick and strong build-up of resonance energy in the specimen and, eventually, in the disruption of the cell.

      With regard to the question of whether remnants of Maurer’s clefts are present in our preparations, we do not think so: we never observed membrane profiles reminiscent of Maurer’s clefts in SEM images of exposed membranes (see figure at the end of the response letter). Nevertheless, we will double-check this result using STED imaging of exposed membranes treated with an antibody against the established Maurer’s clefts marker SBP1. These data could be added to a revised manuscript.

      Comment 2: line 173: Please provide a detailed description about parasite synchronization (also absent in the methods section).

      A detailed description, including references, has now been added to the methods section:

      “For synchronization of cultures, schizont-infected erythrocytes were sterile purified using a strong magnet (VarioMACS, Miltenyi Biotec) (Staalsoe et al., 1999) and mixed with fresh erythrocytes to high parasitaemia. 5000 heparin units (Heparin-sodium 25000, Ratiopharm) were added and the cells were returned to culture for 4 hrs (Boyle et al., 2010). Following the treatment with heparin, cells were washed with pre-warmed supplemented RPMI 1640 medium and then returned to culture for 2 hrs to allow for re-invasion of erythrocytes. Subsequently, cells were treated with 5% sorbitol to remove late parasite stages (Lambros and Vanderberg, 1979).”

      Comment 3: line 136: Please re-check nomenclature of "PHIS1605w" (mixed nomenclature used throughout the manuscript). I suggest to use either LyMP or the up-to-date ID PF3D7_0532400.

      We apologize for the oversight and now consistently use the ID PF3D7_0532400.

      Comment 4: Please provide source and references for PfEMP1, MAHRP1 and "PHIS1605w" antibodies that are used. I cannot find them in the methods section or in Table S1.

      We apologize for the oversight and now provide the requested information in the amended Table S1.

      Comment 5: line 165: Warncke et al. (2016) appears to be misplaced as an appropriate MAHRP1 reference.

      We now cite the original MAHRP1 publication by Spycher et al. 2003.

      Comment 6: line 159: the sentence "The strong cross-correlation between KAHRP and actin is consistent with previous cryo-electron tomographic analysis showing long actin filaments connecting the knobs with Maurer's clefts in trophozoites (Cyrklaff et al., 2012; Cyrklaff et al., 2011; Cyrklaff et al., 2016)" could be moved to the discussion section.

      The sentence was indeed redundant with a section in the discussion and was removed.

      Comment 7: line 199: The text refers to Fig. 9AB - but should refer to 4AB or suppl. 11.

      We are sorry for this mistake and now refer to the correct figures in the revised manuscript.

      Comment 8: Fig. 4: A solid average for the number of subtomograms, but please provide information about what the arrowheads (4E) indicate.

      Thank you for this comment. The arrowheads indicate peripheral crown-like densities. We have updated the figure legend to clarify this issue.

      Comment 9: The "flexible periphery" is likely a combination of flexibility and occupancy as the average was made from subtomograms with varying number of turns in the spiral. As occupancy is likely a significant contributing factor to the average that should be discussed or at least mentioned.

      Thank you for this important comment. Indeed, a significant variation was observed between the individual knobs. The spirals have variable diameters, and the number of peripheral proteins also varied. We added measurements to supplementary Figure 11D. In addition, we updated the text and extended the discussion.

      Comment 10. On that note, did the authors try and classify based on number of turns prior to averaging and if so did the authors see any differences in structures between few turn and many turn spirals?

      We attempted several classifications on the full knobs with variable masks. However, due to the limited number of particles in the dataset, we could not converge to stable solutions. Instead, we decided to adopt the subboxing strategy, in which locally ordered segments at the periphery could be analyzed. This showed several structural snapshots at the periphery of the knobs.

      Comment 11. What size mask was used? Was it a soft sphere around the core or big enough for the knobs with multiple spiral turns?

      While we attempted several alignments and classifications with variable masks, the final refinement and measurement of FSC was performed with a soft contour mask. We overlaid it with the structure in Figure S11F and uploaded it as a part of the EMDB deposition. We further show the masks used in this study in a new Figure S14.

      Comment 12. It might be useful for readers who are not familiar with Dynamo to provide a little bit more information about how the initial reference was produced. Additionally, more information about the sub-boxing strategy (i.e. spacing etc.) would be helpful.

      Thank you very much for the suggestion. For the initial reference we manually aligned all the particles, summed them up and low-pass filtered them. We now describe it in the methods section.

      For the subboxing procedure we added more description to the main text:

      “40 segments were extracted at the radius of the 2nd and 3rd spirals followed by their classification into structural classes.”

      We further extended and simplified the description in the results section (line ~221).

      Comment 13: Fig. 5 Additional (earlier) maturation stages of the iRBC with Ni2+NAT-gold-labelling would be a nice add on - this could help confirm the model and would itself be a control for the later stage labelling.

      We thank the reviewer for this insightful suggestion. We are currently performing the proposed experiment and will include it in a revised version of the manuscript.

      Comment 14: line 637: DMSI typo and please provide the supplier for DMSI (DSM1).

      We corrected the typographic error and now provide the name of the supplier.

      Comment 15: Figure 7: Please provide what the purple arrows indicate.

      The figure legend has been updated.

      Comment 16: Fig S11D: The labels X, Y and Z are confusing, describing the slicing axis as "XZ, YZ and XY" view is more intuitive.

      Done as suggested by the reviewer.

      Comment 17: Figure S13 B: WBs are cropped. Please provide un-cropped WB.

      Uncropped Western blots will be provided in the revised manuscript.

      (Significance (Required)):

      In general, I highly appreciated the solid data and its thorough analysis of the microscopy data. The authors investigate the structural organization of knobs in iRBCs using high-resolution imaging techniques including STED and PALM super-resolution microscopy-based approaches and electron tomography. The beauty of this paper is that it does nicely re-investigate knob architecture in iRBC (e.g. Watermeyer et al., 2016, Cutts et al., 2017, Looker et al., 2019, McHugh et al., 2020) and provides some intriguing KAHRP co-localization with cytoskeleton components. The downside of it is that - by nature - it is descriptive (and the data rather confirmatory) and as it stands does not provide us with a deeper molecular dissection of the knob associated structure and its cellular function.

      We thank the reviewer for appreciating our study and would like to emphasize the following novelties in our study:

      • We show that the association of KAHRP with membrane skeletal components is highly dynamic and changes as the parasite matures. Our results on the dynamics of KAHRP organization reconcile conflicting reports in the literature and establish, for the first time, a dynamic model for KAHRP organization.
      • We further show that KAHRP finally assembles at remnant actin-junctional complexes devoid of the actin-capping factors adducin and tropomodulin.
      • We further quantified the number of KAHRP molecules per knob and show that KAHRP is present as 60 copies per knob, a number one order of magnitude greater than previously thought.
      • Last but not least, we provide a 35 Å map of the spiral scaffold underlying knobs and show that KAHRP associates with the spiral scaffold.
      • We conclude by providing a novel model on the biological function of KAHRP by proposing that KAHRP acts as a glue that connects spectrin and parasite-remodeled actin filaments with the knob spiral.

      **Referee Cross-commenting**

      Fully agreed.

      Boyle, M.J., Wilson, D.W., Richards, J.S., Riglar, D.T., Tetteh, K.K., Conway, D.J., Ralph, S.A., Baum, J., and Beeson, J.G. (2010). Isolation of viable Plasmodium falciparum merozoites to define erythrocyte invasion events and advance vaccine and drug development. Proc Natl Acad Sci U S A 107, 14378-14383.

      Lambros, C., and Vanderberg, J.P. (1979). Synchronization of Plasmodium falciparum erythrocytic stages in culture. J Parasitol 65, 418-420.

      Lux, S.E., 4th (2016). Anatomy of the red cell membrane skeleton: unanswered questions. Blood 127, 187-199.

      Staalsoe, T., Giha, H.A., Dodoo, D., Theander, T.G., and Hviid, L. (1999). Detection of antibodies to variant antigens on Plasmodium falciparum-infected erythrocytes by flow cytometry. Cytometry 35, 329-336.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper examines muscle activity at single muscle level during Drosophila ecdysis (adult hatching) behavior. The premise is that quantifying behavior or motor neuron activity is insufficient to understand how the CNS generates behavior - it is also critical to quantify muscle activity. They show that abdominal body wall muscles generate stereotyped patterns of activity during four developmental stages (phase 0, stochastic activity; phases 1-3, each with a different pattern of activity). Co-active groups of muscles form "syllables" which are used in different combinations to generate the stereotyped activity seen in phases 1-3. This analysis was facilitated by use of a convolutional neural network. Interestingly, they found examples where muscle contraction did not match muscle activity (GCaMP elevation), showing the importance of measuring both attributes.

      In addition to mapping the stereotyped muscle activity at single muscle resolution in the generation of ecdysis behavior, they find that phase 1 and 3 are quite variable, and speculate that other constraints on the CNS output (e.g. during larval locomotion) may prevent a sharpening up of muscle patterns. They show that the hormone ETH is required for initiating phase 1, and the neuromodulators bursicon and CCAP are required for initiating phase 2. Failure to initiate either phase is lethal. Lastly, they show that in addition to initiating phase 1 or 2, the hormone/neuromodulators result in more coherent muscle activity.

      Overall this study sets the stage for a detailed analysis of motor neuron function in driving muscle activity patterns, and then further into the CNS to understand the role of premotor neurons. Ecdysis behavior has the potential to be a powerful system for understanding how the CNS generates behavior at the single muscle /single motor neuron level, as well as for understanding how neuromodulators act to regulate muscle/motor neuron activity.

      The figures are almost all too small to see the salient information, and the color scheme is often difficult to resolve. Please enlarge the key aspects of the figures; and try to use more distinctive colors where critical comparisons need to be made. Some examples: left/right colored lines in 1G; panel 3D; lines in 3E; all data in 5G (this is the worst for tiny data); 6C,D,J; all of 7.

      Thank you for your thoughtful review and your suggestions on how to improve the manuscript. Some figure panels (e.g. 5G) have been completely replaced. The others mentioned have been divided into multiple figures or panels, which allowed us to enlarge the material in each. Fig. 7 was deleted from the revised manuscript because it was generally found unhelpful. We also felt that the other revisions rendered this figure unnecessary. The revised manuscript now has 11 main figures and 9 figure supplements with more generous layouts for individual panels so that details are more easily resolved. In addition, we attempted to improve the color scheme to facilitate clarity, using the color palette recommended for the color-blind. Other specific changes are referenced in our responses to individual concerns below.

      Reviewer #2 (Public Review):

      The manuscript by Diao et al. is an important extension of their eLife paper of 2017. They have developed new tools that allow them to follow Ca2+ transients in single muscle fibers over the whole animal through the behavioral sequence and also to independently monitor the Ca2+ transients in the endplates of the motor neurons that innervate these muscles. Their goal is to break down the movements that control the ecdysis sequence into elemental "syllables" and then to define the role of these syllables in constructing progressively complex behavioral programs and as targets of neuropeptide modulation.

      A crucial behavior that occurs during P1 in higher flies is the movement of the gas bubble but this event is largely ignored in the paper. Prior to pupal ecdysis, gas is expelled into the posterior puparial space and then actively translocated, via muscular contractions of the body wall, to the anterior end of the puparium during the latter portion of P1 (shown nicely in the author's 2017 Video). A detailed study by C.G. Chadfield & J.C.Sparrow (1985. Dev. Genetics 5: 103) of pupal ecdysis in Drosophila emphasized the importance of this translocation for head eversion. When they simply removed the operculum at the start of bubble movement, then the gas bubble could not push the animal backwards in the puparial case and head eversion could not occur. However, they saw normal pupation and head eversion if the removed operculum was immediately replaced and sealed down with petroleum jelly.

      During translocation, the bubble moves in a fragmented fashion between the pupal cuticle and the puparium. Ignoring this movement leads to statements like on line 378 "Because pupal ecdysis is independent of environmental factors and executed in the absence of competing physiological needs, it is likely that its variability is intrinsic to the ecdysis network." For the pupating animal, its "environment" is the inside of the puparial case and the moving bubble is an unpredictable variable in this environment. The trajectory and route of bubble movement is not fixed, and it is likely that variation in sensory feed-back from the gas movement explains the motor variability and reduced stereotypy during P1. The role for proprioception during this phase is likely to inform the CNS of the progression of the bubble fragments. The author's finding that the blockage of proprioceptors suppresses the behavior progression could mean that this sensory information is needed to signal that an anterior space has been produced, and without this signal, the behavior does not progress to its next phase. This should be addressed in the text if not experimentally.

      We very much appreciate the reviewer’s point that the environment within the puparium may affect the pupa’s motor performance. We have now amended our comment on environmental influences to include this point (ll. 479-481 [515-517]), and we elaborate in the Discussion on conditions within the puparium that may influence movement and sensory processing (ll. 457-477 [493-513]). Following the reviewer’s advice, we note that the gas bubble and its dispersion during P1 must be considered a possible determinant of pupal movement. In addition, we mention other possible determinants that we did not previously discuss, namely substrate and surface tension interactions between the body wall, puparium, and residual molting fluid. In line with the reviewer’s point that understanding the environment of the puparium is critical, we stress the need to account for all external forces acting on the pupal body to achieve a complete understanding of the pupal motor output. In the Discussion, we also now mention the reviewer’s interesting hypothesis that creation of the anterior space at the end of P1 may provide sensory information necessary for progression of the behavioral sequence (ll. 534-535 [601-602]).

      Another aspect of the background that is missing is considering earlier studies on the ontogeny of behaviors leading up to ecdysis/hatching. Notable are studies of the progressive construction of the flight motor program during metamorphosis in moths (Kammer & Rheuben 1976 J. Exp. Biol. 65:65.) and a similar feature of assembly of motor programs prior to hatching in Drosophila (Crisp et al., 2008 Development 135:3707). In the moth studies, complex motor programs were gradually assembled during ontogeny with motor neurons firing but without muscle contraction (as the authors see in prepupae during P0 - Fig 2C). A lack of excitation-contraction coupling in the moth prevents muscle movement through most of development. This suppression of contraction is essential because prior to production of adult cuticle, muscle contraction would rip the developing animal apart. The same requirement to suppress muscle contraction would be seen in fly prepupa until sufficient pupal cuticle has been secreted to prevent rupture from actual muscle contractions! This should be addressed in the text.

      We thank the reviewer for his comments and for the references on motor program assembly. We agree that this topic deserved more attention than it was originally given. We have now amended our discussion of P0 to contextualize our observations, pointing to the previous literature on both suppressed muscle activity and latent motor programs observed in other developing animals (ll. 487-500 [523-536]).

      Besides not being explicit about how the syllables combine to build the eight basic movements, it is not clear how these basic movements then combine to support the major behaviors of each phase. This is seen in P1, where we see that swing and brace movements can co-occur (e.g., Fig 3D) but is a swing on one side always associated with a brace on the other? What are their phase relationships? Does their temporal association remain stable as the bouts progress? Another example is in Phase 3. There appear to be 5 basic behaviors associated with bouts in Phase 3. The example in Fig 1H shows double peak bouts in phase 3, and the bulk Ca data show a preponderance of double peaks. The different shapes suggest that there are different movements during the two peaks. Their discussion of P3 movements (around line 273), though, does not address this feature of the double peaks. The example in Fig 7A suggests that some movements, like the PostSwing occur at half the frequency of other movements such as the PostCon and AntComp. Is this the basis of the double peaks and how is that reflected in the movements that are finally produced? This should be addressed in the text.

      We regret the confusion on these points. As described there, we have made numerous changes to the manuscript to clarify how elements of behavior at one level (e.g. movements) derive from lower-level elements (e.g. syllables) and are used to build higher-level elements (e.g. phases). We describe the phase relationships at all levels for P1 and P2 and summarize the more variable constituents of P3 movements in the text (Fig. 7D, E and ll. 247-275 [274-302]). The specific questions raised by the reviewer are also now answered in the text. In brief, early P2 bouts (roughly those prior to head eversion) differ from later bouts in containing only a Swing. Later bouts contain, in addition to the Swing, a Brace performed concomitantly on the contralateral side of the body (ll. 182-183 [197-199]). The movements contributing to the peak-double peak motif common to P3 are now more carefully described (ll. 351-360 [383-393]).

      One approach that I did not find useful was dividing the analysis into compartments - anterior versus posterior and dorsal-lateral-ventral. This may provide a way of generating some statistical analysis, but it did not illuminate anything about the behavior. The line between anterior and posterior segments seems to be arbitrary. Of course, it is important to know if there is directionality of movement [waves going anteriorly versus posteriorly], but beyond that, I am not sure what it adds. [Indeed, it made Fig 7 very confusing!] Also, I could not see a rationale for considering separate dorsal-lateral-ventral compartments. This should be addressed in the text.

      We thank the reviewer for this question, which we now address in a revised section of the Discussion on the topic of neuromodulation and compartmentalization (ll. 539-588 [606-655]). To briefly expand upon our explanation there, we think that compartmental activity allows a useful coarse-grained description of the sequential body wall contractions that give rise to movement as indicated by the SequenceMatcher similarity scores (Fig. 6E in the revised manuscript). Second, and more important, we think that how activity flows across compartments provides clues about both the central organization and the neuromodulatory control of ecdysis behavior. Both ETHRB and CCAP neuron suppression exert selective effects on A-P compartments. ETHRB neuron suppression blocks the Lift, a movement of the posterior compartment, while suppressing CCAP neurons prematurely terminates the first (and only) swing-like movement by blocking its progression into the anterior compartment. Additionally, the distribution of CCAP-R appears to reflect mechanisms for selectively regulating distinct D-V compartments. Myotopic maps of larval motor neuron dendrites show that MNs innervating dorsal and ventral muscles are spatially segregated from those innervating lateral muscles and have distinct inputs. This suggests distinct regulation of activity in D-V and L compartments and likely distinct functions. Importantly, CCAP-R is expressed only in motor neurons of the D and V compartments, but in the L compartment it is expressed in muscles. As we suggest, this may allow the different regulatory mechanisms of compartmental regulation to synergize during P2. Finally, our subdivision of the A-P axis at the boundary between segments 5 and 6 has both anatomical and functional importance. At the pupal stage, selective muscle loss imposes differences in muscle composition of segments anterior and posterior to this boundary. 
      Most importantly, anterior segments contain M12, which is a major contributor to behavior only after P1 and is targeted by neuromodulatory Type III terminals containing CCAP and Bursicon. In addition, the A-P boundary also conforms to the functionally and neuroanatomically defined “hinge” region of Tastekin et al. (2018, eLife), which regulates the switch from forward to backward movement in the larva. Because the compartmental subdivisions we define conform with neuroanatomical differences and appear to underlie functional differences, our working hypothesis is that they will be important landmarks for mapping behaviorally relevant CNS activity as we begin to image it in the next phase of our work.
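
      As an illustration of the kind of coarse-grained comparison mentioned above, the following is a minimal sketch that assumes the "SequenceMatcher similarity scores" refer to Python's standard `difflib.SequenceMatcher`. The compartment labels (P/A for posterior/anterior, D/L/V for dorsal/lateral/ventral) and the two example sequences are hypothetical and are not taken from the manuscript.

```python
from difflib import SequenceMatcher


def sequence_similarity(seq_a, seq_b):
    """Similarity in [0, 1] between two sequences of compartment labels."""
    # autojunk only matters for very long sequences; None disables the junk filter.
    return SequenceMatcher(None, seq_a, seq_b).ratio()


# Hypothetical order in which compartments become active during two bouts.
reference_bout = ["PD", "PL", "PV", "AD", "AL", "AV"]
test_bout = ["PD", "PL", "PV", "AD", "AV"]  # lateral-anterior step missing

score = sequence_similarity(reference_bout, test_bout)
print(f"similarity = {score:.2f}")
```

      A score of 1 indicates identical activation order; lower scores indicate that compartments were skipped, added, or reordered.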

    1. Author Response:

      Reviewer #2 (Public Review):

      Summary:

      Frey et al. develop an automated decoding method, based on convolutional neural networks, for wideband neural activity recordings. This allows the entire neural signal (across all frequency bands) to be used as decoding inputs, as opposed to spike sorting or using specific LFP frequency bands. They show improved decoding accuracy relative to a standard Bayesian decoder, and then demonstrate how their method can find the frequency bands that are important for decoding a given variable. This can help researchers to determine what aspects of the neural signal relate to given variables.

      Impact:

      I think this is a tool that has the potential to be widely useful for neuroscientists as part of their data analysis pipelines. The authors have publicly available code on github and Colab notebooks that make it easy to get started using their method.

      Relation to other methods:

      This paper takes the following 3 methods used in machine learning and signal processing, and combines them in a very useful way. 1) Frequency-based representations based on spectrograms or wavelet decompositions (e.g. Golshan et al, Journal of Neuroscience Methods, 2020; Vilamala et al, 2017 IEEE international workshop on machine learning for signal processing). This is used for preprocessing the neural data; 2) Convolutional neural networks (many examples in Livezey and Glaser, Briefings in Bioinformatics, 2020). This is used to predict the decoding output; 3) Permutation feature importance, aka a shuffle analysis (https://scikit-learn.org/stable/modules/permutation_importance.html; https://compstat-lmu.github.io/iml_methods_limitations/pfi.html). This is used to determine which input features are important. I think the authors could slightly improve their discussion/referencing of the connection to the related literature.

      Overall, I think this paper is a very useful contribution, but I do have a few concerns, as described below.

      We thank the reviewer for the encouraging feedback and the helpful summary of the approaches we used. We are happy to read that they consider the framework to be a very useful contribution to the field of neuroscience. The reviewer raises several important questions regarding the influence measure/feature importance, the data format of the SVM and how the model can be used on EEG/ECoG datasets. Moreover, they suggest clarifying the general overview of the approach and to connect it more to the related literature. These are very helpful and thoughtful comments and we are grateful to be given the opportunity to address them.

      Concerns:

      1) The interpretability of the method is not validated in simulations. To trust that this method uncovers the true frequency bands that matter for decoding a variable, I feel it's important to show the method discovers the truth when it is actually known (unlike in neural data). As a simple suggestion, you could take an actual wavelet decomposition, and create a simple linear mapping from a couple of the frequency bands to an imaginary variable; then, see whether your method determines these frequencies are the important ones. Even if the model does not recover the ground truth frequency bands perfectly (e.g. if it says correlated frequency bands matter, which is often a limitation of permutation feature importance), this would be very valuable for readers to be aware of.

      2) It's unclear how much data is needed to accurately recover the frequency bands that matter for decoding, which may be an important consideration for someone wanting to use your method. This could be tested in simulations as described above, and by subsampling from your CA1 recordings to see how the relative influence plots change.

      We thank the reviewer for this really interesting suggestion to validate our model using simulations. Accordingly, we have now trained our model on simulated behaviours, which we created via linear mapping to frequency bands. As shown in Figure 3 - Supplement 2B, the frequency bands modulated by the simulated behaviour can be clearly distinguished from the unmodulated frequency bands. To make the synthetic data more plausible we chose different multipliers (betas) for each frequency component which explains the difference between the peak at 58Hz (beta = 2) and the peak at 3750Hz (beta = 1).

      To generate a more detailed understanding of how the detected influence of a variable changes based on the amount of data available, we conducted an additional analysis. Using the real data, we subsampled the training data from 1 to 35 minutes and fully retrained the model using cross-validation. We then used the original feature importance implementation to calculate influence scores across each cross-validation split. To quantify the similarity between the original influence measure and the downsampled influence, we calculated the Pearson correlation between the downsampled influence and the one obtained when using the full training set. As can be seen in Figure 3 - Supplement 2A, our model achieves an accurate representation of the true influence with as little as 5 minutes of training data (mean Pearson's r = 0.89 ± 0.06).
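
      The comparison described above can be sketched as follows. This is a minimal illustration on synthetic numbers, not the analysis code: the influence profiles, the number of frequency bands (26), the noise level, and the helper name `pearson_r` are all assumptions made for the example.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation between two 1-D influence profiles."""
    return float(np.corrcoef(a, b)[0, 1])


rng = np.random.default_rng(1)
n_bands = 26  # assumed number of frequency bands

# Hypothetical influence profile obtained with the full training set.
full_influence = rng.dirichlet(np.ones(n_bands))

# Hypothetical profiles from five cross-validation splits of a subsampled
# training set, modelled here as the full profile plus small noise.
split_influences = [
    full_influence + rng.normal(0.0, 0.005, n_bands) for _ in range(5)
]

# One correlation per split, summarised as mean and standard deviation.
r_values = np.array([pearson_r(full_influence, s) for s in split_influences])
print(f"mean Pearson's r = {r_values.mean():.2f} ± {r_values.std():.2f}")
```

      In a real analysis each entry of `split_influences` would come from a model retrained on the subsampled data, so the spread of `r_values` reflects how stable the influence estimate is at that training duration.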

      Page 8-9: To further assess the robustness of the influence measure we conducted two additional analyses. First, we tested how results depended on the amount of training data (1 - 35 minutes, see Methods). We found that our model achieves an accurate representation of the true influence with as little as 5 minutes of training data (mean Pearson's r = 0.89 ± 0.06, Figure 3 - Supplement 2A). Secondly, we assessed influence accuracy on a simulated behaviour in which we varied the ground truth frequency information (see Methods). The model trained on the simulated behaviour is able to accurately represent the ground truth information (modulated frequencies 58 Hz & 3750 Hz, Figure 3 - Supplement 2B).

      Page 20: To evaluate if the influence measure accurately captures the true information content, we used simulated behaviours in which ground truth information was known. We used the preprocessed wavelet transformed data from one animal and created a simulated behaviour y_sb using uniform random noise. Two frequency bands were then modulated by the simulated behaviour using f_new = f_old × β × y_sb. We used β = 2 for 58 Hz and β = 1 for 3750 Hz. We then retrained the model using five-fold cross validation and evaluated the influence measure as previously described. We report the proportion of frequency bands that fall into the correct frequencies (i.e. the frequencies we chose to be modulated, 58 Hz & 3750 Hz).
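
      The simulation logic can be illustrated with a small, self-contained sketch. It is not the pipeline used in the manuscript: a linear least-squares fit stands in for the convolutional network, the data are synthetic rather than wavelet-transformed recordings, influence is evaluated in-sample rather than with cross-validation, and the band indices (5 and 20) are hypothetical placeholders for the 58 Hz and 3750 Hz bands.

```python
import numpy as np


def permutation_importance(predict, X, y, rng):
    """Increase in mean squared error when each feature column is shuffled."""
    base_error = np.mean((predict(X) - y) ** 2)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_shuffled = X.copy()
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
        scores[j] = np.mean((predict(X_shuffled) - y) ** 2) - base_error
    return scores


rng = np.random.default_rng(0)
n_samples, n_bands = 2000, 26
modulated = {5: 2.0, 20: 1.0}  # hypothetical band index -> beta

# Synthetic band amplitudes and a simulated behaviour from uniform noise.
X = rng.uniform(0.5, 1.5, size=(n_samples, n_bands))
y_sb = rng.uniform(0.0, 1.0, size=n_samples)

# f_new = f_old * beta * y_sb for the chosen bands only.
for band, beta in modulated.items():
    X[:, band] = X[:, band] * beta * y_sb

# Stand-in decoder: ordinary least squares with an intercept term.
design = np.column_stack([X, np.ones(n_samples)])
weights, *_ = np.linalg.lstsq(design, y_sb, rcond=None)


def predict(M):
    return np.column_stack([M, np.ones(len(M))]) @ weights


influence = permutation_importance(predict, X, y_sb, rng)
top_two = set(np.argsort(influence)[-2:].tolist())
print("bands with highest influence:", sorted(top_two))
```

      If the influence measure behaves as intended, the two modulated bands should rank above all unmodulated ones, which is the check the simulation is designed to provide.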

      New supplementary Figure:

      Figure 3 - Supplement 2: Decoding influence for downsampled models and simulations. (A) To measure the robustness of the influence measure we downsampled the training data and retrained the model using cross-validation. We plot the Pearson correlation between the original influence distribution using the full training set and the influence distribution obtained from the downsampled data. Each dot shows one cross-validation split. Inset shows influence plots for two runs, one for 35 minutes of training data, the other in which model training consisted of only 5 minutes of training data. (B) We quantified our influence measure using simulated behaviours. We used the wavelet preprocessed data from one CA1 recording and simulated two behavioural variables which were modulated by two frequencies (58Hz & 3750Hz) using different multipliers (betas 2 & 1). We then trained the model using cross-validation and calculated the influence scores via feature shuffling.

      3)

      a) It is not clear why your method leads to an increase in decoding accuracy (Fig. 1). Is this simply because of the preprocessing you are using (using the Wavelet coefficients as inputs), or because of your convolutional neural network? Having a control where you provide the wavelet coefficients as inputs into a feedforward neural network would be useful, and a more meaningful comparison than the SVM. Side note - please provide more information on the SVM you are using for comparison (what is the kernel function, are you using regularization?).

      We thank the reviewer for this suggestion and are sorry for the lack of documentation regarding the support vector machine model. The support vector machine was indeed trained on the wavelet transformed data and not on the spike sorted data, as we wanted a comparison model which also uses the raw data. The high error of the support vector machine on wavelet transformed data might stem from two problems: (1) the input by design loses all spatially relevant information, as the 3-D representation (frequencies x channels x time) needs to be flattened into a 1-D vector in order to train an SVM on it, and (2) the SVM therefore needs to deal with a huge number of features. For example, even though the wavelets are downsampled to 30 Hz, one sample still consists of 64 timesteps × 128 channels × 26 frequencies = 212,992 features, which makes the SVM very slow to train and prone to overfitting the training set.

      This exact problem would also be present in a feedforward neural network that uses the wavelet coefficients as input. Any hidden layer connected to the input with a reasonable number of hidden units will result in a multi-million parameter model (e.g. 512 units will result in 109,051,904 parameters for just the first layer). These models are notoriously hard to train and won’t fit on many consumer-grade GPUs, which is why for most spatial signals, including images or higher-dimensional signals, convolutional layers are the preferred and often only option to train these models.
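
      The two figures quoted above follow directly from the input dimensions. The short check below reproduces them; the bias count and the float32 memory estimate are additions for illustration and are not taken from the response.

```python
# Input dimensions of one sample, as given in the text.
timesteps, channels, frequencies = 64, 128, 26
n_features = timesteps * channels * frequencies  # flattened input length

# A fully connected first layer with 512 hidden units.
hidden_units = 512
first_layer_weights = n_features * hidden_units  # weight matrix entries
first_layer_biases = hidden_units                # one bias per unit (not counted in the text)

# Assumption: weights stored as 32-bit floats (4 bytes each).
weight_bytes = first_layer_weights * 4
weight_mib = weight_bytes / 2**20

print(n_features, first_layer_weights, weight_mib)
```

      This counts only the stored weights of the first layer; gradients and optimizer state during training would require additional memory on top of that.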

      We have now included more detailed information about the SVM (including kernel function and regularization parameters) in the methods section of the manuscript.

      Page 19: To generate a further baseline measure of performance when decoding using wavelet transformed coefficients, we trained support vector machines to decode position from wavelet transformed CA1 recordings. We used either a linear kernel or a non-linear radial-basis-function (RBF) kernel to train the model, using a regularization factor of C=100. For the non-linear RBF kernel we set gamma to the default 1 / (num_features × var(X)) as implemented in the sklearn framework. The SVM model was trained on the same wavelet coefficients as the convolutional neural network.
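
      For readers unfamiliar with this default, the sketch below computes the same quantity by hand, taking var(X) as the population variance over all entries of the flattened design matrix. The helper name `scale_gamma` and the toy matrix are hypothetical. In scikit-learn this corresponds to passing `gamma="scale"` to the SVM estimator; whether a regression or a classification variant was used is not stated in the response.

```python
from statistics import pvariance


def scale_gamma(X):
    """Return 1 / (n_features * var(X)) for a row-major list-of-lists X.

    var(X) is the population variance taken over every entry of X.
    """
    n_features = len(X[0])
    flat = [value for row in X for value in row]
    return 1.0 / (n_features * pvariance(flat))


# Toy design matrix: 2 samples x 2 features.
X = [[0.0, 1.0],
     [2.0, 3.0]]

gamma = scale_gamma(X)  # variance of [0, 1, 2, 3] is 1.25, so gamma = 1 / (2 * 1.25)
```

      Because gamma shrinks as the number of features grows, the very high-dimensional wavelet input yields a very small gamma, i.e. a wide RBF kernel.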

      b) Relatedly, because the reason for the increase in decoding accuracy is not clear, I don't think you can make the claim that "The high accuracy and efficiency of the model suggest that our model utilizes additional information contained in the LFP as well as from sub-threshold spikes and those that were not successfully clustered." (line 122). Based on the shown evidence, it seems to me that all of the benefits vs. the Bayesian decoder could just be due to the nonlinearities of the convolutional neural network.

      Thanks for raising this interesting point regarding the linear vs. non-linear information contained in the neural data. Indeed, when training the model with a linear activation function for the convolutions and fully connected layers, model performance drops significantly. To quantify this, we ran the model with three different configurations of its activation functions: (1) nonlinear activation functions only in the convolutional layers, (2) nonlinear activation functions only in the fully connected layers, or (3) linear activation functions throughout the whole model. As expected, the model with only linear activation functions performed the worst (linear activation functions 61.61cm ± 33.85cm, non-linear convolutional layers 22.99cm ± 18.67cm, non-linear fully connected layers 47.03cm ± 29.61cm, all layers non-linear 18.89cm ± 4.66cm). For comparison, the Bayesian decoder achieves a decoding error of 23.25cm ± 2.79cm on this data.

      Thus it appears that the reviewer is correct - the advantage of the CNN model comes in part from the non-linearity of the convolutional layers. The corollary of this is that there are likely non-linear elements in the neural data that the CNN but not Bayes decoder can access. However, the CNN does also receive wider-band inputs and thus has the potential to utilize information beyond just detected spikes.

      In response to the reviewer's point and to the new analysis regarding the LFP models raised by reviewer 1, we have now reworded this sentence in the manuscript.

      Page 4: The high accuracy and efficiency of the model for these harder samples suggest that the CNN utilizes additional information from sub-threshold spikes and those that were not successfully clustered, as well as nonlinear information which is not available to the Bayesian decoder.

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript presents new data and a model that extend our understanding of color vision. The data are measurements of activity in human primary visual cortex in response to modulations of activity in the L- and M-cone photoreceptors. The model describes the data with impressive parsimony. This elegant simplification of a complex data set reveals a useful organizing principle of color processing in the visual cortex, and it is an important step towards construction of a model that predicts activity in the visual cortex to more complex visual patterns.

      Strengths of the study include the innovative stimulus generation technique (which avoided technical artifacts that would have otherwise complicated data interpretation), the rigor of experimental design, the clear and even-handed data presentation, and the success of the QCM.

      The study could be improved by a more thorough vetting of the QCM and additional discussion on the biological substrate of the activation patterns.

      We thank the reviewer for the thoughtful summary of our work, for highlighting the strengths of our methodology and analysis, and for noting that our study will make a worthy contribution to understanding the organizing principles of visual cortex.

      Reviewer #2 (Public Review):

      The goal of this work is to advance knowledge of the neural bases of color perception. Color vision has been a model system for understanding how what we see arises from the coordinated action of neurons; detailed behavioral measurements revealed color vision's dependence upon three types of photoreceptors (trichromacy) and three second stage retinal circuits that compute sums and differences of the cone signals (color opponency). The processing of color at later, cortical stages has remained poorly understood however, and studies of human cortex have been hampered by methodologies that abandoned the detailed approach. Typical past work simply compared neural responses in two conditions, the presentation of colorful (formally, chromatic) vs grayscale (luminance) images. The present work returns to the older tradition that proved so successful.

      The project's specific goals were to measure functional MRI responses in human cortex to a large range of colors, and equally importantly, capture the pattern responses with a quantitative model that can be used to predict response to many additional colors with just a few parameters. The reported work achieved these goals, establishing both a comprehensive data set and a modeling framework that together will provide a strong basis for future investigations. I would not hesitate to query the data further or to use the QCM model the paper provides to characterize other data sets.

      The strengths of the work include its methodological rigor, which gives high confidence that the goals were achieved. Specifically:

      1) The visual presentation equipment was uniquely sophisticated, allowing it to correct for possible confounds due to differences in photoreceptor responses across the retina.

      2) The testing of the model was quite rigorous, aided by distinct replications of the experiment planned prior to data collection.

      3) The fMRI methods were also state of the art.

      The work was well-situated within the literature, comparing its findings to past results. The limitations and assumptions of the present work were also clearly stated, and conclusions were not overstated.

      Weaknesses of the current draft are relatively minor, however, I believe:

      1) The data could be presented in a way to make them more comparable to prior fMRI work, e.g. by using percent change units in more places, comparing the R^2 of model fits reported here to those reported in other papers, and explaining and exploring how the spatially uniform stimuli, used here but not in other fMRI studies, limited responses in visual areas beyond V1.

      2) Comparison between the two models, the GLM and QCM is not quite complete.

      3) The present results are not discussed in context with past results using EEG, and Brouwer and Heeger's model of fMRI responses to color.

      4) Implications of the basic pattern of response for the cortical neurons producing the data are discussed less than they could be.

      We thank the reviewer for this clear summary of the paper, calling to attention our detailed approach to studying cortical color processing, and enthusiasm regarding the impact of our data and computational modeling.

      Reviewer #3 (Public Review):

      The authors describe a method for fitting a simple, separable function of contrast and cone excitation to a set of fMRI data generated from large, unstructured chromatic flicker stimuli that drive the L- and M- cone photoreceptors across a range of amplitudes and ratios. The function is of the form of a scaled ellipse – hereafter referred to as a 'Quadratic Color Model' (QCM). The QCM fits 6 parameters (ellipse orientation, ellipse elongation, and 4 parameters from a non-linear, saturating (Naka-Rushton) contrast response curve). The QCM fits the dataset well and the authors compare it (favorably) to a 40-parameter GLM that fits each separate combination of chromatic direction and contrast separately.
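
      To make the structure of such a model concrete, here is a minimal sketch of an ellipse followed by a Naka-Rushton function. It is not the authors' implementation: the parameter names (`angle_deg`, `axis_ratio`, `amplitude`, `semi_saturation`, `exponent`, `offset`), the way the ellipse is parameterized and the example values are all assumptions chosen for illustration.

```python
import math


def equivalent_contrast(l, m, angle_deg, axis_ratio):
    """Elliptical norm of an (L, M) cone-contrast stimulus.

    angle_deg:  direction of the ellipse's long axis in the L/M plane.
    axis_ratio: short-axis length divided by long-axis length (0 < ratio <= 1).
    Stimuli along the short axis get a larger equivalent contrast,
    i.e. the model is more sensitive to them.
    """
    theta = math.radians(angle_deg)
    along_long = l * math.cos(theta) + m * math.sin(theta)
    along_short = -l * math.sin(theta) + m * math.cos(theta)
    return math.sqrt(along_long ** 2 + (along_short / axis_ratio) ** 2)


def naka_rushton(c, amplitude, semi_saturation, exponent, offset):
    """Saturating contrast-response function applied to equivalent contrast c."""
    cn = c ** exponent
    return amplitude * cn / (cn + semi_saturation ** exponent) + offset


def qcm_response(l, m, params):
    """Six-parameter sketch: two ellipse parameters plus four Naka-Rushton ones."""
    c = equivalent_contrast(l, m, params["angle_deg"], params["axis_ratio"])
    return naka_rushton(c, params["amplitude"], params["semi_saturation"],
                        params["exponent"], params["offset"])


# Hypothetical parameter values: long axis along L+M (45 degrees).
params = {"angle_deg": 45.0, "axis_ratio": 0.2, "amplitude": 2.0,
          "semi_saturation": 0.5, "exponent": 2.0, "offset": 0.0}

k = 0.1 / math.sqrt(2)                    # stimulus contrast of 0.1 in each direction
resp_l_minus_m = qcm_response(k, -k, params)  # L-M direction (short axis)
resp_l_plus_m = qcm_response(k, k, params)    # L+M direction (long axis)
```

      With these toy values the L-M stimulus produces the larger response at equal contrast, which is the qualitative pattern the review describes.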

      The authors note three things that 'did not have to be true' (and which are therefore interesting):

      1) The data are well-fit by a separable ellipse+contrast transducer - consistent with the idea that the underlying neuronal computations that process these stimuli combine relatively independent L-M and L+M contrast.

      2) The short axis of the QCM tends to align with the L-M cone contrast direction (indicating that this direction is one of maximum sensitivity and the L+M direction (long axis) is least sensitive). This finding is qualitatively consistent with psychophysical measurements of chromatic sensitivity.

      3) Fit parameters do not change much across the cortical surface – and in particular they are relatively constant with respect to eccentricity.

      This is a technically solid paper – the data processing pipeline is meticulous, stimuli are tightly-calibrated (the ability to apply cone-isolating stimuli to fovea and periphery simultaneously is an impressive application of the 56-primary stimulus generator) and the authors have been careful to measure their stimuli before and after each experimental session. I have a few technical questions but I am completely satisfied that the authors are measuring what they think they are measuring.

      The analysis, similarly, is exemplary in many ways. Robust fitting procedures are used and model performance and generalizability are evaluated with leave-run-out and leave-session-out cross-validation procedures. Bootstrapped confidence intervals are generated for all fits and analysis code is available online.

      The paper is also useful: it summarises a lot of (similar) previous findings in the fMRI color literature going back to the late 90s and points out that they can, in general, be represented with far fewer parameters than conditions. My main concerns are:

      1) Underlying mechanisms: The QCM is a convenient parameterization of low spatial-frequency, high temporal-frequency L-M responses. It will be a useful tool for future color vision researchers but I do not feel that I am learning very much that is new about human color vision. The choice to fit an ellipse to these data must have been motivated at least in part by inspection. It works in this case (possibly because of the particular combination of spatial and temporal frequencies that are probed) but it is not clear that this is a generic parametric model of human color responses in V1. Even very early fMRI data from stimuli with non-zero spatial frequency (for example, Engel, Zhang and Wandell '97) show response envelopes that are ellipse-like but which might well also have additional 'orthogonal' lobes or other oddities at some temporal frequencies.

      2) Model comparison: The 40-parameter GLM model provides a 'best possible' linear fit and gives a sense of the noisiness of the data but it feels a little like a strawman. It is possible to reduce the dimensionality of the fit significantly with the QCM but was it ever really plausible that the visual system would generate separate, independent responses for each combination of color direction and contrast? I suspect that given the fact that the response data are not saturating, it would be possible to replace the Naka-Rushton part of the model with a simple power function, reducing the parameter space even further. It would be more interesting to use the data to compare actual models of color processing in retina/V1 and, potentially, beyond V1.

      3) Link to perception. As the authors note, there is a rich history of psychophysics in this domain. The stimuli they choose are also, I think, well suited to modelling in the sense that they are likely to drive a very limited class of chromatic cells in V1 (those with almost no spatial frequency tuning). It is a shame therefore that no corresponding psychophysical data are presented to link physiology to perception. The issue is particularly acute because the stimulus differs from those typically used in more recent psychophysical experiments: it flickers relatively quickly and it has no spatial structure. It may, however, be more similar to the types of stimuli used prior to the advent of color CRTs : Maxwellian view systems that presented a single spot of light.

      We thank the reviewer for their detailed comments on our paper and for highlighting our careful methodological approach and modeling of the data. We address the specific points.

    1. Author Response:

      Evaluation Summary:

      This paper compares the properties of UV cone output synapses in different regions of the zebrafish retina using a combination of electron microscopy, quantitative imaging and computational modeling. They relate these differences to ultrastructural differences in synaptic ribbons and evaluate them using a previously-developed biophysical model for the operation of the synapse. The finding of regional differences in ribbon behavior is novel and suggests an under-appreciated degree of control of release by ribbon structure and behavior. The presentation of some of the results, particularly the model, could be strengthened.

      We thank the reviewers for their valuable inputs. In response, we have substantially extended and restructured the description of preprocessing steps and modelling to aid clarity. Moreover, we include new analysis of “old” GCaMP6f data to show the similarity of calcium dynamics across retinal regions. Additionally, we worked on the description of the simulation-based inference method and provided more intuitive explanations. Finally, we updated the discussion of the model results. We hope to have addressed the helpful critique of the reviewers and strengthened our conclusions and the whole manuscript.

      Reviewer #1 (Public Review):

      Preprocessing of glutamate traces. The bulk of the analysis in the paper uses "scaled and denoised" traces. It is important to verify that this process did not either introduce or obscure any differences across regions. This should include some validation of the assumptions that go into the scaling process (such as whether a sufficiently low calcium level is achieved to use that as a standard). An example of how this concern could impact the conclusions is that the AZ glutamate traces look less rectified than the others, perhaps due to an elevated baseline, as suggested in the text. But the conclusion about the elevated baseline relies on the scaling process creating a proper alignment such that it is accurate to superimpose the traces as in Figure 3a.

      Thank you for giving us the opportunity to clarify this point. AZ UV-cones indeed have an elevated baseline, as explicitly shown in our previous publication (Yoshimatsu et al. 2020 Neuron). The scaling process recapitulates this baseline shift, as expected. In this previous work we also show how the lower rectification of AZ cones is directly linked to this baseline shift, and it includes experiments specifically designed to find the “true” minimum calcium levels achievable in UV-cones in different parts of the eye, as suggested by the reviewer.

      However, we fully agree that the scaling/denoising process could be described more clearly, and we expanded the explanation in the method section and added a figure (Fig. S3) to visualize all steps explicitly.

      Model fitting. Some key aspects of the model fitting were difficult to evaluate and follow. For example, is the loss function the same as the discrepancy defined in the methods (I assumed that is the case - if not the loss function needs to be defined)? The definition of the discrepancy could be clearer (e.g. be careful about using x here and as the offset of the calcium trace). Related, the results would benefit from a more intuitive description of the fitting, rather than just a reference to the methods (which is a bit dense to go through for that intuitive-level explanation of the model development).

      We added an overview of the simulation-based inference method to the main section of the manuscript. Additionally, we updated the definition of the loss function and tried to give more intuitive explanations. We hope that these changes will help the reader to better understand the computational methods used.

      Some statements seem too strong given the state of current knowledge. E.g. lines 79-80 I think goes too far about the functional role of the ribbon. Similarly lines 97-98 are quite explicit about the connection to prey capture. Lines 276-279 are a particularly important example; I would argue that the statement there requires showing uniqueness of the model.

      We agree that the mentioned statements were perhaps quite strong and we have toned them down in the revised manuscript.

      Could fixation of the retina for EM change the distribution of vesicles in different compartments? I realize this may not be answerable, but a caution about that possibility might be warranted.

      We are not aware of such an effect in previous works. As the reviewer notes it may not be answerable. However, in a way we have an “internal control” for such a possibility, since the different eye regions were treated equally for fixation, yet vesicle distributions differ across eye regions. It seems unlikely that the fixation would have disproportionately distorted vesicle distributions in one eye region without also affecting the others. This is now noted when first discussing the EM approach.

      Line 159: it is not clear how similar the calcium signals are. Specifically, could differences in calcium signal get amplified when passed through simple nonlinearity (e.g. due to the calcium dependence of transmitter release) to account for the differences in glutamate output? Maybe rewording here to leave open that possibility unless you have reason to reject it.

      We agree that this statement was perhaps too strong at this point of the manuscript. We softened it and included a detailed analysis of additional calcium data later in the manuscript to investigate the regional differences of the calcium signal (Fig. 3k-n).

      Can you quantify the fits in Figure 4f,g? For example, can you give a probability of a particular experimental trace or summary parameters for that experimental trace given the parameter probability distributions from the same area and from a different area?

      A quantification of the fits is shown in Fig. S4b,c (previously S3b,c). As we perform “likelihood-free inference”, we cannot give probabilities for the model traces, but we show two different loss functions for the model fits as well as for the linear model: the relevant loss, on which the models are optimized (which is based on the summary statistics) and for comparison the MSE to the experimental traces. We apologize if this was not clearly mentioned in the manuscript. We added it more prominently in the revised version.

      Reviewer #2 (Public Review):

      This study images synaptic calcium and glutamate release from larval zebrafish UV-sensitive cones in vivo. They also study the ultrastructure of ribbon synapses from UV cones in different regions of the retina. They find differences in ribbon dimension and light-evoked glutamate release from cones in different regions of the retina. Cones from dorsal retina show a more pronounced transient component of glutamate release than those from nasal retina. Those in the acute zone in the center of the retina showed intermediate kinetics. Ultrastructural reconstructions of UV-sensitive cones from those regions showed fewer and smaller ribbons in dorsal cones vs. those in the nasal region or acute zone. Light-evoked changes in the kinetics of synaptic calcium were not significantly different, suggesting that differences in release kinetics may be related to differences in ribbon behavior in cones from different regions. To relate these different measurements to one another, the authors modified an existing model of cone release to incorporate a simulation-based Bayesian inference approach for estimating best-fit parameters. The model suggested that the differences in glutamate release kinetics could be explained by differences in the rates of transfer between vesicle pools on and off the ribbon. By fixing different parameters, the authors then used the model to explore the parameter space and general properties of ribbon tuning. They also provide a link to the model for others to use.

      The main new experimental finding is that glutamate release properties differ among cones in different regions. The finding that kinetics of glutamate release and ribbon ultrastructure vary systematically in different regions of the retina is interesting. They relate these data using a model of ribbon release. While the model is not novel in its general design, the incorporation of Bayesian inference is new. The most interesting finding from the model is that the kinetic differences in release between cones are not due to calcium kinetics but arise primarily from differences in transitions between vesicle pools. Nevertheless, using the model, the authors show that calcium levels and kinetics matter, since if they hold other parameters fixed, calcium levels and kinetics are the most important factors in shaping response detectability and response kinetics. This is consistent with a lot of earlier work that calcium kinetics are important for shaping response kinetics at ribbon synapses.

      1) The measured changes in glutamate and calcium are small and noisy and there is considerable overlap in the data from cones in different regions. While the example waveforms show considerable differences, the scatter in the data is less persuasive. If I understand correctly, the imaging data comes from 30 AZ, 16 dorsal, and 9 nasal UV cones. With such noisy data, 9 cones seems like a particularly small sample. With imaging data, it should be possible to record from dozens or hundreds of cells and a larger sample would strengthen the conclusions.

      We agree that the sample size is quite small; however, the dual-color experiments are technically extremely challenging. This is partly related to the laser wavelength compromise that needs to be reached for concurrent excitation of red and green fluorescent probes, and to the fact that red probes generally give comparatively poor SNR. Notably, to our knowledge concurrent 2P imaging of presynaptic calcium and consequent glutamate release in an in vivo scenario is quite novel, and still very much on the edge of experimental possibilities.

      The green glutamate recordings based on iGluSnFR, which are particularly central to our work, do have a reasonably high SNR; rather, the “problem” is more obviously linked to the calcium recordings. For a better understanding of the calcium handling, we therefore reanalysed an “old” dataset from Yoshimatsu et al., 2020, Neuron (see Fig. 3k-n) that was recorded with SyGCaMP6f, which provides much higher SNR (and is a little faster, albeit also more nonlinear). Notably, the SyGCaMP6f calcium dynamics were also analysed in some detail in Yoshimatsu et al., 2020, Neuron, and we built on these conclusions.

      We hope that the analysis of the additional calcium dataset, which is now included in the manuscript, makes the conclusions more persuasive.

      2) Calcium and iGluSnfr measurements are both single wavelength measurements and thus sensitive to differences in expression of the indicator. In Fig. 3, the authors show that dorsal cones exhibit larger calcium responses than nasal cones (3c) and that AZ cones show larger glutamate responses than nasal cones (3d). Please address the potential impact of differences in expression on these measurements.

      Thank you for this comment. In Yoshimatsu et al., 2020, Neuron we compared “live 2p” and “fixed confocal” data of the same sample to show that biosensor expression in UV-cones was uniform across regions, and that the different brightness levels were rather a result of variations in calcium levels. We extrapolated this knowledge to the biosensors used in the new experiments. We now note this explicitly in the revised manuscript.

      3) Please describe controls performed to assess the potential for spectral overlap between the red and green channels. Is there any bleed-through of one dye into the other channel?

      The expression profiles of the two indicators are very different: the red fluorescence signal appears in cones, the green in HCs. We illustrate this separation in an additional figure (Fig. S2a,b), which shows that there was no obvious spectral mixing of the two fluorescence channels. We have now clarified this in the revised manuscript.

      4) I am not a modeler and while I understand the general approach used for the model, I am not competent to critique specific details of the implementation, particularly the Bayesian inference. However, the fact that the linear statistical model seems to perform just as well as the more ornate model is comforting since it says that the Bayesian inference approach didn't lead the model into an unrealistic parameter space. However, while to my eye the linear model appears to perform just as well as the fancier model, the text says otherwise (Figure 4, lines 270-273). Please clarify.

      Indeed, the linear model captures the general shape of the glutamate response. However, it fails to recover adaptational processes, more precisely the transient components and adaptation over several steps. The model performances are quantified in Fig. S4 (previously S3), and especially with respect to the relevant loss, which is measuring the relevant features, the biophysical model outperforms the linear model. We expanded the discussion on these points in the manuscript and made a more prominent reference to the quantification figure.

      5) Adding a diagram to show where the different regions (dorsal, nasal, acute zone) are located in the eye would be helpful. Is there a difference in the number or size of UV cones from different regions of the retina in larval zebrafish?

      A diagram has been added to Figure 1 as requested. Regarding UV-cone numbers, indeed they do vary across the eye to specifically peak in the acute zone, and to a lesser extent also nasally. This relationship was explored in some detail in Zimmermann et al. 2018 Curr Biol, and also touched upon in Yoshimatsu 2020 Neuron. This known density difference is now noted in the introduction.

      6) Are differences in ribbon morphology, glutamate responses or calcium changes retained in adult zebrafish retina? While it may not be feasible to perform similar experiments in adult, some discussion of possible differences and similarities with adult retina would be helpful for putting the results in a more general context.

      The reviewer raises an interesting point. Adult zebrafish display a much broader array of visual behaviours than larvae, and moreover have a rather different diet (meaning that the UV-dependence of prey capture - see Yoshimatsu et al., 2020 Neuron - may be different). Unfortunately, the visual ecology of adult zebrafish remains poorly explored, so at this point we can only speculate. Notably, unlike larvae, adults also feature a crystalline mosaic of all cones, meaning that at least numerical anisotropies in cones as they occur in larvae (Zimmermann et al. 2018) are not expected. However, this does not preclude the possibility that UV-cones have different properties across the retina; perhaps it would be the most straightforward way to regionally tune outer retinal outputs in adults. Accordingly, we fully agree that this topic would be exciting to explore; however, it would go beyond what could be achieved within a reasonable revision cycle.

      We have now added a summarising note of the above to the discussion section.

      Reviewer #3 (Public Review):

      The strengths of the manuscript: It contains a thorough characterization of the anatomical and physiological differences of UV cone ribbons at different locations using state-of-the-art techniques, including Serial-blockface scanning EM reconstruction and dual-color, simultaneous calcium and glutamate imaging. The Bayesian simulation-based inference model captured the key features of the calcium responses and glutamate release dynamics and provided distributions for each biophysical parameter, which gave insight into their interactions and their impact on ribbon function. The online tool for ribbon synapse modeling is quite useful. Overall, it is a great effort to understand the function of the ribbon synapse with a suitable system that allows multi-faceted data collection and a new modeling approach.

      The weaknesses of the manuscript: 1) Overall the writing/formatting of the manuscript can be much improved - there are many imprecise, hard to understand descriptions in the manuscript; figure legends/descriptions are often inadequate for easy understanding; inconsistencies between description in the main text and methods; and above all, the descriptions of model itself and the results from the model are not communicated in a way that facilitates the understanding of process and implications. In contrast, the previous papers from the same group employing similar modeling approaches are much better explained. 2) Based on the intuitions from the modeling, there has not been a strong connection established between the anatomical data and the functional data to which the model is built to fit. More clearly identifying the consistencies and discrepancies between the data and the model will help the readers to understand the pros and cons of the model and the limitations of the generalizations from the model.

      Specific questions and recommendations for the authors:

      1) It will be helpful to have a retina diagram indicating the locations of three different regions.

      The requested diagram has been added to Figure 1.

      2) Fig 1d,e,f (and other figure panels in general) there is no need to mark n.s. On the other hand, in the Statistical Analysis section, GAMs models are mentioned only for Fig 1g, but not other results - needs a clarification.

      We find the “n.s.” labels useful, in part because in some panels none of the differences were significant and the label makes this quite explicit. Accordingly, we have opted to retain them. GAMs were indeed only used for Figure 1g - this is motivated by the difference in data structure of this panel compared to others (i.e. a comparison between continuous rather than discrete distributions). We have now clarified this in the Methods and added a short paragraph on the testing procedure used.

      3) Fig 1h is quite confusing, with a mixture of 3D and 2D plot, schematic drawing and statistical marks. What comparisons are these marks for? The legend is not specific and the Suppl Fig S1 doesn't clarify much.

      The asterisks are meant to indicate a statistically significant difference in the indicated property (e.g. ribbon size/number) relative to the acute zone. We apologise for not making this clear in the previous version; it is now directly noted in the panel. Regarding the 2D/3D representation, we agree that it may be a little confusing, but we cannot think of a “better” way of summarising all properties analysed by EM in a single panel, so we opted to keep it. We did, however, expand the related explanation in the legend to further clarify what is shown.

      4) It will be good to discuss the properties of the calcium sensor. Deconvolution of the calcium signal (lines 617-619) notwithstanding, presumably, the sensor has neither the temporal nor spatial resolution to catch the nano-domain calcium peak near the vesicles in RRP, which is critical for the release of RRP.

      This point seems to link to the ongoing debate on the extent to which release from ribbons is driven by micro- and/or nano-domain calcium signalling. It is our understanding that this debate remains unresolved in a truly general sense. Rather, the two mechanisms seem not to be mutually exclusive (i.e. both micro- and nano-domain signals working together), and their balance is moreover quite specific to each ribbon synapse in question. In larval zebrafish cones, the pedicle has a rather small cytoplasmic volume, there is only one invagination from postsynaptic processes, and all ribbons inside the cone are opposed to this single invagination. Accordingly, on a possible “sliding scale” of micro- vs nano-domain dominance, we think it is likely that in larval zebrafish cones microdomains will have a notable impact on release. While we are not aware of any data directly looking at this question in zebrafish larval UV-cones, there is good data available from systems that are perhaps quite similar, such as mammalian rods (which also have a single invagination site). See, for example, Thoreson et al., 2004, Neuron, Figure 3.

      Already at low micromolar concentrations of calcium that are readily achieved at the level of bulk calcium in the terminal (e.g. 1-2 microM), release is driven to a substantial degree.

      However, we fully agree that we cannot detect possible nano-domain calcium signalling with our imaging method (in fact we are unsure that with currently available technology it is technically possible in an in-vivo preparation). We therefore now further emphasise the possibility of nanodomains acting on release in the discussion.

      Notably, we do already allow exploring the possible influence of nanodomain-type calcium kinetics in the online model, and we think this usefully adds to our exploration of links between calcium signalling and glutamate release.

      5) Likewise, the kinetics of iGluSnFR and of glutamate concentration in the cleft. Admittedly, figs 2a, 3c etc. show that the glutamate signal drops rapidly following the transition from dark to light, however, the rates of vesicle pool replenishment are a topic in the field-some discussion of how glutamate clearance from the cleft and the kinetics of the sensor will influence your estimates of replenishment rates would help future readers better interpret your findings in the context of their own observations.

      We agree that there are technical limitations as to what the iGluSnFR signal can tell us about the exact dynamics of glutamate in an unperturbed situation. Likely this will never be fully addressable. Rather, we use the iGluSnFR signals in a comparative fashion across eye regions, where presumably any distortion of the signals as alluded to by the reviewer would be approximately equal. Following the reviewer’s suggestion, we now explain this more directly in the main text.

      6) In Fig 2d, the rising phase kinetics of the Glu for that nasal cone is strikingly different from that of the acute zone cone. However, such difference is not seen in Fig 3. Therefore, the one in Fig 2d may not be a good representation?

      Thanks, we agree. We have replaced the nasal example with a more representative trace.

      7) In Fig 3a, c.u. and v.u. (only defined in Fig 4 in the context of the model) were used here but not S.D. as in Fig 2, any explanation?

      After scaling, SD adopts arbitrary units. For consistency with the model later, we decided to use c.u. and v.u. here (i.e. “calcium units” and “vesicle units”). We agree that this could be explained better, and have now rephrased as follows: “We show the rescaled traces in c.u. (calcium units) and v.u. (vesicle units) respectively, to be consistent with the used units in the model later.”

      8) Lines 186-188, how were traces "normalized with respect to the UV-bright stimulus periods"?

      The traces were rescaled such that the UV-bright stimulus periods had a mean of zero and a standard deviation of one. We have now included this missing piece of information and additionally expanded the explanation of the pre-processing.
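
      As a minimal sketch of the rescaling described above: the function name, the boolean mask marking UV-bright samples and the synthetic trace are all illustrative assumptions, not the authors' actual pre-processing code.

```python
import numpy as np

def rescale_to_bright(trace, bright_mask):
    """Rescale a trace so the UV-bright samples have mean 0 and SD 1.

    `trace` is a 1-D fluorescence trace; `bright_mask` is a boolean array
    of the same length marking the UV-bright stimulus periods.
    Both names are hypothetical, chosen for this illustration only.
    """
    trace = np.asarray(trace, dtype=float)
    bright_mask = np.asarray(bright_mask, dtype=bool)
    bright = trace[bright_mask]
    # Shift and scale the WHOLE trace using statistics of the bright periods only.
    return (trace - bright.mean()) / bright.std()

# Synthetic example: noisy baseline with a hypothetical dark-flash response.
rng = np.random.default_rng(0)
trace = rng.normal(loc=5.0, scale=2.0, size=1000)
trace[400:600] += 10.0
bright_mask = np.ones(1000, dtype=bool)
bright_mask[400:600] = False  # samples 400-599 are treated as "dark"

rescaled = rescale_to_bright(trace, bright_mask)
```

      Because only the bright periods set the offset and scale, responses during the other stimulus periods are expressed in units of bright-period standard deviations.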

      9) Lines 194-195, "In addition, the glutamate release baseline of AZ UV-cones was increased during 50% contrast at the start of the stimulus" - it is unclear whether higher glutamate baseline occurred during the adaptation step (i.e. it increased during that period) or said increase was the level during adaptation compared to that during bright periods?

      Thank you, we meant the former (i.e. glutamate release “is” higher during the adaptation step). This is now clarified in the text.

      10) Lines 219-220, "a sigmoidal non-linearity with slope k and offset x0 which drives the final release" - this sentence is not clear, needs to clarify that it is referring to the relationship between calcium and release.

      Thanks, this is now clarified in the manuscript.

      11) Lines 230-232, "x0 can be understood as the inverted calcium baseline (see Methods)" - Methods don't cover this point, though it is described in the f(Ca) equation, but it isn't obvious how x0 should be the inverted baseline, as if Ca=x0, f(Ca) = 0.5 (i.e., the point of half-release probability). Please clarify this. In general, there are places where explanations of model found in methods don't match those described in the main text (also see some of the points below). Please go over carefully to ensure consistency.

      x0 can be seen as an inverted baseline because it shifts the whole non-linearity to a different operating point: the smaller x0, the less additional calcium is needed to trigger vesicle release. If we assume a fixed calcium affinity, this implies an increased baseline level. We apologise for having omitted these explanations in the initial manuscript; we have expanded the explanation in the Methods of the revised manuscript.
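
      The operating-point argument can be illustrated with a short sketch. It assumes a standard logistic form for the non-linearity, which is consistent with the reviewer's remark that f(Ca) = 0.5 at Ca = x0 but is not confirmed here as the exact equation in the manuscript; the function name and the numerical values are hypothetical.

```python
import math

def release_nonlinearity(ca, k, x0):
    """Assumed logistic non-linearity mapping calcium to release drive.

    k is the slope and x0 the offset; at ca == x0 the output is exactly 0.5.
    """
    return 1.0 / (1.0 + math.exp(-k * (ca - x0)))

# Same calcium level and slope, two different offsets (illustrative values).
ca_level = 0.2
high_offset = release_nonlinearity(ca_level, k=5.0, x0=0.5)
low_offset = release_nonlinearity(ca_level, k=5.0, x0=0.1)
```

      With the smaller x0, the same calcium level sits higher on the curve, which is equivalent to leaving the curve fixed and raising the calcium baseline.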

      12) Fig 4e suggests a 5-10 times difference in RRP size between acute zone and nasal UV cones, which is not in line with the anatomical data (Fig 1h). Some discussions and clarifications will be helpful.

      As we note in the manuscript, it is difficult to quantitatively link anatomical structures to functional data. However, the small RRP size in the nasal zone inferred by the model (Fig. 4e) matches very well to the low vesicle densities at a small distance from the ribbon in the nasal zone in Fig. 1. Our model thus picks up the right trends for an anatomical structure from pure functional recordings, which is in our opinion already remarkable given the experimental noise and fine-grained differences. We commented on this point in the revised manuscript.

      13) From Fig 4h, and Fig S3b,c, the linear model doesn't look too bad (unless I misunderstand the figure panels, which are not explained in great detail). The explanation in lines 272-274 needs some work to make it clearer.

      Compared to the “best model”, the linear model clearly lacks accuracy, which is perhaps most intuitively visible in the adaptation kinetics. This is especially the case for the relevant loss, which is based on the summary statistics. We have extended the lines in question and hope that this point is now clearer in the manuscript.

      14) Sobol indices and their explanation are lacking. Are they computed using Ca2+ and glutamate signals, or just glutamate? It is hard to parse their relative "contributions" to model behavior as described in the text, when the methods caution against interpreting this analysis as determining the "importance" of parameters (lines 805-806).

      The first-order Sobol indices measure the direct effect of each parameter on the variance of the model output. More specifically, each index tells us the expected relative reduction in output variance if we fix the corresponding parameter. For the computation, broadly speaking, many parameter sets were drawn from the posterior distribution and the model was evaluated on each of them. The reduction in the variance of the model evaluations was then computed with one dimension of the parameter space fixed. We agree that the indices are non-intuitive to interpret at a single time point; however, their temporal changes give us insight into the time-dependent influence of each parameter on the model output. Sobol indices are often computed by drawing random samples from a uniform distribution on a high-dimensional cuboid [r1,s1] x … x [rn,sn], where each interval [ri,si] is simply defined as the mean ± 10% of the parameter fit. The choice of 10% leaves much room for interpretation and may not be equally meaningful for all parameters. We believe that the inferred posterior distributions are much better suited probability distributions, as they encode all parameter combinations that agree with the experimental data.

      We expanded our explanation on this point in the manuscript.
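
      For readers unfamiliar with first-order Sobol indices, the following is a minimal sketch of one common estimator (a "pick-freeze" scheme). It is not the authors' implementation: it assumes independent input parameters, whereas posterior samples are generally correlated, and the toy linear model, the sampler and all function names are hypothetical.

```python
import numpy as np

def first_order_sobol(model, sample_inputs, n_samples=50_000, seed=0):
    """Estimate first-order Sobol indices S_i = Var(E[Y | X_i]) / Var(Y).

    `model` maps an (n, d) array of parameters to n scalar outputs;
    `sample_inputs(rng, n)` draws an (n, d) array of independent inputs.
    """
    rng = np.random.default_rng(seed)
    a = sample_inputs(rng, n_samples)  # first independent sample matrix
    b = sample_inputs(rng, n_samples)  # second independent sample matrix
    y_a, y_b = model(a), model(b)
    total_var = np.var(np.concatenate([y_a, y_b]))
    indices = np.empty(a.shape[1])
    for i in range(a.shape[1]):
        ab_i = a.copy()
        ab_i[:, i] = b[:, i]  # share only parameter i with matrix b
        y_ab = model(ab_i)
        # Estimates Var(E[Y | X_i]) from the covariance induced by sharing X_i.
        indices[i] = np.mean(y_b * (y_ab - y_a)) / total_var
    return indices

# Toy model with a known answer: Y = X0 + 2*X1 with standard-normal inputs,
# for which the analytic indices are 1/5 and 4/5.
def toy_model(x):
    return x[:, 0] + 2.0 * x[:, 1]

def toy_sampler(rng, n):
    return rng.standard_normal((n, 2))

sobol = first_order_sobol(toy_model, toy_sampler)
```

      In a time-resolved setting such as the one described above, the same calculation would be repeated for the model output at each time point, giving one index per parameter per time point.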

      15) The sensitivity analysis suggests that vesicle transitions are more important than pool sizes or their calcium dependence. Thus, it appears that one intuition from the model is that ribbon size - the main anatomical difference of the UV cone ribbons from different regions - is not very important for the functional difference observed (also see discussion in lines 438-439). Although, it has been discussed that ribbon size does not necessarily correlate with IP or RRP size, but this appears to be the hallmark of the acute zone.

      As the reviewer notes, one potentially interesting hint from our work is that ribbon size does not necessarily translate 1:1 to vesicle pool sizes, or their relative transition rates. One particularly clear example of this might come from comparing Figs. 1d-f and Fig. 1h, between nasal and acute zone. Both have similar ribbon geometry (Fig. 1d-f), but nasal ribbons nevertheless appear to pack fewer vesicles (Fig. 1h). Linking with our functional data and modelling, it then appears that perhaps on top of that, vesicles simply move at different rates between the pools, a property that is impossible to pick up from a static EM reconstruction.

      More generally, as mentioned in the manuscript and discussed in the previous point, it is difficult to judge the overall importance of a parameter from the sensitivity analysis. However, we clearly see time-dependent effects of the different parameters, and the RRP size in particular matters for the transient component, as can be seen in Fig. 5. Indeed, the pattern for IP size seems to be different, and it may be the case that the stimulus used is not optimal for inferring this parameter from functional recordings.

      How the ribbon size relates to different vesicle densities and how these densities could potentially influence the changing is however still an open question and cannot be answered in the scope of this manuscript.

      16) Lines 460-461, intuitively, a slower RRP refill rate will result in more transient response - after the depletion of RRP, less refilled vesicles to give the sustained component of the response. This is the opposite of what model predicted (a faster RRP). Some explanation and discussion will be helpful.

      The RRP refill rate indeed influences the transience in the way described. However, its influence starts earlier and also affects the overall amplitude (if some minimal background activation is assumed), so it particularly influences the sustained component. For the nasal model, however, the inferred RRP size is already the smallest, and it seems that a small RRP refill rate is sufficient to produce the sustained response behaviour that we see in Fig. 4f. We thank the reviewer for this thoughtful comment and now mention this behaviour in the discussion.

      17) Also, the model simplifies vesicle transition rates by removing their calcium dependence. The Methods section indicates that this choice resulted from early fitting results that essentially "dialed out" the calcium dependence. Given the relative freedom that the model seems to have in finding suitable solutions, how is the lack of calcium dependence justified, and what potential impact might it have on the modeling results?

      Identifying model (mis-)specification is a non-trivial task in general. The presented model is complex enough to replicate the recorded data but can easily be extended to more complex dynamics (e.g. more complex calcium handling) in future studies, as it is publicly available online. Further added components could even act as “distractors” to compare the other parameters across zones and we thus decided to use an “as simple as possible” model. Interestingly our previous study (Schröder et al., 2019, Approximate bayesian inference for a mechanistic model of vesicle release at a ribbon synapse, NeurIPS.) showed that even at a temporal resolution of single released glutamate vesicles, it was not necessary to include calcium dependency for the refilling of the vesicle pools. This study thus supports our model choice.

      18) Lines 503-508, "In combination with the approximately equal and opposite effects of calcium baseline on the detectability of On- and Off-events (Fig. 7b,f), this suggest(s) that the calcium baseline may present a key variable that enables ribbons to trade-off the transmission of high frequency stimuli against providing an approximately balanced On- and Off- response behaviour." - what will be the physiological relevance for such conditions, perhaps the level of adaptation? Any existing data or predictions?

      The reviewer raises an interesting but ultimately perhaps unanswerable point, given the scarcity of available data on temporal natural image statistics in the UV band across the larval zebrafish visual field. It is of course tempting to speculate that the ecological need to tune kinetics and On/Off preferences might be linked (e.g. detecting a “dark looming predator” might disproportionately benefit from a rapid Off response). However, to truly understand this idea at a useful level of detail would likely be a rather involved study in its own right. Accordingly, we here prefer to simply point at the possibility to “tune” the ribbon using calcium baseline, and what effects this might have on kinetics if all else was kept equal.

      19) I am slightly skeptical of the predictions that the model might make about the ribbon's frequency tuning (Fig. 7) in light of the fact that the AZ model in particular seems unable to reliably capture the fast transient response to dark flashes (Fig. 4c,f).

      The noted effect in the fast transient components in Fig. 4c,f is partially due to the slow calcium recordings, which act as an input for the model in Fig. 4. As mentioned and discussed above, there is an ongoing discussion about the extent to which nanodomain or more global calcium concentration drives release. For this reason, we added a simple calcium model to the simulations for Fig. 7, which includes a variable time constant for calcium (nanodomains would presumably have much faster calcium transients than those used as the model default). This allows us to explore the influence of different possible calcium dynamics. Although this extrapolation to new stimuli is based on the fitted model, it allows all essential parameters to be varied. In the online simulation it can be observed that, with fast calcium dynamics, the ribbon is also able to follow higher-frequency stimuli. We agree that experimentally testing the influence of different ribbon configurations on frequency tuning is an interesting research direction, but it goes beyond the scope of this manuscript.
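
      The effect of the calcium time constant on frequency following can be illustrated with a minimal sketch. It assumes that calcium behaves as a first-order low-pass filter of the stimulus, which is an illustrative simplification and not necessarily the calcium model used in the manuscript or the online tool; all names and parameter values are hypothetical.

```python
import numpy as np

def lowpass_calcium(stimulus, dt, tau):
    """Forward-Euler integration of dCa/dt = (stimulus - Ca) / tau."""
    ca = np.zeros(len(stimulus), dtype=float)
    for t in range(1, len(stimulus)):
        ca[t] = ca[t - 1] + dt * (stimulus[t - 1] - ca[t - 1]) / tau
    return ca

def response_amplitude(freq_hz, tau, dt=1e-3, duration=5.0):
    """Steady-state amplitude of the calcium response to a unit sinusoid."""
    t = np.arange(0.0, duration, dt)
    stimulus = np.sin(2.0 * np.pi * freq_hz * t)
    ca = lowpass_calcium(stimulus, dt, tau)
    steady = ca[len(ca) // 2:]  # discard the initial transient
    return 0.5 * (steady.max() - steady.min())

# A 10 Hz stimulus seen through a fast and a slow calcium time constant
# (values in seconds, chosen only for illustration).
fast = response_amplitude(10.0, tau=0.02)
slow = response_amplitude(10.0, tau=0.5)
```

      Under this assumption, the slow time constant strongly attenuates the 10 Hz modulation while the fast one preserves most of it, in line with the qualitative statement that faster calcium dynamics let the ribbon follow higher-frequency stimuli.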

    1. Perhaps a tool for thought isn’t so much a tool for collecting answers, as a tool for asking questions? Can a tool offer new ways to uncover the important questions we can’t yet articulate? I think so.

      Better still, a Discovery Engine, a Serendipity Engine for Questions too. Not by the machine, but helping to bring to the human mind a constellation of ideas that may point to the adjacent possible questions arising from the 'clues' pointing to 'clues'. 'Clues' are what TrailMarks Pages are composed of. The primary means of Combination. By construction, 'Clues' can be assigned identities: human-readable permanent identities. The fundamental Means of Abstraction in TrailMarks. In turn, Clues contain listicles comprising mixtures of plain text, HTML mashups, and further nested clues, recursively. It is Clues all the way up, ever extending the unending frontier of knowledge. Bringing into purview new things that we did not know about, ready to be experienced, brought to awareness, articulated, connected to the existing body of articulation/knowledge, creating new questions as well as answers.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, the authors build off their previous data where they have identified differences in the sst1 locus as responsible for differences in susceptibility of B6 and C3HeB/Fej mice to Mycobacterium tuberculosis infection. The authors have previously shown that this susceptibility is attributed to higher levels of type I IFN signaling and in particular, the ISG IL-1Ra. The sst1 locus contains many genes that could be contributing to the differential susceptibility in C3HeB/Fej mice, and the model in the field was that differences in Sp110 expression were a likely candidate to explain the susceptibility. However, in this manuscript, the authors show that it is not lower expression of Sp110, but instead decreased expression of another gene in the sst1 locus, Sp140, that contributes to the increased susceptibility of mice carrying the sst1S sequence to bacterial infections. This is a very significant and surprising finding, supported by very clear and convincing data from experiments performed with a high level of rigor. Although identification of the gene responsible for differences in susceptibility and outcomes during bacterial infections is an advance for the field, the manuscript stops there in terms of new insight and falls short of providing any additional information beyond what has already been published regarding how this gene or locus is functioning to regulate immune responses to infection. This limited scope embodies the major concern for this otherwise strong manuscript.

      We thank the reviewer for recognizing the importance of our discovery that loss of Sp140 (not Sp110) confers susceptibility to M. tuberculosis. Our generation of Sp140 deficient mice allows us to demonstrate, for the first time, that Sp140 is a negative regulator of type I IFNs. By generating crosses between Sp140–/– and Ifnar–/– mice, we further demonstrate that type I IFNs mediate the susceptibility of Sp140–/– mice to M. tuberculosis and Legionella. The reviewer appears to believe that because IFNs were previously shown to mediate the phenotype of Sst1S mice, the function of Sp140 was somehow already known. By contrast, we feel that the function of Sp140 was not at all clear prior to our work, and that our work does indeed provide important mechanistic insight into the function of Sp140 as a regulator of type I IFNs. Sst1S mice contain many genetic differences compared to B6 mice. It is only because of our work that we can now go back and reinterpret the prior work on Sst1S mice; this would not be possible without the work we have reported in this paper. Of course we would love to be able to describe more about the molecular mechanism by which Sp140 represses interferon transcription. This is indeed something we are working on. However, our preliminary experiments indicate this is not likely to be straightforward and will require considerable effort that is certainly beyond the scope of this current paper. It should be noted, for example, that Sp140 is in the same protein family as the well-known transcriptional regulator Aire. The mechanism by which Aire regulates gene expression has been studied for almost two decades and is still not entirely clear (and was certainly not clear in the initial foundational paper on Aire function published by Anderson et al in Science in 2002). We expect the mechanism of Sp140 to be similarly complex. Importantly, we now know for the first time which protein to study mechanistically, i.e., SP140 instead of SP110.

      Reviewer #2 (Public Review):

      The authors have suggested the importance of SP140 for resistance to Mtb, Legionella infections in mice. They also provide evidence for IFNaR signalling in mediating the increased susceptibility of SP140-/- mice. While they attribute an important function of the transcriptional regulator SP140 to regulation of type I IFN responses by demonstrating the dysregulation of these responses in the SP140-/- mice, more direct evidence for this is needed.

      We appreciate the reviewer’s succinct summary of the main conclusions of our manuscript. While we would agree that there is more to learn about the mechanism of SP140 function, it is not entirely clear to us what the reviewer means when they say that more “direct” evidence is needed for our claim that Sp140 regulates the IFN response during bacterial infection. We feel that the genetic experiments we provide are clear on this point. The reviewer may be thinking that we are proposing a specific mechanism, e.g., that our model is that Sp140 regulates IFN production by binding to the IFN beta gene; although that is an appealing possibility, we agree that is not shown in our manuscript, and indeed, we are careful not to make any such claim. Indeed, we explicitly state that a more indirect mechanism is possible (line 390). What is clear, though, is that loss of Sp140 mediates susceptibility to infection via (direct or indirect) increases in type I IFN. We observe increased type I IFN responses in Sp140–/– mice in vivo, and moreover, we find that a cross of Sp140–/– mice to Ifnar–/– mice reverses susceptibility to infection. These results demonstrate that the dysregulation of type 1 IFN in the absence of Sp140 is not merely correlative, but in fact drives susceptibility to bacterial infection in vivo.

      Reviewer #3 (Public Review):

      In this manuscript Ji et al carefully examine candidate genes driving a previously described susceptibility within the severe susceptibility to tuberculosis (sst1) locus. Surprisingly, mice deficient in the original candidate gene within this locus, SP110, showed no change in susceptibility to infection with M. tuberculosis. In contrast, the authors found that loss of a second gene in this locus, SP140, recapitulated many phenotypes seen in the SST1 mouse, including increased Type I IFN. SP140 susceptibility was reversed by blocking these exacerbated type I IFNs, similar to SST1 mice. RNAseq analysis identifies changes in pro-inflammatory cytokines and type I IFNs. The strengths of this paper are the careful and controlled experiments to target and analyze mouse mutants within a notoriously challenging region with homopolymers. Their results are robust, convincing and will be of broad interest to the field of immunology and host-pathogen interactions. Convincingly identifying a single gene within this region that recapitulates many aspects of the SST1 mouse is very important. While a minor weakness is the lack of any mechanistic understanding of how SP140 functions, this is overcome by the impact of the other findings and it is anticipated that this mouse will now be a key resource to dissect the mechanisms of susceptibility in much greater detail.

      We thank the reviewer for their generous evaluation. Mechanistically, we do show that Sp140 affects resistance to bacterial infection via regulation of the interferon response, which we think is an important and technically non-trivial advance that provides insight into the function of Sp140. However, we agree that the mechanism for how Sp140 regulates type I IFN is not shown (nor is it claimed to be shown) and addressing this mechanism is now an important and exciting question for future studies.

    1. Author Response:

      Reviewer #2 (Public Review (required)):

      Using high-speed holographic methodology, the swimming trajectories of two Leishmania life cycle stages are measured. Significant differences between the life stages become apparent. In addition, the authors show in a chemotaxis experiment that the infectious metacyclics respond chemotactically to the presence of macrophages.

      The physics part of the study is flawless, and the holography is very impressive, especially in view of the comparatively simple setup. The analysis and presentation of the data is also flawless.

      What is not so clear is the biological interpretation of the data. Chemotactic behavior has been repeatedly postulated for Leishmania, trypanosomes, and other parasites. However, there have been no experiments to date that allow conclusions to be drawn about in vivo relevance. Unfortunately, this does not really change with this study.

      It has been shown in trypanosomes that the swimming behavior of different species and life stages is influenced by the mechanical conditions of their microenvironments. Viscosity, obstacles, and hydrodynamics can all play a critical role in determining motility. These factors are ignored in the study. Cell culture medium with the viscosity of water cannot reproduce the situation in the vector or in body fluids such as blood or lymph. A chemotactic gradient such as the one generated here by rather simple means cannot arise at all in vivo, simply because everything is in flux and parasites and macrophages move continuously. Moreover, one may wonder why Leishmania should actively move chemotactically toward macrophages when they come into contact with target cells much more rapidly by chance due to self-stirring properties of body fluids. I am not questioning the finding at all. I am merely questioning its biological relevance. Perhaps it would be better to describe this aspect of the paper more cautiously and to discuss it quite openly and critically. Otherwise, the result might enter our knowledge as evidence for biologically relevant chemotaxis, and that would be problematic.

      We thank the reviewer for their perspective and agree that providing formal evidence for chemotaxis in vivo is complicated. The reviewer is right that mechanical stimulus, viscosity, elasticity etc. are present in body tissues, and that they will affect the motion of the flagellum, and that there is evidence that physical obstructions interrupt the flagellar beat (though ‘stirring’ does not play a role in Leishmania’s motion through tissue). At any rate, we contend that an in vitro study such as ours decouples the mechanical heterogeneity of the in vivo environment from the parasite’s cellular response. If a chemotactic response is present in the parasite, then it will be most sensitively and uniquely tested in an isotropic environment such as a bulk Newtonian fluid - indeed, this is what we find. Chemical gradients are known to occur and persist in cutaneous infections, as damage to tissue, sand fly saliva and Leishmania-derived molecules have been shown to recruit immune cells by this mechanism - we have added references and words to this effect on lines 211-214.

      Reviewer #3 (Public Review (required)):

      The authors describe a clever and powerful assay to show chemotactic behavior in metacyclic Leishmania, which is an important result. The data seem mostly solid, but some results are confusing (perhaps partly an issue with presentation?) and overall conclusions seem like they need to be toned down a little. It is expected that this work will have long-lasting impact on the research community, and the new methods developed will be widely utilized.

      Major concerns:

      • "Pre-Adaptation", e.g. lines 149-150: A major message of the work is to suggest that motility behavior and chemotaxis is a "pre-adaptation". However, I don't agree that the current studies show that "…flagellar motility is a …preadaptation to infection of human hosts." What are the data to support this? The authors do a very good job of defining motility features of PCF and META forms, including quantitative analysis of motility features in 3D. They find that motility differs in PCF vs META forms. They also demonstrate chemotaxis in META forms. But, I don't see how these combined results demonstrate a "pre-adaptation" to infection of human hosts. As such, the "pre-adaptation" statement should be moved to speculation. Notably, I did not see tests for chemotaxis in PCF. Thus, it is not even formally demonstrated whether or not chemotaxis itself is an "adaptation" specific to META forms, or rather (and quite likely) is a fundamental property of all life cycle stages.

      o To test if chemotaxis is an 'adaptation', the authors would need to provide an analysis of PCFs. To be an adaptation, one would expect to find either that PCFs do not exhibit chemotaxis, or that they do not chemotax toward macrophages in the assay used. Without this, the authors cannot say whether chemotaxis is a stage-specific behavior, much less a "pre"-adaptation.

      We have moderated the language around claims of ‘pre-adaptation’ (please see next point for locations), and provided additional results from chemotaxis assays in PCF. Consistent with previous studies (e.g. Oliveira et al, Exp. Parasitol. (2000), Leslie et al., Exp. Parasitol. (2002), Barros et al., Exp. Parasitol. (2006)), we find a different chemo/osmotactic response in which PCF cells are drawn towards the agar in the pipette tip even in the absence of an embedded stimulant such as macrophages. We speculate that this result is due to the presence of small carbohydrate molecules from the unrefined agar - and note that the response is distinct to META, which show no such attraction. However, as suggested, this has been made more speculative in the revised discussion.

      o Note, I think the work would not be negatively affected if the whole concept of "adaptation" were omitted and the work was framed around the very important results of developing a new and powerful approach to investigate Leishmania motility in 3D; quantitative definition of motility parameters; demonstration of chemotaxis in META forms.

      We thank the reviewer for their suggestion (and their positive words), and have modified the language around claims of pre-adaptation. We have rephrased the claims in the abstract, and around lines 188-90 in the summary/conclusions.

      • Chemotaxis: The work would benefit from some commentary on chemotaxis in kinetoplastids. A 'suggestion' for a potential advantage provided by chemotaxis (lines153-155) is not unwarranted, but that should be kept to speculation at this point, and implication that this is an 'adaptation' is not supported by the current data. With report of chemotaxis being a major message, the paper would benefit from a brief discussion on what's been demonstrated regarding chemotaxis in trypanosomatids, as this is an important, yet under-represented area of research on these organisms. Without this, the novelty and significance of the author's rigorous, novel and very interesting work are not brought out.

      We thank the reviewer for this suggestion, and have added another paragraph to the introduction (lines 53-81), giving additional context to our results by providing an overview of more experiments in the field. We have also changed the word ‘suggest’ to ‘speculate’ in the summary and conclusions (line 243).

      • Lines 125 - 129: How is it that tumble frequency decreases, but run duration is unaffected? I would think that less frequent tumbles would lead to longer runs? This warrants more comment.

      We thank the reviewer for pointing out the apparent confusion here. This stems from the fact that (as stated in the subsequent sentence) in the majority of the population, the tumble rate is significantly suppressed, to either one or zero tumbles per track. We require at least two tumbles per track to measure run duration, so the small fraction of the population unaffected by the stimulus contributes the bulk of the measurable runs. We have revised this section of the text to explain how we measure run duration.
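
The measurement constraint described above can be illustrated with a minimal sketch. The function name and the representation of a track as a list of tumble timestamps are assumptions made for illustration; they are not taken from the authors' analysis pipeline.

```python
from typing import Dict, List


def run_durations(tumble_times: List[float]) -> List[float]:
    """Return the intervals between consecutive tumbles in one track.

    A run is only measurable when it is bounded by a tumble at both ends,
    so tracks with fewer than two tumbles contribute no run durations.
    """
    times = sorted(tumble_times)
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical tracks: tumble timestamps in seconds.
tracks: Dict[str, List[float]] = {
    "stimulated_a": [],              # no tumbles: no measurable run
    "stimulated_b": [4.0],           # one tumble: still no bounded run
    "unaffected": [1.0, 3.0, 6.5],   # three tumbles: two measurable runs
}

measured = {name: run_durations(times) for name, times in tracks.items()}
```

In this toy example only the "unaffected" track yields run durations, which is why a suppressed tumble rate in most cells need not change the measured run-duration distribution.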

      • Fig 3 and Lines 135-139: How does one reconcile the finding that murine macrophages and human macrophages both induce taxis toward the pipet tip (3A), but there is opposite impact on speed profiles, with murine macrophages causing slower speeds, and human macrophages causing faster speeds (3H,K vs 3I,L)? Perhaps analysis done for human macrophages must also be done for murine macrophages. Some more commentary, and analysis needs to be provided on this point.

      We thank the reviewer for this suggestion, and in the light of their comments, we have revised our description of the murine data, highlighting that the results are not statistically significant. To further emphasise this point to the reader, we have recast the error bars in figure 3a in terms of 95% confidence intervals rather than using the standard error of the mean, as in the previous version. Although one may be calculated directly from the other without any further assumptions, the 95% CI representation might be more familiar to the readership. In this light, the fairly modest decrease in average swimming speed (also seen in absolute terms in the DMEM case) reinforces the revised conclusion that the null hypothesis (META are not stimulated by mMφ) cannot be rejected.
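
The conversion between the two error-bar conventions can be sketched as follows. The multiplier of 1.96 is the usual large-sample normal value and is an assumption here; the response does not state which multiplier was used, and a t-based value would give slightly wider intervals for small samples.

```python
import math
from statistics import stdev
from typing import Sequence, Tuple


def standard_error(values: Sequence[float]) -> float:
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def ci95_from_sem(center: float, sem: float, z: float = 1.96) -> Tuple[float, float]:
    """95% confidence interval from a mean and its standard error.

    z = 1.96 is the normal-approximation multiplier (an assumption).
    """
    half_width = z * sem
    return center - half_width, center + half_width


# Hypothetical mean and SEM, not values from the manuscript.
lower, upper = ci95_from_sem(10.0, 1.0)
```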

      • Regarding replicates: While the number of cells tracked are clearly indicated, I did not see a description of how many different chambers were imaged for each condition, or how many different fields per chamber.

      This has been amended in the Methods section, under the subheading “Chemotaxis Assay”.

    1. Author Response:

      Evaluation Summary:

      The authors assessed multivariate relations between a dimensionality-reduced symptom space and brain imaging features, using a large database of individuals with psychosis-spectrum disorders (PSD). Demonstrating both high stability and reproducibility of their approaches, this work showed a promise that diagnosis or treatment of PSD can benefit from a proposed data-driven brain-symptom mapping framework. It is therefore of broad potential interest across cognitive and translational neuroscience.

      We are very grateful for the positive feedback and the careful read of our paper. We would especially like to thank the Reviewers for taking the time to read this lengthy and complex manuscript and for providing their helpful and highly constructive feedback. Overall, we hope the Editor and the Reviewers will find that our responses address all the comments and that the requested changes and edits improved the paper.

      Reviewer 1 (Public Review):

      The paper assessed the relationship between a dimensionality-reduced symptom space and functional brain imaging features based on the large multicentric data of individuals with psychosis-spectrum disorders (PSD).

      The strength of this study is that i) in every analysis, the authors provided high-level evidence of reproducibility in their findings, ii) the study included several control analyses to test other comparable alternatives or independent techniques (e.g., ICA, univariate vs. multivariate), and iii) correlating to independently acquired pharmacological neuroimaging and gene expression maps, the study highlighted neurobiological validity of their results.

      Overall the study has originality and several important tips and guidance for behavior-brain mapping, although the paper contains heavy descriptions about data mining techniques such as several dimensionality reduction algorithms (e.g., PCA, ICA, and CCA) and prediction models.

      We thank the Reviewer for their insightful comments and we appreciate the positive feedback. Regarding the descriptions of methods and analytical techniques, we have removed these descriptions out of the main Results text and figure captions. Detailed descriptions are still provided in the Methods, so that they do not detract from the core message of the paper but can still be referenced if a reader wishes to look up the details of these methods within the context of our analyses.

      Although relatively minor, I also have a few points on the weaknesses, including i) an incomplete description about how to tell the PSD effects from the normal spectrum, ii) a lack of overarching interpretation for other principal components rather than only the 3rd one, and iii) somewhat expected results in the stability of PC and relevant indices.

      We are very appreciative of the constructive feedback and feel that these revisions have strengthened our paper. We have addressed these points in the revision as follows:

      i) We are grateful to the Reviewer for bringing up this point as it has allowed us to further explore the interesting observation we made regarding shared versus distinct neural variance in our data. It is important not to confuse the neural PCA (i.e. the independent neural features that can be detected in the PSD and healthy control samples) with the neuro-behavioral mapping. In other words, both PSD patients and healthy controls are human and therefore there are a number of neural functions that both cohorts exhibit that may have nothing to do with the symptom mapping in PSD patients. These include, for instance, basic regulatory functions such as control of cardiac and respiratory cycles, motor functions, vision, etc. We hypothesized therefore that there are more common than distinct neural features that are on average shared across humans irrespective of their psychopathology status. Consequently, there may only be a ‘residual’ symptom-relevant neural variance. Therefore, in the manuscript we bring up the possibility that a substantial proportion of neural variance may not be clinically relevant. If this is in fact true then removing the shared neural variance between PSD and CON should not drastically affect the reported symptom-neural univariate mapping solution, because this common variance does not map to clinical features and is therefore statistically orthogonal. We have now verified this hypothesis quantitatively and have added extensive analyses to highlight this important observation raised by the Reviewer. We first conducted a PCA using the parcellated GBC data from all 436 PSD and 202 CON (a matrix with dimensions 638 subjects x 718 parcels). We will refer to this as the GBC-PCA to avoid confusion with the symptom/behavioral PCA described elsewhere in the manuscript. This GBC-PCA resulted in 637 independent GBC-PCs. Since PCs are orthogonal to each other, we then partialled out the variance attributable to GBC-PC1 from the PSD data by reconstructing the PSD GBC matrix using only scores and coefficients from the remaining 636 GBC-PCs ($\widehat{GBC}_{woPC1}$). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. The results are shown in Fig. S21 and reproduced below. Removing the first PC of shared neural variance (which accounted for about 15.8% of the total GBC variance across CON and PSD) from PSD data attenuated the statistics slightly (not unexpected as the variance was by definition reduced) but otherwise did not strongly affect the univariate mapping solution.
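
The reconstruction step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name is hypothetical, the PCA is computed via an SVD of the column-centered matrix, the column mean is added back after reconstruction, and random toy data stand in for the 638 x 718 GBC matrix. Column centering removes one degree of freedom, which is consistent with 638 subjects yielding 637 components.

```python
import numpy as np


def remove_leading_pcs(all_data, target_rows, n_remove=1):
    """Reconstruct selected rows without the first n_remove principal components.

    all_data: (subjects x parcels) matrix used to fit the PCA.
    target_rows: indices of the rows (e.g. the patient group) to reconstruct.
    Returns the reconstructed rows and the removed PC coefficient vectors.
    """
    col_mean = all_data.mean(axis=0)
    centered = all_data - col_mean
    # Rows of vt are PC coefficient vectors; u * s gives the PC scores.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s
    kept = slice(n_remove, None)
    # Rebuild the data from the remaining components only, then restore the
    # column mean (restoring the mean is an assumption of this sketch).
    reconstructed = scores[:, kept] @ vt[kept, :] + col_mean
    return reconstructed[target_rows], vt[:n_remove]


rng = np.random.default_rng(0)
gbc = rng.normal(size=(60, 30))   # toy stand-in for the 638 x 718 GBC matrix
psd_rows = np.arange(40)          # toy stand-in for the 436 PSD subjects
cleaned, removed = remove_leading_pcs(gbc, psd_rows, n_remove=1)
```

Because the components are orthogonal, the cleaned data have no remaining projection onto the removed component, so any symptom-neural regression rerun on `cleaned` can only draw on variance outside GBC-PC1.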

      We next repeated the symptom-neural regression with the first 2 GBC-PCs parsed out of the PSD data (Fig. S22), with the first 3 GBC-PCs parsed out (Fig. S23), and with the first 4 GBC-PCs parsed out (Fig. S24). The symptom-neural maps remain fairly robust, although the similarity with the original $\beta_{PC}GBC$ maps does drop as more common neural variance is parsed out. These figures are also shown below:

      Fig. S21. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first neural PC parsed out. If a substantial proportion of neural variance is not clinically relevant, then removing the shared neural variance between PSD and CON should not drastically affect the reported symptom-neural univariate mapping solution, because this common variance will not map to clinical features. We therefore performed a PCA on CON and PSD GBC to compute the shared neural variance (see Methods), and then parsed out the first GBC-PC from the PSD GBC data ($\widehat{GBC}_{woPC1}$). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The first GBC-PC accounted for about 15.8% of the total GBC variance across CON and PSD. Removing GBC-PC1 from PSD data attenuated the $\beta_{PC1}GBC$ statistics slightly (not unexpected as the variance was by definition reduced) but otherwise did not strongly affect the univariate mapping solution. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      Fig. S22. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first two neural PCs parsed out. We performed a PCA on CON and PSD GBC and then parsed out the first two GBC-PCs from the PSD GBC data ($\widehat{GBC}_{woPC1-2}$, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      Fig. S23. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first three neural PCs parsed out. We performed a PCA on CON and PSD GBC and then parsed out the first three GBC-PCs from the PSD GBC data ($\widehat{GBC}_{woPC1-3}$, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      Fig. S24. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first four neural PCs parsed out. We performed a PCA on CON and PSD GBC and then parsed out the first four GBC-PCs from the PSD GBC data ($\widehat{GBC}_{woPC1-4}$, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      For comparison, we also computed the $\beta_{PC}GBC$ maps for control subjects, shown in Fig. S11. In support of the $\beta_{PC}GBC$ maps in PSD being circuit-relevant, we observed only mild associations between GBC and PC scores in healthy controls:

      Results: All 5 PCs captured unique patterns of GBC variation across the PSD (Fig. S10), which were not observed in CON (Fig. S11). ... Discussion: On the contrary, this bi-directional “Psychosis Configuration” axis also showed strong negative variation along neural regions that map onto the sensory-motor and associative control regions, also strongly implicated in PSD (1, 2). The “bi-directionality” property of the PC symptom-neural maps may thus be desirable for identifying neural features that support individual patient selection. For instance, it may be possible that PC3 reflects residual untreated psychosis symptoms in this chronic PSD sample, which may reveal key treatment neural targets. In support of this circuit being symptom-relevant, it is notable that we observed a mild association between GBC and PC scores in the CON sample (Fig. S11).

      ii) In our original submission we spotlighted PC3 because of its pattern of loadings on to hallmark symptoms of PSD, including strong positive loadings across Positive symptom items in the PANSS and conversely strong negative loadings on to most Negative items. It was necessary to fully examine this dimension in particular because these are key characteristics of the target psychiatric population, and we found that the focus on PC3 was innovative because it provided an opportunity to quantify a fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. This is a powerful demonstration of how data-driven techniques such as PCA can reveal properties intrinsic to the structure of PSD-relevant symptom data which may in turn improve the mapping of symptom-neural relationships. We refrained from explaining each of the five PCs in detail in the main text as we felt that it would further complicate an already dense manuscript. Instead, we opted to provide the interpretation and data from all analyses for all five PCs in the Supplement. However, in response to the Reviewers’ thoughtful feedback that more focus should be placed on other components, we have expanded the presentation and discussion of all five components (both regarding the symptom profiles and neural maps) in the main text:

      Results: Because PC3 loads most strongly on to hallmark symptoms of PSD (including strong positive loadings across Positive symptom measures in the PANSS and strong negative loadings onto most Negative measures), we focus on this PC as an opportunity to quantify an innovative, fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional symptom axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. We found that the PC3 result provided a powerful empirical demonstration of how using a data-driven dimensionality-reduced solution (via PCA) can reveal novel patterns intrinsic to the structure of PSD psychopathology.

      iii) We felt that demonstrating the stability of the PCA solution was extremely important, given that this degree of rigor has not previously been tested using broad behavioral measures across psychosis symptoms and cognition in a cross-diagnostic PSD sample. Additionally, we demonstrated reproducibility of the PCA solution using independent split-half samples. Furthermore, we derived stable neural maps using the PCA solution. In our original submission we show that the CCA solution was not reproducible in our dataset. Following the Reviewers’ feedback, we computed the estimated sample sizes needed to sufficiently power our multivariate analyses for stable/reproducible solutions, using the methods in (3). These results are discussed in detail in our resubmitted manuscript and in our response to the Critiques section below.

      Reviewer 2 (Public Review):

      The work by Ji et al is an interesting and rather comprehensive analysis of the trend of developing data-driven methods for developing brain-symptom dimension biomarkers that bring a biological basis to the symptoms (across PANSS and cognitive features) that relate to psychotic disorders. To this end, the authors performed several interesting multivariate analyses to decompose the symptom/behavioural dimensions and functional connectivity data. To this end, the authors use data from individuals from a transdiagnostic group of individuals recruited by the BSNIP cohort and combine high-level methods in order to integrate both types of modalities. Conceptually there are several strengths to this paper that should be applauded. However, I do think that there are important aspects of this paper that need revision to improve readability and to better compare the methods to what is in the field and provide a balanced view relative to previous work with the same basic concepts that they are building their work around. Overall, I feel as though the work could advance our knowledge in the development of biomarkers or subject level identifiers for psychiatric disorders and potentially be elevated to the level of an individual "subject screener". While this is a noble goal, this will require more data and information in the future as a means to do this. This is certainly an important step forward in this regard.

      We thank the Reviewer for their insightful and constructive comments about our manuscript. We have revised the text to make it easier to read and to clarify our results in the context of prior works in the field. We fully agree that a great deal more work needs to be completed before achieving single-subject level treatment selection, but we hope that our manuscript provides a helpful step towards this goal.

      Strengths:

      • Combined analysis of canonical psychosis symptoms and cognitive deficits across multiple traditional psychosis-related diagnoses offers one of the most comprehensive mappings of impairments experienced within PSD to brain features to date
      • Cross-validation analyses and use of various datasets (diagnostic replication, pharmacological neuroimaging) is extremely impressive, well motivated, and thorough. In addition the authors use a large dataset and provide "out of sample" validity
      • Medication status and dosage also accounted for
      • Similarly, the extensive examination of both univariate and multivariate neuro-behavioural solutions from a methodological viewpoint, including the testing of multiple configurations of CCA (i.e. with different parcellation granularities), offers very strong support for the selected symptom-to-neural mapping
      • The plots of the obtained PC axes compared to those of standard clinical symptom aggregate scales provide a really elegant illustration of the differences and demonstrate clearly the value of data-driven symptom reduction over conventional categories
      • The comparison of the obtained neuro-behavioural map for the "Psychosis configuration" symptom dimension to both pharmacological neuroimaging and neural gene expression maps highlights direct possible links with both underlying disorder mechanisms and possible avenues for treatment development and application
      • The authors' explicit investigation of whether PSD and healthy controls share a major portion of neural variance (possibly present across all people) has strong implications for future brain-behaviour mapping studies, and provides a starting point for narrowing the neural feature space to just the subset of features showing symptom-relevant variance in PSD

      We are very grateful for the positive feedback. We would like to thank the Reviewers for taking the time to read this admittedly dense manuscript and for providing their helpful critique.

      Critiques:

      • Overall I found the paper very hard to read. There are abbreviations everywhere for every concept that is introduced. The paper is methods heavy (which I am not opposed to and quite like). It is clear that the authors took a lot of care in thinking about the methods that were chosen. That said, I think that the organization would benefit from a more traditional Intro, Methods, Results, and Discussion formatting so that it would be easier to parse the Results. The figures are extremely dense and there are often terms that are coined or used that are undefined or poorly defined.

      We appreciate the constructive feedback around how to remove the dense content and to pay more attention to the frequency of abbreviations, which impact readability. We implemented the strategies suggested by the Reviewer and have moved the Methods section after the Introduction to make the subsequent Results section easier to understand and contextualize. For clarity and length, we have moved methodological details previously in the Results and figure captions to the Methods (e.g. descriptions of dimensionality reduction and prediction techniques). This way, the Methods are now expanded for clarity without detracting from the readability of the core results of the paper. We have also simplified the text in places where there was room for more clarity. To make the numerous abbreviations easier to follow, we have added a table to the Supplement (Supplementary Table S1).

      • One thing I found conceptually difficult is the explicit comparison to the work in the Xia paper from the Satterthwaite group. Is this a fair comparison? The sample is extremely different as it is non-clinical and comes from the general population. Can it be suggested that the groups that are clinically defined here are comparable? Is this an appropriate comparison and standard to make? To suggest that the work in that paper is not reproducible is flawed in this light.

      This is an extremely important point to clarify and we apologize that we did not make it sufficiently clear in the initial submission. Here we are not attempting to replicate the results of Xia et al., which we understand were derived in a fundamentally different sample than ours, both demographically and clinically, and which tested very different questions. Rather, this paper is just one example out of a number of recent papers which employed multivariate methods (CCA) to tackle the mapping between neural and behavioral features. The key point here is that this approach does not produce reproducible results due to over-fitting, as demonstrated robustly in the present paper. It is very important to highlight that we did not single out any one paper when making this point. In fact, we do not mention the Xia paper explicitly anywhere and we were very careful to cite multiple papers in support of the multivariate over-fitting argument, which is now a well-known issue (4). Nevertheless, the Reviewers make an excellent point here and we acknowledge that while CCA was not reproducible in the present dataset, this does not explicitly imply that the results in the Xia et al. paper (or any other paper for that matter) are not reproducible by definition (i.e. until someone formally attempts to falsify them). We have made this point explicit in the revised paper, as shown below. Furthermore, in line with the provided feedback, we also applied the multivariate power calculator derived by Helmer et al. (3), which quantitatively illustrates the statistical point around CCA instability.

      Results: Several recent studies have reported “latent” neuro-behavioral relationships using multivariate statistics (5–7), which would be preferable because they simultaneously solve for maximal covariation across neural and behavioral features. Though concerns have emerged about whether such multivariate results will replicate due to the size of the feature space relative to the size of the clinical samples (4), given the possibility of deriving a stable multivariate effect, here we tested if results improve with canonical correlation analysis (CCA) (8), which maximizes relationships between linear combinations of symptom (B) and neural features (N) across all PSD (Fig. 5A).

      Discussion: Here we attempted to use multivariate solutions (i.e. CCA) to quantify symptom and neural feature co-variation. In principle, CCA is well-suited to address the brain-behavioral mapping problem. However, symptom-neural mapping using CCA across either parcel-level or network-level solutions in our sample was not reproducible even when using a low-dimensional symptom solution and parcellated neural data as a starting point. Therefore, while CCA (and related multivariate methods such as partial least squares) are theoretically appropriate and may be helped by regularization methods such as sparse CCA, in practice many available psychiatric neuroimaging datasets may not provide sufficient power to resolve stable multivariate symptom-neural solutions (3). A key pressing need for forthcoming studies will be to use multivariate power calculators to inform sample sizes needed for resolving stable symptom-neural geometries at the single subject level. Of note, though we were unable to derive a stable CCA in the present sample, this does not imply that the multivariate neuro-behavioral effect may not be reproducible with larger effect sizes and/or sample sizes. Critically, this does highlight the importance of power calculations prior to computing multivariate brain-behavioral solutions (3).
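
The over-fitting concern can be illustrated with a small simulation. The sketch below is an assumption-laden toy example, not the authors' analysis or the power calculator of (3): it fits a classical (unregularized) CCA via QR and SVD on pure noise, with hypothetical feature counts, and compares the first canonical correlation in the training half with the correlation obtained when the same weights are applied to a held-out half.

```python
import numpy as np


def fit_first_canonical_pair(x, y):
    """Return weight vectors for the first canonical variate pair."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    # Orthonormalize each block; the SVD of the cross-product of the
    # orthonormal bases then yields the canonical correlations.
    qx, rx = np.linalg.qr(xc)
    qy, ry = np.linalg.qr(yc)
    u, s, vt = np.linalg.svd(qx.T @ qy)
    a = np.linalg.solve(rx, u[:, 0])   # weights on the columns of x
    b = np.linalg.solve(ry, vt[0, :])  # weights on the columns of y
    return a, b


def canonical_correlation(x, y, a, b):
    """Correlation between the two variates obtained with fixed weights."""
    xv = (x - x.mean(axis=0)) @ a
    yv = (y - y.mean(axis=0)) @ b
    return float(np.corrcoef(xv, yv)[0, 1])


rng = np.random.default_rng(0)
neural = rng.normal(size=(200, 20))    # hypothetical neural features (noise)
symptoms = rng.normal(size=(200, 5))   # hypothetical symptom scores (noise)
train, test = slice(0, 100), slice(100, 200)

a, b = fit_first_canonical_pair(neural[train], symptoms[train])
r_train = canonical_correlation(neural[train], symptoms[train], a, b)
r_test = canonical_correlation(neural[test], symptoms[test], a, b)
```

Even though the two blocks are unrelated by construction, the in-sample correlation is inflated because the weights are fitted to chance covariation, whereas the held-out correlation is expected to scatter around zero. The gap widens as the number of features grows relative to the number of subjects.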

      • Why was PCA selected for the analysis rather than ICA? Authors mention that PCA enables the discovery of orthogonal symptom dimensions, but don't elaborate on why this is expected to better capture behavioural variation within PSD compared to non-orthogonal dimensions. Given that symptom and/or cognitive items in conventional assessments are likely to be correlated in one way or another, allowing correlations to be present in the low-rank behavioural solution may better represent the original clinical profiles and drive more accurate brain-behaviour mapping. Moreover, as alluded to in the Discussion, employing an oblique rotation in the identification of dimensionality-reduced symptom axes may have actually resulted in a brain-behaviour space that is more generalizable to other psychiatric spectra. Why not use something more relevant to symptom/behaviour data like a factor analysis?

      This is a very important point! We agree with the Reviewer that an oblique solution may better fit the data. For this reason, we performed an ICA as shown in the Supplement. We chose to show PCA for the main analyses here because it is a deterministic solution and the number of significant components could be computed via permutation testing. Importantly, certain components from the ICA solution in this sample were highly similar to the PCs shown in the main solution (Supplementary Note 1), as measured by comparing the subject behavioral scores (Fig. S4), and neural maps (Fig. S13). However, notably, certain components in the ICA and PCA solutions did not appear to have a one-to-one mapping (e.g. PCs 1-3 and ICs 1-3). The orthogonality of the PCA solution forces the resulting components to capture maximally separated, unique symptom variance, which in turn map robustly on to unique neural circuits. We observed that the data may be distributed in such a way that in the ICA highly correlated independent components emerge, which do not maximally separate the symptom variance associated with neural variance. We demonstrate this by plotting the relationship between parcel beta coefficients for the $\beta_{PC3}GBC$ map versus the $\beta_{IC2}GBC$ and $\beta_{IC3}GBC$ maps. The sigmoidal shape of the distribution indicates an improvement in the Z-statistics for the $\beta_{PC3}GBC$ map relative to the $\beta_{IC2}GBC$ and $\beta_{IC3}GBC$ maps. We have added this language to the main text Results:

      Notably, independent component analysis (ICA), an alternative dimensionality reduction procedure which does not enforce component orthogonality, produced similar effects for this PSD sample (see Supplementary Note 1 & Fig. S4A). Certain pairs of components between the PCA and ICA solutions appear to be highly similar and exclusively mapped (IC5 and PC4; IC4 and PC5) (Fig. S4B). On the other hand, PCs 1-3 and ICs 1-3 do not exhibit a one-to-one mapping. For example, PC3 appears to correlate positively with IC2 and equally strongly negatively with IC3, suggesting that these two ICs are oblique to the PC and perhaps reflect symptom variation that is explained by a single PC. The orthogonality of the PCA solution forces the resulting components to capture maximally separated, unique symptom variance, which in turn map robustly on to unique neural circuits. We observed that the data may be distributed in such a way that in the ICA highly correlated independent components emerge, which do not maximally separate the symptom variance associated with neural variance. We demonstrate this by plotting the relationship between parcel beta coefficients for the $\beta_{PC3}GBC$ map versus the $\beta_{IC2}GBC$ and $\beta_{IC3}GBC$ maps (Fig. ??G). The sigmoidal shape of the distribution indicates an improvement in the Z-statistics for the $\beta_{PC3}GBC$ map relative to the $\beta_{IC2}GBC$ and $\beta_{IC3}GBC$ maps.

      Additionally, the Reviewer raises an important point, and we agree that orthogonal versus oblique solutions warrant further investigation especially with regards to other psychiatric spectra and/or other stages in disease progression. For example, oblique components may better capture dimensions of behavioral variation in prodromal individuals, as these individuals are in the early stages of exhibiting psychosis-relevant symptoms and may show early diverging of dimensions of behavioral variation. We elaborate on this further in the Discussion:

      Another important aspect that will require further characterization is the possibility of oblique axes in the symptom-neural geometry. While orthogonal axes derived via PCA were appropriate here and similar to the ICA-derived axes in this solution, it is possible that oblique dimensions more clearly reflect the geometry of other psychiatric spectra and/or other stages in disease progression. For example, oblique components may better capture dimensions of neuro-behavioral variation in a sample of prodromal individuals, as these patients are exhibiting early-stage psychosis-like symptoms and may show signs of diverging along different trajectories.

      Critically, these factors should constitute key extensions of an iteratively more robust model for individualized symptom-neural mapping across the PSD and other psychiatric spectra. Relatedly, it will be important to identify the ‘limits’ of a given BBS solution – namely a PSD-derived effect may not generalize into the mood spectrum (i.e. both the symptom space and the resulting symptom-neural mapping are orthogonal). It will be important to evaluate if this framework can be used to initialize symptom-neural mapping across other mental health symptom spectra, such as mood/anxiety disorders.

      • The gene expression mapping section lacks some justification for why the 7 genes of interest were specifically chosen from among the numerous serotonin and GABA receptors and interneuron markers (relevant for PSD) available in the AHBA. Brief reference to the believed significance of the chosen genes in psychosis pathology would have helped to contextualize the observed relationship with the neuro-behavioural map.

      We thank the Reviewer for providing this suggestion and agree that it will strengthen the section on gene expression analysis. Of note, we did justify the choice for these genes, but we appreciate the opportunity to expand on the neurobiology of selected genes and their relevance to PSD. We have made these edits to the text:

      We focus here on serotonin receptor subunits (HTR1E, HTR2C, HTR2A), GABA receptor subunits (GABRA1, GABRA5), and the interneuron markers somatostatin (SST) and parvalbumin (PVALB). Serotonin agonists such as LSD have been shown to induce PSD-like symptoms in healthy adults (9) and the serotonin antagonism of “second-generation” antipsychotics is thought to contribute to their efficacy in targeting broad PSD symptoms (10–12). Abnormalities in GABAergic interneurons, which provide inhibitory control in neural circuits, may contribute to cognitive deficits in PSD (13–15) and additionally lead to downstream excitatory dysfunction that underlies other PSD symptoms (16, 17). In particular, a loss of prefrontal parvalbumin-expressing fast-spiking interneurons has been implicated in PSD (18–21).

      • What the identified univariate neuro-behavioural mapping for PC3 ("psychosis configuration") actually means from an empirical or brain network perspective is not really ever discussed in detail. E.g., in Results, "a high positive PC3 score was associated with both reduced GBC across insular and superior dorsal cingulate cortices, thalamus, and anterior cerebellum and elevated GBC across precuneus, medial prefrontal, inferior parietal, superior temporal cortices and posterior lateral cerebellum." While the meaning and calculation of GBC can be gleaned from the Methods, a direct interpretation of the neuro-behavioural results in terms of the types of symptoms contributing to PC3 and relative hyper-/hypo-connectivity of the DMN compared to e.g. healthy controls could facilitate easier comparisons with the findings of past studies (since GBC does not seem to be a very commonly-used measure in the psychosis fMRI literature). Also important since GBC is a summary measure of the average connectivity of a region, and doesn't provide any specificity in terms of which regions in particular are more or less connected within a functional network (an inherent limitation of this measure which warrants further attention).

      We acknowledge that GBC is a linear combination measure that by definition does not provide information on connectivity between any one specific pair of neural regions. However, as shown by highly robust and reproducible neurobehavioral maps, GBC seems to be suitable as a first-pass metric in the absence of a priori assumptions of how specific regional connectivity may map to the PC symptom dimensions, and it has been shown to be sensitive to altered patterns of overall neural connectivity in PSD cohorts (22–25) as well as in models of psychosis (9, 26). Moreover, it is an assumption-free method for dimensionality reduction of the neural connectivity matrix (which is a massive feature space). Furthermore, GBC provides neural maps (where each region can be represented by a value, in contrast to full functional connectivity matrices), which were necessary for quantifying the relationship with independent molecular benchmark maps (i.e. pharmacological maps and gene expression maps). We do acknowledge that there are limitations to the method, which we now discuss in the paper. Furthermore, we agree with the Reviewer that the specific regions implicated in these symptom-neural relationships warrant a more detailed investigation, and we plan to develop this further in future studies, such as with seed-based functional connectivity using regions implicated in PSD (e.g. thalamus (2, 27)) or restricted GBC (22), which can summarize connectivity information for a specific network or subset of neural regions. We have provided elaboration and clarification regarding this point in the Discussion:

      Another improvement would be to optimize neural data reduction sensitivity for specific symptom variation (28). We chose to use GBC for our initial geometry characterizations as it is a principled and assumption-free data-reduction metric that captures (dys)connectivity across the whole brain and generates neural maps (where each region can be represented by a value, in contrast to full functional connectivity matrices) that are necessary for benchmarking against molecular imaging maps. However, GBC is a summary measure that by definition does not provide information regarding connectivity between specific pairs of neural regions, which may prove to be highly symptom-relevant and informative. Thus symptom-neural relationships should be further explored with higher-resolution metrics, such as restricted GBC (22) which can summarize connectivity information for a specific network or subset of neural regions, or seed-based FC using regions implicated in PSD (e.g. thalamus (2, 27)).
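
      To make the "one value per region" property of GBC concrete, the following is a minimal sketch rather than the pipeline used in the manuscript. It assumes GBC is computed as the mean Fisher z-transformed Pearson correlation between each parcel's time series and all other parcels; the function name `global_brain_connectivity` and the input layout are hypothetical.

```python
import numpy as np

def global_brain_connectivity(timeseries):
    """Collapse a (timepoints x parcels) array into one GBC value per parcel.

    Assumption: GBC is the mean Fisher z-transformed correlation of a parcel
    with every other parcel (self-connections excluded).
    """
    r = np.corrcoef(timeseries, rowvar=False)  # parcels x parcels matrix
    np.fill_diagonal(r, 0.0)                   # arctanh(0) = 0, so the diagonal adds nothing
    z = np.arctanh(r)                          # Fisher z-transform
    n_parcels = r.shape[0]
    return z.sum(axis=1) / (n_parcels - 1)

# Hypothetical example: 200 timepoints, 6 parcels of random signal.
rng = np.random.default_rng(1)
gbc_map = global_brain_connectivity(rng.standard_normal((200, 6)))
```

      The full parcels-by-parcels matrix is reduced to a vector with one entry per parcel, which is what allows comparison against other per-region maps, at the cost of discarding which specific pairs of regions drive the value.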

      • Possibly a nitpick, but while the inclusion of cognitive measures for PSD individuals is a main (self-)selling point of the paper, there's very limited focus on the "Cognitive functioning" component (PC2) of the PCA solution. Examining Fig. S8K, the GBC map for this cognitive component seems almost to be the inverse for that of the "Psychosis configuration" component (PC3) focused on in the rest of the paper. Since PC3 does not seem to have high loadings from any of the cognitive items, but it is known that psychosis spectrum individuals tend to exhibit cognitive deficits which also have strong predictive power for illness trajectory, some discussion of how multiple univariate neuro-behavioural features could feasibly be used in conjunction with one another could have been really interesting.

      This is an important piece of feedback concerning the cognitive measure aspect of the study. As the Reviewer recognizes, cognition is a core element of PSD symptoms and the key reason for including this symptom in the model. Notably, the finding that one dimension captures a substantial proportion of cognitive performance-related variance, independent of other residual symptom axes, has not previously been reported and we fully agree that expanding on this effect is important and warrants further discussion. We would like to take two of the key points from the Reviewer’s feedback and expand further. First, we recognize that upon qualitative inspection PC2 and PC3 neural maps appear strongly anti-correlated. However, as demonstrated in Fig. S9O, PC2 and PC3 maps were anti-correlated at r=-0.47. For comparison, the PC2 map was highly anti-correlated with the BACS composite cognitive map (r=-0.81). This implies that the PC2 map in fact reflects unique neural circuit variance that is relevant for cognition, but not necessarily an inverse of the PC3.

      In other words, these data suggest that there are PSD patients with more (or less) severe cognitive deficits independent of any other symptom axis, which would be in line with the observation that these symptoms are not treatable with antipsychotic medication (and therefore should not correlate with symptoms that are treatable by such medications; i.e. PC3). We have now added these points into the revised paper:

      Results Fig. 1E highlights loading configurations of symptom measures forming each PC. To aid interpretation, we assigned a name for each PC based on its most strongly weighted symptom measures. This naming is qualitative but informed by the pattern of loadings of the original 36 symptom measures (Fig. 1). For example, PC1 was highly consistent with a general impairment dimension (i.e. “Global Functioning”); PC2 reflected more exclusively variation in cognition (i.e. “Cognitive Functioning”); PC3 indexed a complex configuration of psychosis-spectrum relevant items (i.e. “Psychosis Configuration”); PC4 generally captured variation in mood and anxiety related items (i.e. “Affective Valence”); finally, PC5 reflected variation in arousal and level of excitement (i.e. “Agitation/Excitation”). For instance, a generally impaired patient would have a highly negative PC1 score, which would reflect low performance on cognition and elevated scores on most other symptomatic items. Conversely, an individual with a high positive PC3 score would exhibit delusional, grandiose, and/or hallucinatory behavior, whereas a person with a negative PC3 score would exhibit motor retardation, social avoidance, possibly a withdrawn affective state with blunted affect (29). Comprehensive loadings for all 5 PCs are shown in Fig. 3G. Fig. 1F highlights the mean of each of the 3 diagnostic groups (colored spheres) and healthy controls (black sphere) projected into a 3-dimensional orthogonal coordinate system for PCs 1,2 & 3 (x,y,z axes respectively; alternative views of the 3-dimensional coordinate system with all patients projected are shown in Fig. 3). Critically, PC axes were not parallel with traditional aggregate symptom scales. For instance, PC3 is angled at 45° to the dominant direction of PANSS Positive and Negative symptom variation (purple and blue arrows respectively in Fig. 1F). ... 
Because PC3 loads most strongly on to hallmark symptoms of PSD (including strong positive loadings across Positive symptom measures in the PANSS and strong negative loadings onto most Negative measures), we focus on this PC as an opportunity to quantify an innovative, fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional symptom axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. We found that the PC3 result provided a powerful empirical demonstration of how using a data-driven dimensionality-reduced solution (via PCA) can reveal novel patterns intrinsic to the structure of PSD psychopathology.

      Another nitpick, but the Y axes of Fig. 8C-E are not consistent, which causes some of the lines of best fit to be a bit misleading (e.g. GABRA1 appears to have a more strongly positive gene-PC relationship than HTR1E, when in reality the opposite is true.)

      We had scaled each axis to best show the data in each plot, but we see how this is confusing and recognise the need to correct it. We have remade the plots with consistent axis labelling.

      • The authors explain the apparent low reproducibility of their multivariate PSD neuro-behavioural solution using the argument that many psychiatric neuroimaging datasets are too small for multivariate analyses to be sufficiently powered. Applying an existing multivariate power analysis to their own data as empirical support for this idea would have made it even more compelling. The following paper suggests guidelines for sample sizes required for CCA/PLS as well as a multivariate calculator: Helmer, M., Warrington, S. D., Mohammadi-Nejad, A.-R., Ji, J. L., Howell, A., Rosand, B., Anticevic, A., Sotiropoulos, S. N., & Murray, J. D. (2020). On stability of Canonical Correlation Analysis and Partial Least Squares with application to brain-behavior associations (p. 2020.08.25.265546). https://doi.org/10.1101/2020.08.25.265546

      We deeply appreciate the Reviewer’s suggestion and the opportunity to incorporate the methods from the Helmer et al. paper. We now highlight the importance of having sufficiently powered samples for multivariate analyses in our other manuscript first-authored by our colleague Dr. Markus Helmer (3). Using the method described in the above paper (GEMMR version 0.1.2), we computed the estimated sample sizes required to power multivariate CCA analyses with 718 neural features and 5 behavioral (PC) features (i.e. the feature set used throughout the rest of the paper):

      As argued in Helmer et al., rtrue is likely below 0.3 in many cases, thus the estimated sample size of 33k is likely a lower bound for the required sample size for sufficiently-powered CCA analyses using the 718+5 features leveraged throughout the univariate analyses in the present manuscript. This number is two orders of magnitude greater than our available sample (and at least one order of magnitude greater than any single existing clinical dataset). Even if rtrue is 0.5, a sample size of ∼10k would likely be required. We also computed the estimated sample sizes required for 180 neural features (symmetrized neural cortical parcels) and 5 symptom PC features, consistent with the CCA reported in our main text:

      Assuming that rtrue is likely below 0.3, this minimal required sample size remains at least an order of magnitude greater than the size of our present sample, consistent with the finding that the CCA solution computed using these data was unstable. As a lower limit for the required sample size plausible using the feature sets reported in our paper, we additionally computed for comparison the estimated N needed with the smallest number of features explored in our analyses, i.e. 12 neural functional network features and 5 symptom PC features:

      These required sample sizes are closer to the N=436 used in the present sample and samples reported in the clinical neuroimaging literature. This is consistent with the observation that when using 12 neural and 5 symptom features (Fig. S15C) the detected canonical correlation r = 0.38 for CV1 is much lower (and likely not inflated due to overfitting) and may be closer to the true effect because with the n=436 this effect is resolvable. This is in contrast to the 180 neural features and 5 symptom feature CCA solution where we observed a null CCA effect around r > 0.6 across all 5 CVs. This clearly highlights the inflation of the effect in the situation where the feature space grows. There is no a priori plausible reason to believe that the effect for 180 vs. 5 feature mapping is literally double the effect when using 12 vs. 5 feature mapping - especially as the 12 features are networks derived from the 180 parcels (i.e. the effect should be comparable rather than 2x smaller). Consequently, if the true CCA effect with 180 vs. 5 features was actually in the more comparable r = 0.38, we would need >5,000 subjects to resolve a reproducible neuro-behavioral CCA map (an order of magnitude more than in the BSNIP sample). Moreover, to confidently detect effects if rtrue is actually less than 0.3, we would require a sample size >8,145 subjects. We have added this to the Results section on our CCA results:

      Next, we tested if the 180-parcel CCA solution is stable and reproducible, as done with PC-to-GBC univariate results. The CCA solution was robust when tested with k-fold and leave-site-out cross-validation (Fig. S16) likely because these methods use CCA loadings derived from the full sample. However, the CCA loadings did not replicate in non-overlapping split-half samples (Fig. 5L, see Supplementary Note 4). Moreover, a leave-one-subject-out cross-validation revealed that removing a single subject from the sample affected the CCA solution such that it did not generalize to the left-out subject (Fig. 5M). This is in contrast to the PCA-to-GBC univariate mapping, which was substantially more reproducible for all attempted cross-validations relative to the CCA approach. This is likely because substantially more power is needed to resolve a stable multivariate neuro-behavioral effect with this many features. Indeed, a multivariate power analysis using 180 neural features and 5 symptom features, and assuming a true canonical correlation of r = 0.3, suggests that a minimal sample size of N = 8145 is needed to sufficiently detect the effect (3), an order of magnitude greater than the available sample size. Therefore, we leverage the univariate neuro-behavioral result for subsequent subject-specific model optimization and comparisons to molecular neuroimaging maps.
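
      The leave-one-subject-out logic can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the analysis code from the manuscript: CCA is fit here with a plain QR + SVD approach, the function names `fit_cca` and `leave_one_out_cv1` are hypothetical, and the example data are pure noise, so any in-sample canonical correlation reflects overfitting alone.

```python
import numpy as np

def fit_cca(X, Y):
    """Fit CCA via QR + SVD; return means, weight matrices and canonical correlations."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(X - mx)
    Qy, Ry = np.linalg.qr(Y - my)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U)      # weights applied to X features
    B = np.linalg.solve(Ry, Vt.T)   # weights applied to Y features
    return mx, my, A, B, s

def leave_one_out_cv1(X, Y):
    """Correlation between held-out X and Y scores on the first canonical variate."""
    n = X.shape[0]
    u, v = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        mx, my, A, B, _ = fit_cca(X[keep], Y[keep])
        # Project the held-out subject with weights learned without that subject.
        u[i] = (X[i] - mx) @ A[:, 0]
        v[i] = (Y[i] - my) @ B[:, 0]
    return np.corrcoef(u, v)[0, 1]

# Hypothetical noise-only data: 60 subjects, 20 "neural" and 3 "symptom" features.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 20))
Y = rng.standard_normal((60, 3))
in_sample_r = fit_cca(X, Y)[4][0]
held_out_r = leave_one_out_cv1(X, Y)
```

      Comparing `in_sample_r` with `held_out_r` shows the gap between a canonical correlation estimated on the fitting sample and one evaluated on subjects excluded from the fit, which is the distinction drawn above between cross-validation schemes that reuse full-sample loadings and those that do not.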

      Additionally, we added the following to Supplementary Note 4: Establishing the Reproducibility of the CCA Solution:

      Here we outline the details of the split-half replication for the CCA solution. Specifically, the full patient sample was randomly split (referred to as “H1” and “H2” respectively), while preserving the proportion of patients in each diagnostic group. Then, CCA was performed independently for H1 and H2. While the loadings for behavioral PCs and original behavioral items are somewhat similar (mean r 0.5) between the two CCAs in each run, the neural loadings were not stable across H1 and H2 CCA solutions. Critically, CCA results did not perform well for leave-one-subject-out cross-validation (Fig. 5M). Here, one patient was held out while CCA was performed using all data from the remaining 435 patients. The loadings matrices Ψ and Θ from the CCA were then used to calculate the “predicted” neural and behavioral latent scores for all 5 CVs for the patient that was held out of the CCA solution. This process was repeated for every patient and the final result was evaluated for reproducibility. As described in the main text, this did not yield reproducible CCA effects (Fig. 5M). Of note, CCA may yield higher reproducibility if the neural feature space were to be further reduced. As noted, our approach was to first parcellate the BOLD signal and then use GBC as a data-driven method to yield a neuro-biologically and quantitatively interpretable neural data reduction, and we additionally symmetrized the result across hemispheres. Nevertheless, in sharp contrast to the PCA univariate feature selection approach, the CCA solutions were still not stable in the present sample size of N = 436. Indeed, a multivariate power analysis (3) estimates that the following sample sizes will be required to sufficiently power a CCA between 180 neural features and 5 symptom features, at different levels of true canonical correlation (rtrue):

      To test if further neural feature space reduction may improve reproducibility, we also evaluated CCA solutions with neural GBC parcellated according to 12 brain-wide functional networks derived from the recent HCP driven network parcellation (30). Again, we computed the CCA for all 36 item-level symptoms as well as 5 PCs (Fig. S15). As with the parcel-level effects, the network-level CCA analysis produced significant results (for CV1 when using 36 item-level scores and for all 5 CVs when using the 5 PC-derived scores). Here the result produced much lower canonical correlations (0.3-0.5); however, these effects (for CV1) clearly exceeded the 95% confidence interval generated via random permutations, suggesting that they may reflect the true canonical correlation. We observed a similar result when we evaluated CCAs computed with neural GBC from 192 symmetrized subcortical parcels and 36 symptoms or 5 PCs (Fig. S14). In other words, data-reducing the neural signal to 12 functional networks likely averaged out parcel-level information that may carry symptom-relevant variance, but may be closer to capturing the true effect. Indeed, the power analysis suggests that the current sample size is closer to that needed to detect an effect with 12 + 5 features:

      Note that we do not present a CCA conducted with parcels across the whole brain, as the number of variables would exceed the number of observations. However, the multivariate power analysis using 718 neural features and 5 symptom features estimates that the following sample sizes would be required to detect the following effects:

      This analysis suggests that even the lowest bound of 10k samples exceeds the present available sample size by two orders of magnitude.

      We have also added Fig. S19, illustrating these power analyses results:

      Fig. S19. Multivariate power analysis for CCA. Sample sizes were calculated according to (3), see also https://gemmr.readthedocs.io/en/latest/. We computed the multivariate power analyses for three versions of CCA reported in this manuscript: i) 718 neural vs. 5 symptom features; ii) 180 neural vs. 5 symptom features; iii) 12 neural vs. 5 symptom features. (A) At different levels of features, the ratio of samples (i.e. subjects) required per feature to derive a stable CCA solution remains approximately the same across all values of rtrue. As discussed in (3), at rtrue = 0.3 the number of samples required per feature is about 40, which is much greater than the ratio of samples to features available in our dataset. (B) The total number of samples required (nreq) for a stable CCA solution given the total number of neural and symptom features used in our analyses, at different values of rtrue. In general these required sample sizes are much greater than the N=436 (light grey line) PSD in our present dataset, consistent with the finding that the CCA solutions computed using our data were unstable. Notably, the ‘12 vs. 5’ CCA assuming rtrue = 0.3 requires only 700 subjects, which is closest to the N=436 (horizontal grey line) used in the present sample. This may be in line with the observation of the CCA with 12 neural vs 5 symptom features (Fig. S15C) that the canonical correlation (r = 0.38 for CV1) clearly exceeds the 95% confidence interval, and may be closer to the true effect. However, to confidently detect effects in such an analysis (particularly if rtrue is actually less than 0.3), a larger sample would likely still be needed.
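
      The samples-per-feature ratio in panel (A) can be checked with simple arithmetic from the figures quoted in this response. The sketch below is illustrative only: the required sample sizes are the values stated in the text for rtrue = 0.3 (with 33k treated as exactly 33,000, an approximation), and the helper name `samples_per_feature` is hypothetical.

```python
def samples_per_feature(n_samples, n_neural, n_symptom):
    """Number of subjects available (or required) per feature across both sets."""
    return n_samples / (n_neural + n_symptom)

# Required N at rtrue = 0.3 as quoted in the text; "33k" is taken as 33,000.
REQUIRED_AT_R03 = {(718, 5): 33_000, (180, 5): 8_145, (12, 5): 700}
AVAILABLE_N = 436  # PSD sample size reported in the text

for (n_neural, n_symptom), n_required in REQUIRED_AT_R03.items():
    needed = samples_per_feature(n_required, n_neural, n_symptom)
    have = samples_per_feature(AVAILABLE_N, n_neural, n_symptom)
    print(f"{n_neural} vs. {n_symptom}: need ~{needed:.0f} per feature, have {have:.1f}")
```

      All three scenarios give a required ratio in the low-to-mid 40s, in line with the "about 40" samples per feature cited from (3), while the available ratio falls well short of that for the 718- and 180-feature cases.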

      We also added the corresponding methods in the Methods section:

      Multivariate CCA Power Analysis. Multivariate power analyses to estimate the minimum sample size needed to sufficiently power a CCA were computed using methods described in (3), using the Generative Modeling of Multivariate Relationships tool (gemmr, https://github.com/murraylab/gemmr (v0.1.2)). Briefly, a model was built by: 1) Generating synthetic datasets for the two input data matrices, by sampling from a multivariate normal distribution with a joint covariance matrix that was structured to encode CCA solutions with specified properties; 2) Performing CCAs on these synthetic datasets. Because the joint covariance matrix is known, the true values of estimated association strength, weights, scores, and loadings of the CCA, as well as the errors for these four metrics, can also be computed. In addition, statistical power that the estimated association strength is different from 0 is determined through permutation testing; 3) Varying parameters of the generative model (number of features, assumed true between-set correlation, within-set variance structure for both datasets) and determining, in each case, the required sample size Nreq such that statistical power reaches 90% and all of the above-described error metrics fall to a target level of 10%; and 4) Fitting and validating a linear model to predict the required sample size Nreq from parameters of the generative model. This linear model was then used to calculate Nreq for CCA in three data scenarios: i) 718 neural vs. 5 symptom features; ii) 180 neural vs. 5 symptom features; iii) 12 neural vs. 5 symptom features.
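
      Steps 1 and 2 of this procedure can be illustrated with a heavily simplified sketch. It does not use or reproduce the gemmr API; it is a NumPy-only toy under explicit assumptions: unit-variance, mutually independent features within each set, and a single pair of features carrying the true between-set correlation. The function names `simulate`, `first_canonical_corr`, and `estimated_r` are hypothetical.

```python
import numpy as np

def simulate(n_samples, n_x, n_y, r_true, rng):
    """Draw two data sets whose only shared signal is one feature pair with correlation r_true."""
    X = rng.standard_normal((n_samples, n_x))
    Y = rng.standard_normal((n_samples, n_y))
    # Mix the first X feature into the first Y feature, keeping unit variance.
    Y[:, 0] = r_true * X[:, 0] + np.sqrt(1.0 - r_true**2) * Y[:, 0]
    return X, Y

def first_canonical_corr(X, Y):
    """Largest sample canonical correlation, via QR of the centred data matrices."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

def estimated_r(n_samples, n_x, n_y, r_true, seed=0):
    """Simulate one data set and return the estimated first canonical correlation."""
    rng = np.random.default_rng(seed)
    X, Y = simulate(n_samples, n_x, n_y, r_true, rng)
    return first_canonical_corr(X, Y)

# With the known true value fixed at 0.3, compare a small and a large sample.
small_sample = estimated_r(436, 180, 5, r_true=0.3)
large_sample = estimated_r(20_000, 50, 5, r_true=0.3)
```

      Because the generating correlation is known, the estimation error at any sample size can be read off directly, which is the property the generative approach relies on when searching for the sample size at which errors fall below a target level.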

      • Given the relatively even distribution of males and females in the dataset, some examination of sex effects on symptom dimension loadings or neuro-behavioural maps would have been interesting (other demographic characteristics like age and SES are summarized for subjects but also not investigated). I think this is a missed opportunity.

      We have now provided additional analyses for the core PCA and univariate GBC mapping results, testing for effects of age, sex, and SES in Fig. S8. Briefly, we observed a significant positive relationship between age and PC3 scores, which may be because older patients (who presumably have been ill for a longer time) exhibit more severe symptoms along the positive PC3 – Psychosis Configuration dimension. We also observed a significant negative relationship between Hollingshead index of SES and PC1 and PC2 scores. Lower PC1 and PC2 scores indicate poorer general functioning and cognitive performance respectively, which is consistent with higher Hollingshead indices (i.e. lower-skilled jobs or unemployment and fewer years of education). We also found significant sex differences in PC2 – Cognitive Functioning, PC4 – Affective Valence, and PC5 – Agitation/Excitement scores.

      Fig. S8. Effects of age, socio-economic status, and sex on symptom PCA solution. (A) Correlations between symptom PC scores and age (years) across N=436 PSD. Pearson’s correlation value and uncorrected p-values are reported above scatterplots. After Bonferroni correction, we observed a significant positive relationship between age and PC3 score. This may be because older patients have been ill for a longer period of time and exhibit more severe symptoms along the positive PC3 dimension. (B) Correlations between symptom PC scores and socio-economic status (SES) as measured by the Hollingshead Index of Social Position (31), across N=387 PSD with available data. The index is computed as (Hollingshead occupation score × 7) + (Hollingshead education score × 4); a higher score indicates lower SES (32). We observed a significant negative relationship between Hollingshead index and PC1 and PC2 scores. Lower PC1 and PC2 scores indicate poorer general functioning and cognitive performance respectively, which is consistent with higher Hollingshead indices (i.e. lower-skilled jobs or unemployment and fewer years of education). (C) The Hollingshead index can be split into five classes, with 1 being the highest and 5 being the lowest SES class (31). Consistent with (B) we found a significant difference between the classes after Bonferroni correction for PC1 and PC2 scores. (D) Distributions of PC scores across Hollingshead SES classes show the overlap in scores. White lines indicate the mean score in each class. (E) Differences in PC scores between (M)ale and (F)emale PSD subjects. We found a significant difference between sexes in PC2 – Cognitive Functioning, PC4 – Affective Valence, and PC5 – Agitation/Excitement scores. (F) Distributions of PC scores across M and F subjects show the overlap in scores. White lines indicate the mean score for each sex.
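
      The index in panel (B) is a weighted sum, which the short sketch below writes out. It assumes the standard Hollingshead two-factor weighting, i.e. that the occupation score is multiplied by 7 and the education score by 4; the function name `hollingshead_index` and the example scores are hypothetical and are not taken from the dataset.

```python
def hollingshead_index(occupation_score, education_score):
    """Weighted sum of the two scale scores; a higher value indicates lower SES.

    Assumption: the weights 7 and 4 are multipliers on the occupation and
    education scores respectively.
    """
    return occupation_score * 7 + education_score * 4

# Hypothetical individuals: (occupation score, education score).
higher_ses = hollingshead_index(1, 1)
lower_ses = hollingshead_index(7, 7)
```

      Because occupation carries the larger weight, a one-point change in the occupation score moves the index further than a one-point change in the education score.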

      Bibliography

      1. Jie Lisa Ji, Caroline Diehl, Charles Schleifer, Carol A Tamminga, Matcheri S Keshavan, John A Sweeney, Brett A Clementz, S Kristian Hill, Godfrey Pearlson, Genevieve Yang, et al. Schizophrenia exhibits bi-directional brain-wide alterations in cortico-striato-cerebellar circuits. Cerebral Cortex, 29(11):4463–4487, 2019.
      2. Alan Anticevic, Michael W Cole, Grega Repovs, John D Murray, Margaret S Brumbaugh, Anderson M Winkler, Aleksandar Savic, John H Krystal, Godfrey D Pearlson, and David C Glahn. Characterizing thalamo-cortical disturbances in schizophrenia and bipolar illness. Cerebral cortex, 24(12):3116–3130, 2013.
      3. Markus Helmer, Shaun D Warrington, Ali-Reza Mohammadi-Nejad, Jie Lisa Ji, Amber Howell, Benjamin Rosand, Alan Anticevic, Stamatios N Sotiropoulos, and John D Murray. On stability of canonical correlation analysis and partial least squares with application to brain-behavior associations. bioRxiv, 2020.
      4. Richard Dinga, Lianne Schmaal, Brenda WJH Penninx, Marie Jose van Tol, Dick J Veltman, Laura van Velzen, Maarten Mennes, Nic JA van der Wee, and Andre F Marquand. Evaluating the evidence for biotypes of depression: Methodological replication and extension of. NeuroImage: Clinical, 22:101796, 2019.
      5. Cedric Huchuan Xia, Zongming Ma, Rastko Ciric, Shi Gu, Richard F Betzel, Antonia N Kaczkurkin, Monica E Calkins, Philip A Cook, Angel Garcia de la Garza, Simon N Vandekar, et al. Linked dimensions of psychopathology and connectivity in functional brain networks. Nature communications, 9(1):3003, 2018.
      6. Andrew T Drysdale, Logan Grosenick, Jonathan Downar, Katharine Dunlop, Farrokh Mansouri, Yue Meng, Robert N Fetcho, Benjamin Zebley, Desmond J Oathes, Amit Etkin, et al. Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nature medicine, 23(1):28, 2017.
      7. Meichen Yu, Kristin A Linn, Russell T Shinohara, Desmond J Oathes, Philip A Cook, Romain Duprat, Tyler M Moore, Maria A Oquendo, Mary L Phillips, Melvin McInnis, et al. Childhood trauma history is linked to abnormal brain connectivity in major depression. Proceedings of the National Academy of Sciences, 116(17):8582–8590, 2019.
      8. David R Hardoon, Sandor Szedmak, and John Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural computation, 16(12):2639–2664, 2004.
      9. Katrin H Preller, Joshua B Burt, Jie Lisa Ji, Charles H Schleifer, Brendan D Adkinson, Philipp Stämpfli, Erich Seifritz, Grega Repovs, John H Krystal, John D Murray, et al. Changes in global and thalamic brain connectivity in LSD-induced altered states of consciousness are attributable to the 5-HT2A receptor. eLife, 7:e35082, 2018.
      10. Mark A Geyer and Franz X Vollenweider. Serotonin research: contributions to understanding psychoses. Trends in pharmacological sciences, 29(9):445–453, 2008.
      11. H Y Meltzer, B W Massey, and M Horiguchi. Serotonin receptors as targets for drugs useful to treat psychosis and cognitive impairment in schizophrenia. Current pharmaceutical biotechnology, 13(8):1572–1586, 2012.
      12. Anissa Abi-Dargham, Marc Laruelle, George K Aghajanian, Dennis Charney, and John Krystal. The role of serotonin in the pathophysiology and treatment of schizophrenia. The Journal of neuropsychiatry and clinical neurosciences, 9(1):1–17, 1997.
      13. Francine M Benes and Sabina Berretta. GABAergic interneurons: implications for understanding schizophrenia and bipolar disorder. Neuropsychopharmacology, 25(1):1–27, 2001.
      14. Melis Inan, Timothy J Petros, and Stewart A Anderson. Losing your inhibition: Linking cortical GABAergic interneurons to schizophrenia. Neurobiology of Disease, 53:36–48, 2013.
      15. Samuel J Dienel and David A Lewis. Alterations in cortical interneurons and cognitive function in schizophrenia. Neurobiology of disease, 131:104208, 2019.
      16. John E Lisman, Joseph T Coyle, Robert W Green, Daniel C Javitt, Francine M Benes, Stephan Heckers, and Anthony A Grace. Circuit-based framework for understanding neurotransmitter and risk gene interactions in schizophrenia. Trends in neurosciences, 31(5):234–242, 2008.
      17. Anthony A Grace. Dysregulation of the dopamine system in the pathophysiology of schizophrenia and depression. Nature Reviews Neuroscience, 17(8):524, 2016.
      18. John F Enwright III, Zhiguang Huo, Dominique Arion, John P Corradi, George Tseng, and David A Lewis. Transcriptome alterations of prefrontal cortical parvalbumin neurons in schizophrenia. Molecular psychiatry, 23(7):1606–1613, 2018.
      19. Daniel J Lodge, Margarita M Behrens, and Anthony A Grace. A loss of parvalbumin-containing interneurons is associated with diminished oscillatory activity in an animal model of schizophrenia. Journal of Neuroscience, 29(8):2344–2354, 2009.
      20. Clare L Beasley and Gavin P Reynolds. Parvalbumin-immunoreactive neurons are reduced in the prefrontal cortex of schizophrenics. Schizophrenia research, 24(3):349–355, 1997.
      21. David A Lewis, Allison A Curley, Jill R Glausier, and David W Volk. Cortical parvalbumin interneurons and cognitive dysfunction in schizophrenia. Trends in neurosciences, 35(1):57–67, 2012.
      22. Alan Anticevic, Margaret S Brumbaugh, Anderson M Winkler, Lauren E Lombardo, Jennifer Barrett, Phillip R Corlett, Hedy Kober, June Gruber, Grega Repovs, Michael W Cole, et al. Global prefrontal and fronto-amygdala dysconnectivity in bipolar i disorder with psychosis history. Biological psychiatry, 73(6):565–573, 2013.
      23. Alex Fornito, Jong Yoon, Andrew Zalesky, Edward T Bullmore, and Cameron S Carter. General and specific functional connectivity disturbances in first-episode schizophrenia during cognitive control performance. Biological psychiatry, 70(1):64–72, 2011.
      24. Avital Hahamy, Vince Calhoun, Godfrey Pearlson, Michal Harel, Nachum Stern, Fanny Attar, Rafael Malach, and Roy Salomon. Save the global: global signal connectivity as a tool for studying clinical populations with functional magnetic resonance imaging. Brain connectivity, 4(6):395–403, 2014.
      25. Michael W Cole, Alan Anticevic, Grega Repovs, and Deanna Barch. Variable global dysconnectivity and individual differences in schizophrenia. Biological psychiatry, 70(1):43–50, 2011.
      26. Naomi R Driesen, Gregory McCarthy, Zubin Bhagwagar, Michael Bloch, Vincent Calhoun, Deepak C D’Souza, Ralitza Gueorguieva, George He, Ramani Ramachandran, Raymond F Suckow, et al. Relationship of resting brain hyperconnectivity and schizophrenia-like symptoms produced by the nmda receptor antagonist ketamine in humans. Molecular psychiatry, 18(11):1199–1204, 2013.
      27. Neil D Woodward, Baxter Rogers, and Stephan Heckers. Functional resting-state networks are differentially affected in schizophrenia. Schizophrenia research, 130(1-3):86–93, 2011.
      28. Zarrar Shehzad, Clare Kelly, Philip T Reiss, R Cameron Craddock, John W Emerson, Katie McMahon, David A Copland, F Xavier Castellanos, and Michael P Milham. A multivariate distance-based analytic framework for connectome-wide association studies. Neuroimage, 93 Pt 1:74–94, Jun 2014.
      29. Alan J Gelenberg. The catatonic syndrome. The Lancet, 307(7973):1339–1341, 1976.
      30. Jie Lisa Ji, Marjolein Spronk, Kaustubh Kulkarni, Grega Repovš, Alan Anticevic, and Michael W Cole. Mapping the human brain’s cortical-subcortical functional network organization. NeuroImage, 185:35–57, 2019.
      31. August B Hollingshead et al. Four factor index of social status. 1975.
      32. Jaya L Padmanabhan, Neeraj Tandon, Chiara S Haller, Ian T Mathew, Shaun M Eack, Brett A Clementz, Godfrey D Pearlson, John A Sweeney, Carol A Tamminga, and Matcheri S Keshavan. Correlations between brain structure and symptom dimensions of psychosis in schizophrenia, schizoaffective, and psychotic bipolar i disorders. Schizophrenia bulletin, 41(1):154–162, 2015.
    1. In my house we spoke Spanish all the time because of my mom. To this day, she doesn't want to learn English even though we tell her to learn English. My little sister, she doesn't speak Spanish, she speaks more English and with her it's different. We tell her, "You have to learn Spanish because it's going to help you," but she doesn't want to learn.

      Anne: Is she a citizen?

      Juan: Yes, she was born in the US. So my parents didn't really adapt to the American culture. They always wanted to follow Mexican traditions, even when it's Mother's Day over there … I think here it's May 10th but over there, when is Mother's Day?

      Anne: I think it's the second Sunday of May, so it could be different days.

      Juan: We could take that as an example. They'd rather follow Mother's Day here in Mexico than over there. Also Christmas, I guess the one thing they did adapt to was Thanksgiving. We don't celebrate that here in Mexico, but they do celebrate there, and they did adapt that. Another thing, Easter day. You go out with your family, you hide the eggs as a tradition, no? They adapted to that, but here in Mexico they don't do that. They don't even know about that. In a way they wanted to keep their Mexican culture alive even though they were in the US, but they also wanted to adapt to the things that they did there.

      Family, mom, parents, translating for, learning English, Homelife, Mexican traditions, holidays, Spanish language;

    1. intrinsically the mind was virtually omniscient, and that it merely was not in fact omniscient here and now because, for the benefit of the animal who has to survive on the surface of this planet, we cannot be omniscient, because we should be so full of irrelevant information that we should simply not be able to get out of the way of the cars in the street. And consequently the nervous system, the central nervous system, the brain, exists in order to limit this virtually infinite quantity of consciousness which we virtually have, to limit it and to funnel it through for the purposes of biological survival on the surface of this particular planet. Well, my own feeling is, I would think this idea of a completely omniscient mind seems to me a little fantastic, but I would think that there is something to be said for a view which would say that this psychic medium, whatever it may be, is, let us say, virtually omniscient, that is, it could take on into itself every kind of specialized information, but what it is in itself is a kind of undifferentiated consciousness. And as I shall try to point out later on in this lecture, there is a lot of evidence on the part of the Mystics, both east and west, to the effect that our particular specialized individualized consciousness is underlain by an undifferentiated consciousness, and this again differentiated consciousness possesses

      brain is there to limit

    1. Author Response:

      Reviewer #1:

      This study reports the novel and interesting finding that AKAP220 knockout leads to a dramatic increase in primary cilia in renal collecting ducts. AKAP220 is known to sequester PKA, GSK3, the Rho GTPase effector IQGAP-1 and PP1. Previous work from this group demonstrated that AKAP220-/- mice exhibit reduced accumulation of apical actin in the kidney attributable to less GTP-loading of RhoA. Relatedly, AKAP220-/- mice display mild defects in aquaporin 2 trafficking. In this work, Gopalan et al examine the effects of AKAP220 mutation on cilia. They demonstrate increased numbers of primary cilia decorating AKAP220-/- collecting ducts. This phenotype is striking as little is known about negative regulators of cilium biogenesis.

      The authors also provide evidence that interaction of AKAP220 with protein phosphatase 1 (PP1) is critical for its function. Through PP1, AKAP220 may regulate HDAC6, which may in turn inhibit tubulin acetylation, which may in turn control cilia stability. Aberrant cilia function is implicated in autosomal dominant polycystic kidney disease. The authors also speculate that AKAP220 and tubulin acetylation may have clinical relevance for autosomal dominant polycystic disease. However, it remains unclear how increased cilia biogenesis may affect cell or tissue physiology. This work is of interest to cell biologists seeking to understand the biogenesis of the primary cilium, and to others interested in ciliopathies (i.e., disorders of the primary cilium).

      We thank Reviewer 1 for their insightful comments and concur with their assessment that “it remains unclear how increased cilia biogenesis may affect cell or tissue physiology”. This is clearly a topic for further study within the field that will include ourselves and other laboratories.

      Reviewer #2:

      The authors show that AKAP220 knockout in kidney collecting ducts leads to a pronounced increase in primary cilia. They go on to demonstrate that this effect holds true in multiple different preparations, before clearly demonstrating that the PP1 anchoring site is critical for the normal role of AKAP220 in limiting primary cilia formation.

      Although the key overall finding is well supported, I did not find the specific mechanism concerning an AKAP220-PP1-HDAC6 signaling complex/axis sufficiently convincing. The authors propose that AKAP220 interacts with HDAC6 via PP1, and that within the complex HDAC6 is stabilised through phosphorylation. The knock-on effect is efficient deacetylation. Although this complicated mechanism is consistent with the data, three supporting observations towards this specific mechanism come with caveats: (i) in figure 2C, they show an increase in acetyl tubulin by immunoblotting, but the densitometry seems to be the ratio of acetyl tubulin to GAPDH - would it not be more appropriate to reference to total tubulin?

      We are encouraged that this reviewer considers that our “overall findings are well supported”. In response to their comments, we have bolstered our evidence that AKAP220 interacts with HDAC6 via PP1 by including new biochemical and imaging data showing that recruitment of the histone deacetylase is attenuated in kidney cells engineered to express a PP1-binding defective mutant of the anchoring protein. These new data are incorporated into figure 3D and supplemental figures S3D-L.

      The mechanism investigated in this paper is concerned with absolute levels of acetylated tubulin. Since the levels of both control proteins (alpha tubulin and GAPDH) do not change between wildtype and AKAP220KO, we chose to normalize to GAPDH. It is important to note that normalizing to total tubulin does not change the result.
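
      The normalization argument can be illustrated with a minimal sketch. The band intensities below are made-up numbers and `fold_change` is a hypothetical helper, not data or code from the study. The sketch only shows that, when neither loading control differs between genotypes, the knockout-to-wildtype fold change in acetylated tubulin is the same whichever control is used as the denominator.

```python
def fold_change(target_wt, target_ko, control_wt, control_ko):
    """KO/WT ratio of a target band after normalizing each lane to a loading control."""
    return (target_ko / control_ko) / (target_wt / control_wt)

# Hypothetical densitometry values (arbitrary units), for illustration only.
acetyl_tubulin = {"wt": 1.0, "ko": 2.4}
gapdh = {"wt": 5.0, "ko": 5.0}          # assumed unchanged between genotypes
total_tubulin = {"wt": 8.0, "ko": 8.0}  # assumed unchanged between genotypes

fc_gapdh = fold_change(acetyl_tubulin["wt"], acetyl_tubulin["ko"],
                       gapdh["wt"], gapdh["ko"])
fc_tubulin = fold_change(acetyl_tubulin["wt"], acetyl_tubulin["ko"],
                         total_tubulin["wt"], total_tubulin["ko"])
```

      If either control did change between genotypes, the two fold changes would diverge, which is the situation the reviewer's question guards against.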

      Reviewer #3:

      The authors had previously generated a mouse line with inactivation of AKAP220, which encodes an A-kinase anchoring protein, and observed defects in their collecting ducts (CD) leading to defects in trafficking of aquaporin 2. While further characterizing the samples, they observed that CD epithelia had increased numbers and length of their primary cilia compared to CD cells of control mice. While some AKAP proteins have been localized to the primary cilium, AKAP220 was not one of them so the authors pursued a systematic series of experiments to determine how AKAP220 has these effects. Using a combination of CRISPR-manipulated renal epithelial cell lines (IMCD cells), drugs/compounds, 3D and organ-on-a-chip cell culture systems, they present compelling data that show that AKAP220 anchors a complex of HDAC6 and Protein Phosphatase-1 (PP1) that controls the polymerization of actin and thereby affects cilia formation and elongation. Genetic or pharmacologic manipulations that disrupt AKAP220 or its ability to bind to PP1, inhibit HDAC6, or affect actin stability result in a similar phenotype of enhanced ciliogenesis and ciliary length. Given that polycystic kidney disease has been described as a ciliopathy, with the gene products of the two most common forms of the disease (polycystin-1 and polycystin-2) localized to the cilia, they tested whether inhibiting HDAC6 activity might affect cyst growth using a human iPSC organoid system. They found that organoids lacking polycystin-2 treated with tubacin had smaller cyst size compared to vehicle-treated mutants, leading them to propose manipulation of HDAC6 as a tentative therapeutic strategy for human autosomal dominant polycystic kidney disease and for ciliopathies characterized by defects in ciliogenesis.

      Strengths: These findings will be of interest to the ciliary community. They have identified a new factor and its associated partners that appear to regulate ciliogenesis. The studies follow a logical progression and are generally well-done with suitable controls, rigorous quantitation, and a reasonable level of replication (all done at least three times). They have used complementary methods (i.e., genetic manipulation, pharmacologic inhibition) to support their model, sometimes in combination to show that the underlying factor targeted by either genetics or drugs works through the same mechanism.

      Weaknesses: The major weakness of the report is in its attempt to be translational. Here, the report has a number of serious theoretical and experimental limitations. On the theoretical level, the rationale behind using an HDAC6 inhibitor is unclear given their data and their model. On the one hand, a prior study had reported that a non-specific inhibitor of HDACs slowed cyst growth in an orthologous mouse model of ADPKD. The current work could suggest that HDAC6 was the actual target in the prior work and that a specific inhibitor for HDAC6 should confer the same benefits. On the other hand, there are compelling reports that show that genetic inhibition of ciliogenesis actually attenuates cystic disease in orthologous mouse models of human ADPKD. The current paradigm is that preserved ciliary activity in the absence of Polycystin-1 or Polycystin-2 promotes cystic growth. This would suggest that any intervention that boosts ciliary function could actually worsen disease. And while the authors never directly comment on the functional properties of the "mutant" cilia that result from deletion of AKAP220 or inhibition of HDAC6, they imply that these "enhanced" cilia are functional by suggesting the use of HDAC6 inhibitors as therapy for ciliopathies that are the result of defective biogenesis. Their prior work also provides indirect support for the notion that the enhanced cilia are functional. AKAP220 knock-out mice are reported to be generally functional, apparently lacking phenotypes commonly associated with defective cilia structure or function. These contradictory observations suggest one or more of the following conclusions: the "mutant" cilia are in fact poorly functional, the HDAC inhibitors are working through a different mechanism than that which has been proposed, or that the assay as used in this report is not a good read-out of cyst-modulating effects. The last point is particularly relevant for this report. The investigators scored effectiveness of tubacin based on the relative rate of growth of cysts treated with different concentrations of tubacin vs vehicle. In this assay, cyst growth is principally driven by rates of cellular proliferation. Tubacin is an anti-proliferative agent with some toxicity, and while it might be highly selective for HDAC6, these studies cannot distinguish between effects mediated through the AKAP220-HDAC6 pathway versus others. In sum, while tubacin or a similarly-acting drug may or may not be effective for slowing cyst growth, there are multiple reasons to think it isn't through the mechanism the authors propose.

      We are encouraged that reviewer 3 considers “our studies follow a logical progression and are generally well-done with suitable controls, rigorous quantitation, and a reasonable level of replication”. In terms of weaknesses, our reading of the reviewer’s detailed passage has identified two specific points that we can address.

      1) Lesions in cilia and polycystins are linked to Autosomal Dominant Polycystic Kidney Disease (Hughes et al., 1995; Mochizuki et al., 1996). Although there is general agreement on this point, the molecular details remain unclear and are inherently paradoxical. For example, loss of morphologically intact cilia favors a less severe cystic phenotype (Ma et al., 2013). In contrast, other investigators report that loss of intact primary cilia results in renal cystogenesis (Kolb and Nauli, 2008; Lin et al., 2003). How primary cilia can be pro-cystogenic in one context yet anti-cystogenic in another context remains an unsolved paradox for the field. We appreciate the need for further clarification on this point as raised by reviewer 3. This conundrum is now noted in the discussion on page 34, line 3.

      2) Searching for a therapeutic approach to restore functional primary cilia is the rationale behind our concluding studies. However, the complexity of genetic models for ADPKD and the above-mentioned “cilia paradox” limit our ability to accurately predict how pharmacological agents targeting cilia might affect cellular models of cystogenesis. That being said, we realize that HDAC6 inhibitors have been used by other groups to target cyst size (Cebotaru et al., 2016; Yanda et al., 2017). The reviewer is correct in pointing out that the mechanism by which HDAC6 inhibitors act to inhibit cystogenesis could be less than straightforward given the multitude of functions for HDAC6. We have amended the discussion on page 34, line 5 to reflect the reviewer’s valid point.

    1. Author Response:

      Reviewer #1:

      In this paper, the authors study one of the understudied aspects of the evolutionary transition to multicellularity: the evolution of irreversible somatic differentiation of germ cells. Division of labour via functional specialisation of cells to perform different tasks is pervasive across the tree of life. Various studies assume that the differentiation of reproductive cells ("germ-role cells" in this manuscript) into a non-reproducing cell type ("soma-role cells") is irreversible. In reality, the conditions that promote the evolution of this irreversible transition are unclear. Here, the authors set out to fill in this knowledge gap. They model a population of organisms that grow from a single germ-role cell and find the optimal developmental strategy in terms of differentiation probabilities, under different scenarios. Under their model assumptions, they show that irreversible somatic differentiation can evolve when 1) cell differentiation is costly, 2) somatic cells' contribution to growth rate is large, 3) organismal body size is large.

      Overall, I think the authors identified an interesting and neglected aspect of cellular differentiation and division of labour. I enjoyed reading the paper; I thought the writing was clear and the modelling approach was adequate to address the authors' question.

      Thank you for a detailed and constructive review.

      Some aspects that can be improved:

      1) Throughout the manuscript, I was somewhat confused about what system the authors have in mind: a colony with division of labour or a multicellular organism? While their model can potentially capture both, their Introduction and Discussion seem to be geared towards colonies at the transition to multicellularity, whereas the Results section gives the impression that the authors have multicellular organisms in mind (e.g. very large body sizes).

      We are interested in the transition from colonial life, where tasks are distributed in time, to multicellular organisms, where tasks are divided between cells. As such, our model covers these scenarios as two limiting cases. In the context of our study, we discuss examples from nature where this transition is observed, e.g. among Volvocales algae. For the purpose of the necessary colony/organism size, we do not need to go further than 2^6 = 64 cells. However, to infer the patterns of the composition effect Fcomp (Fig.3 C,D), we consider organisms undergoing four more rounds of cell division before reproduction, leading to a maturity size of 2^10 = 1024 cells. There, irreversible somatic differentiation can occur at a wide range of differentiation costs (see Fig.4 A). Also, smaller sizes put stronger restrictions on the composition effect Fcomp, so the distribution of parameters presented in Fig.3 C,D, taken at n=6 instead of n=10, would have far fewer data points, and this could obscure the pattern found in this study. Overall, the scale of about 1000 cells, for which we report most of our modeling results, features entities of very diverse complexity: from undifferentiated colonies (the ocean alga Phaeocystis antarctica), to intermediate life forms (slime mold slugs), to paradigmatic multicellular organisms (higher Volvocales and C. elegans). We think that the chosen range of organism size is adequate for comparing entities with undifferentiated and differentiated cells. In the updated manuscript, we extend the exposition of organism size to reflect this aspect.

      2) From the point of view of someone who works on topics related to cancer and senescence, I think these fields are very much connected to the evolution of multicellularity. Maybe because I had multicellular organisms in mind rather than colonies with division of labour (above), I thought the manuscript missed this connection. Damage accumulation is key to Weismann and Kirkwood's theories of germ-soma divide and disposable soma, respectively, whereas dysregulated differentiation is one of the important aspects of tumour development (e.g. Aktipis et al. 2015). Making these links could also be relevant to discuss some of the model assumptions. For instance, the authors assume that fast growth comes with no cost in terms of cell damage, which may not always be the case (e.g. Ricklefs 2006) and reversibility of somatic differentiation can come at a cost of increased risk of somatic "cheaters" or cancerous cell lines.

      Thank you for this suggestion. Indeed, the aspect of cancer risk was not considered in the initially submitted manuscript. In the updated manuscript, we introduce a model where differentiation is linked to the organism's risk of death instead of a delay in development. The results with this model exhibit a very similar pattern, see Fig.5. Hence, the term “cost of differentiation” can be interpreted more broadly than just the cell division delay suggested by our main model.

      3) The authors assume the differentiation strategy (D) does not change over the lifetime (which equates to ontogenesis in their model, i.e. they do not consider mature lifespan). I wonder if this is really the case, or whether organisms/cells can respond to the composition of cells they perceive. For instance, at least in some animal tissues, a small number of stem cells are kept to replenish differentiated tissue cells when needed. I understand that making D plastic can make the model really complicated, but maybe it is worth talking about what strategy would evolve if D was not stable through ontogenesis (and mature lifespan). My initial guess is that if differentiation probabilities can change through life and if one considers cellular damage accumulation, senescence and cancer (as above), the conditions that favour irreversible somatic differentiation would expand.

      Indeed, we assume the differentiation strategy to be constant in our model. We do not know whether this is true at the brink of multicellularity and, for sure, once evolution makes a species complex enough, this assumption will become inadequate. Yet, when we consider a dynamic differentiation strategy, we find a very efficient but unrealistic solution: at the very beginning of a life cycle a germ-role cell gives rise to two soma-role cells, then these soma-role cells produce only soma-role cells and finally, at the very last round of cell division, they give rise to as many germ cells as possible. This scenario is the most efficient in terms of the rate of organism development (100% of useful soma-role cells during growth), amount of offspring produced (every cell becomes a germ cell at the end of the day), and differentiation costs/risks (differentiation occurs only twice in a lifetime). Still, it is unrealistic. There must be some constraints on the flexibility of the dynamic differentiation strategy. We think that the exploration of the space of dynamic differentiation strategies and their constraints goes beyond the scope of the current study. Nevertheless, we are very interested in exploring this topic further in following projects.
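
      The fully flexible schedule described above can be written out as a small sketch. The function name `flexible_schedule` and the bookkeeping are illustrative assumptions, not the authors' code. It tracks cell counts through n rounds of synchronous division when the founding germ-role cell yields two soma-role cells, soma-role cells then produce only soma-role cells, and every daughter of the final round is germ-role.

```python
def flexible_schedule(n_rounds):
    """Return (germ, soma) counts after each round of synchronous division.

    Assumed schedule: round 1 turns the founding germ-role cell into two
    soma-role cells, intermediate rounds keep all cells soma-role, and the
    last round converts every daughter to germ-role.
    """
    if n_rounds < 2:
        raise ValueError("the schedule needs at least two rounds")
    counts = []
    for t in range(1, n_rounds + 1):
        cells = 2 ** t
        if t < n_rounds:
            counts.append((0, cells))   # growth phase: all cells are soma-role
        else:
            counts.append((cells, 0))   # last round: every daughter is germ-role
    return counts

schedule = flexible_schedule(10)
```

      For n = 10 the organism is entirely soma-role during growth and ends with 1024 germ-role cells, with differentiation events in only two rounds, which is why an unconstrained dynamic strategy dominates any constant one.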

      Reviewer #2:

      This works seeks to determine the conditions in which simple multicellular groups can evolve irreversibly somatic cells, that is: a replicating cell lineage that provides cooperative benefits as the group grows and cannot de-differentiate into reproductive germ cells.

      This question is addressed with a well-constructed model that is easy to understand and provides intuitive results. Groups are composed of germ and soma cells that replicate synchronously until the group has reached a maximal size. When each type of cell divides, they may have different probabilities of producing daughter cells of each type, and the analysis determines the optimal differentiation probabilities for each type of cell depending on a variety of factors. Critically, irreversible somatic differentiation arises when the optimal probability for soma cells is to produce only soma cells.

      The elegance of the model means that the predictions are easy to interpret. First, when there is a higher cost for soma cells to produce germ cells, then a dedicated lineage of somatic cells is more favourable. Second, when soma cells produce only soma cells and germ cells can produce both types, the proportion of soma cells in the group will increase with each division. Consequently, for irreversible somatic cells to be optimal, germ cells must produce a small number of soma cells and these few must provide large benefits. Third, larger group sizes are required for a small number of soma cells to arise and provide sufficient benefits to the group.
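
      The second prediction can be illustrated with a minimal sketch of expected cell counts. It assumes, as a simplification that is not taken from the manuscript, that each daughter cell independently switches type with a fixed probability; the names `expected_composition`, `p_germ_to_soma` and `p_soma_to_germ` are hypothetical.

```python
def expected_composition(n_rounds, p_germ_to_soma, p_soma_to_germ):
    """Expected (germ, soma) counts after each synchronous division round.

    Assumption: every daughter independently switches type with the given
    probability; the organism starts from a single germ-role cell.
    """
    germ, soma = 1.0, 0.0
    history = [(germ, soma)]
    for _ in range(n_rounds):
        new_germ = 2 * (germ * (1 - p_germ_to_soma) + soma * p_soma_to_germ)
        new_soma = 2 * (germ * p_germ_to_soma + soma * (1 - p_soma_to_germ))
        germ, soma = new_germ, new_soma
        history.append((germ, soma))
    return history

def soma_fractions(history):
    """Fraction of soma-role cells at each recorded time point."""
    return [s / (g + s) for g, s in history]

# Irreversible soma: soma-role cells never produce germ-role daughters.
history = expected_composition(10, 0.1, 0.0)
irreversible = soma_fractions(history)
```

      With `p_soma_to_germ = 0`, the soma fraction rises after every division, so germ-role cells must produce soma-role daughters only rarely if the organism is to retain enough germ-role cells at maturity.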

      Inevitably, there is a trade-off between the benefits of a simple model and the costs of idealised assumptions.

      Among other assumptions, the model assumes that germ cells and soma cells replicate synchronously and at the same rate, and that soma cells provide benefits throughout the growth of the group, but do not increase the fecundity of germ cells in the last generation. Consequently, it is not clear to what extent the predictions of the model apply to the notable empirical cases where these assumptions do not hold. For instance, in the often-cited Volvocine algae, soma cells do not provide any benefits until the last generation of the group life cycle. This may help to explain why many Volvocine species have a very large number of somatic cells, counter to the second prediction of the model.

      Overall, this analysis is targeted and provides clear predictions within the bounds of its assumptions. Thus, these results provide a compelling framework or stepping-stone against which future models of germ-soma differentiation in alternate scenarios can be compared and evaluated.

      Thank you for the kind words and the well-thought-out review. Indeed, our model makes a number of simplifying assumptions. In the revised manuscript, we consider a model in which the strongest of our simplifications, that of simultaneous cell divisions, is violated. This asynchronous cell division model shows that irreversible differentiation may evolve, at least under asymmetric differentiation costs. However, its evolution is observed less often than in the synchronous model.

      We absolutely agree that the design of our model does not replicate the details of Volvocine life cycles. However, our work is not intended to be a model of germ-soma differentiation in Volvocales. Instead, we developed a simplistic model implementing features from a diverse range of organisms. While in higher Volvocales young colonies develop within a maternal organism, there is a wide range of colonial organisms that grow from an independently living single cell, e.g. colonial diatoms, the haptophyte Phaeocystis antarctica, and the amoebozoan Phalansterium. We agree that protection by the maternal organism should play a major role in Volvocales, and we are looking forward to investigating a follow-up model taking this factor into account.

      Reviewer #3:

      This paper provides a theoretical investigation of the evolution of somatic differentiation. While many studies have considered this broad topic, far fewer have specifically modelled the evolutionary dynamics of the reversibility of somatic differentiation. Within this subset, the conditions that select for irreversible somatic differentiation have appeared conspicuously restrictive. This paper suggests that an overly simplified fitness function (mapping the soma-germline composition of an organism to its growth rate) may be partly to blame. By allowing for a more complex fitness function (that captures the effect of upper and lower bounds for the contribution of somatic cells to organism fitness) the authors are able to identify three conditions for the evolution of irreversible somatic differentiation: costly cell differentiation (particularly for the redifferentiation of soma-cell lineages to germ line); a high/near maximal organismal growth advantage imbued by a small proportion of soma cells; a large maturity size for the organism (typically greater than 64 cells).

      The model presented is simple and elegant, and succeeds in its aim of providing biologically feasible conditions for the evolution of irreversible somatic differentiation. Although the observation arising from the first condition (that high costs to reversible somatic differentiation promote the evolution of irreversible somatic differentiation) is perhaps unsurprising, the remaining conditions on the fitness function and the organism maturity size are interesting and initially non-obvious. Particularly tantalising is the prospect of testing these conditions, either against available empirical data, or in an experimental setting.

      The model does however make a number of simplifying assumptions, the effects of which may limit the broad applicability of the results.

      The first is to assume that cell division is synchronous, so that the costs of cell differentiation can be straightforwardly averaged across the organism at each division. While the authors present a convincing biological justification for this assumption for algae such as Eudorina illinoiensis and Pleodorina californica, it is not immediately clear that this assumption should hold more widely.

      The second is to assume that the development strategy (i.e. the rates of differentiation between somatic and germ-line cell types) is constant throughout the organism's growth. For instance, there may be a growth advantage in the current model (aside from the advantages with respect to reduced mutation accumulation) of producing more germ cells early in the developmental programme, before transitioning to producing more soma cells in later development.

      Exploring such extensions to this model presents a seam of potential avenues for investigation in future theoretical studies.

      Thank you for the kind assessment of our findings. In the updated manuscript, we additionally investigated a model with asynchronous cell divisions. However, due to computational limitations, we are unable to fully replicate the investigation protocol of the original synchronous model. The execution time of the synchronous model scales linearly with the number of generations (n), and it still takes about a week to compute a single map like Fig.2A on a 2000-node cluster. The asynchronous model, in turn, scales linearly with the number of cell divisions and hence exponentially with the number of generations (as 2^n), which results in calculations taking much more time. For instance, the map in Fig.2A requires about 160 times more computer time with the asynchronous model. Nevertheless, we were able to implement this model for smaller organisms, with fewer samples. There, we found that the asynchronous model allows the evolution of irreversible somatic differentiation. However, it is suppressed compared with the synchronous model: the fraction of Fcomp profiles promoting irreversible differentiation is much smaller and the organism size restriction is higher.
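
      The scaling argument can be made concrete with a rough count of update steps. This is a sketch under the assumption that cost is proportional to the number of steps a simulation must process; the function names are hypothetical, and the step ratio is only a proxy that is not expected to reproduce the reported factor of about 160, which depends on implementation details not given here.

```python
def synchronous_steps(n_generations):
    """One update per generation: all cells divide together."""
    return n_generations

def asynchronous_events(n_generations):
    """One update per individual division.

    Growing from 1 cell to 2**n cells adds one cell per division,
    so it takes 2**n - 1 division events.
    """
    return 2 ** n_generations - 1

# Step counts for organisms maturing after 6 and 10 generations.
steps_n6 = (synchronous_steps(6), asynchronous_events(6))
steps_n10 = (synchronous_steps(10), asynchronous_events(10))
```

      Going from n = 6 to n = 10 adds four steps to the synchronous count but multiplies the asynchronous count roughly sixteen-fold, which is consistent with the asynchronous model being restricted to smaller organisms.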

      To study a dynamic differentiation strategy would be wonderful. Early on, we considered studying this scenario. The crucial factor here is how flexible the strategy can be. In a naïve situation with complete flexibility between every cell generation, the most successful strategy would be for all cells of an organism to first turn completely to the soma role to gain the maximal benefits, and then, at the last step, to all convert back to germ to produce the maximal number of offspring. This is not observed in natural species; hence the flexibility of a dynamic differentiation program must be constrained. We are curious to study what kind of constraints can lead to irreversible soma, but this task is beyond the scope of the current study. Our work with a constant differentiation program is the beginning of a future line of research. We are already looking forward to exploring the space of dynamic differentiation programs in later projects.

    1. Author Response:

      Reviewer #1:

      Zappia et al investigate the function of E2F transcriptional activity in the development of Drosophila, with the aim of understanding which targets the E2F/Dp transcription factors control to facilitate development. They follow up two of their previous papers (PMID 29233476, 26823289) that showed that the critical functions of Dp for viability during development reside in the muscle and the fat body. They use Dp mutants, and tissue-targeted RNAi against Dp to deplete both activating and repressive E2F functions, focusing primarily on functions in larval muscle and fat body. They characterize changes in gene expression by proteomic profiling, bypassing the typical RNAseq experiments, and characterize Dp loss phenotypes in muscle, fat body, and the whole body. Their analysis revealed a consistent, striking effect on carbohydrate metabolism gene products. Using metabolite profiling, they found that these effects extended to carbohydrate metabolism itself. Considering that most of the literature on E2F/Dp targets is focused on the cell cycle, this paper conveys a new discovery of considerable interest. The analysis is very good, and the data provided support the authors' conclusions quite definitively. One interesting phenotype they show is low levels of glycolytic intermediates and circulating trehalose, which is traced to loss of Dp in the fat body. Strikingly, this phenotype and the resulting lethality during the pupal stage (metamorphosis) could be rescued by increasing dietary sugar. Overall the paper is quite interesting. Its main limitation in my opinion is a lack of mechanistic insight at the gene regulation level. This is due to the authors' choice to profile protein, rather than mRNA effects, and their omission of any DNA binding (chromatin profiling) experiments that could define direct E2F1/ or E2F2/Dp targets.

      We appreciate the reviewer’s comment. Based on previously published chromatin profiling data for E2F/Dp and Rbf in thoracic muscles (Zappia et al 2019, Cell Reports 26, 702–719) we discovered that both Dp and Rbf are enriched upstream of the transcription start site of both cell cycle genes and metabolic genes (Figure 5 in Zappia et al 2019, Cell Reports 26, 702–719). Thus, our data are consistent with the idea that E2F/Rbf binds to the canonical target genes in addition to a new set of target genes encoding proteins involved in carbohydrate metabolism. We think that E2F takes on a new role, rather than being re-targeted away from cell cycle genes. We agree that the mechanistic insight would be relevant to explore further.

      Reviewer #2:

      The study sets out to answer which tissue-specific mechanisms in fat and muscle regulated by the transcription factor E2F are central to organismal function. The study also tries to address which of these roles of E2F are cell intrinsic and which of these mechanisms are systemic. The authors look into the mechanisms of E2F/Dp through knockdown experiments in both the fat body* (see weakness) and muscle of Drosophila. They identify that muscle E2F contributes to fat body development but fat body KD of E2F does not affect muscle function. To then dissect the cause of adult lethality in flies, the authors perform proteomic and metabolomic profiling of fat and muscle to gain insights. While in the muscle the cause seems to be an as yet undetermined systemic change, the authors do conclude that adult lethality in fat body specific Dp knockdown is the result of decreased trehalose in the hemolymph and defects in lipid production in these flies. The authors then test this model by presenting fat body specific Dp knockdown flies with a high sugar diet and showing that adult survival is rescued. This study concurs with and adds to the emerging idea from human studies that E2F/Dp is critical for more than just its role in the cell cycle and functions as a metabolic regulator in a tissue-specific manner. This study will be of interest to scientists studying inter-organ communication between muscle and fat.

      The conclusions of this paper are partially supported by data. The weaknesses can be mitigated by specific experiments and will likely bolster conclusions.

      1) This study relies heavily on the tissue specificity of the Gal4 drivers to study fat-muscle communication by E2F. The authors have convincingly confirmed that the cg-Gal4 driver is never turned on in the muscle and vice versa for Dmef2-Gal4. However, the cg-Gal4 driver itself is capable of turning on expression in the fat body cells and is also highly expressed in hemocytes (macrophage-like cells in flies). In fact, cg-Gal4 is used in numerous studies (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4125153/) to study the hemocytes and fat in combination. Hence, it is difficult to assess what contribution hemocytes provide to the conclusions for fat-muscle communication. To mitigate this, the authors could test Lpp-Gal4>Dp-RNAi (Lpp-Gal4 drives expression exclusively in fat body in all stages) or use ppl-Gal4 (which is expressed in the fat, gut, and brain, but is a weaker driver than cg). It would be good if they could replicate their findings in a subset of the experiments performed in Figures 1-4.

      This is indeed an important point. We apologize for previously not including this information. Reference is now on page 7.

      Another fat body driver, specifically expressed in fat body and, unlike cg-GAL4, not in hemocytes, was tested in previous work (Guarner et al Dev Cell 2017). The driver FB-GAL4 (FBti0013267), and more specifically the stock yw; P{w[+mW.hs]=GawB}FB P{w[+m*] UAS-GFP 1010T2}#2; P{w[+mC]=tubP-GAL80[ts]}2, was used to induce the loss of Dp in fat body in a time-controlled manner using tubGAL80ts. The phenotype induced in larval fat body of FB>DpRNAi,gal80TS recapitulates findings related to the DNA damage response characterized in both Dp -/- and CG>Dp- RNAi (see Figure 5A-B, Guarner et al Dev Cell 2017). The activation of the DNA damage response upon the loss of Dp was thoroughly studied in Guarner et al Dev Cell 2017. The appearance of binucleates in cg>DpRNAi is presumably the result of the abnormal transcription of multiple G2/M regulators in cells that have been able to repair DNA damage and to resume S-phase (see discussion in Guarner et al Dev Cell 2017). More details regarding the fully characterized DNA damage response phenotype were added on pages 6 and 7 of the manuscript.

      Additionally, r4-GAL4 was also used to drive Dp-RNAi specifically to fat body. But since this driver is weaker than cg-GAL4, the occurrence of binucleated cells in r4>DpRNAi fat body was mild (see Figure R1 below).

      As suggested by the reviewer, Lpp-GAL4 was used to knock down the expression of Dp specifically in fat body. All Lpp>DpRNAi animals died at the pupal stage. New viability data were included in Figure 1-figure supplement 1. Also, larval fat bodies were dissected and stained with phalloidin and DAPI to visualize overall tissue structure. Binucleated cells were present in Lpp>DpRNAi fat body but not in the control Lpp>mCherry-RNAi (Figure 2-figure supplement 1B). These results were added to the manuscript on page 7.

      Furthermore, Dp expression was knocked down using a hemocyte-specific driver, hml-GAL4. No defects were detected in animal viability (data not shown).

      Thus, overall, we conclude that hemocytes do not seem to contribute to the formation of binucleated-cells in cg>Dp-RNAi fat body.

      Finally, since no major phenotype was found in muscles when E2F was inactivated in fat body (please see point 3 for more details), we consider that the inactivation of E2F in both fat body and hemocytes did not alter the overall muscle morphology. Thus, exploring the contribution of cg>Dp-RNAi hemocytes in muscles would not be very informative.

      2) The authors perform a proteomics analysis on both fat body and muscle of control or the respective tissue specific knockdown of Dp. However, the authors denote technical limitations to procuring enough third instar larval muscle to perform proteomics and instead use thoracic muscles of the pharate pupa. While the technical limitations are understandable, this does raise a concern of comparing fat body and muscle proteomics at two distinct stages of fly development and likely contributes to differences seen in the proteomics data. This may impact the conclusions of this paper. It would be important to note this caveat of not being able to compare across these different developmental stage datasets.

      We appreciate the suggestion of the reviewer. This caveat was noted and included in the manuscript. Please see page 11.

      3) The authors show that E2F signaling in the muscle controls whether binucleate fat body nuclei appear. In other words, the endocycling process in fat body is affected if muscle E2F function is impaired. However, they conclude that impairing E2F function in fat does not affect muscle. While muscle organization seems fine, it does appear that nuclear levels of Dp are higher in muscles during fat specific knock-down of Dp (Figure 1A, column 2 row 3, for cg>Dp-RNAi). Also, there is an increase in muscle area when fat body E2F function is impaired. This change is also reflected in the quantification of DLM area in Figure 1B. But the authors don't say much about elevated Dp levels in muscle or increased DLM area in fat specific Dp KD. Would the authors not expect Dp staining in muscle to be normal and similar to the mCherry-RNAi control in Cg>dpRNAi? The authors could consider discussing and contextualizing this as opposed to making a broad statement regarding muscle function all being normal. Perhaps muscle function may be different, perhaps better, when E2F function in fat is impaired.

      The overall muscle structure was examined in animals staged at third instar larva (Figure 1A-B). No defects were detected in muscle size between cg>Dp-RNAi animals and controls. In addition, the expression of Dp was not altered in cg>Dp-RNAi muscles compared to control muscles. The best developmental stage to compare the muscle structure between Mef2>Dp-RNAi and cg>Dp-RNAi animals is actually third instar larva, prior to their lethality at the pupal stage (Figure 1-figure supplement 1).

      Based on the reviewer’s comment, we set up a new experiment to further analyze the phenotype at pharate stage. However, when we repeated this experiment, we did not recover cg>Dp-RNAi pharate, even though 2/3 of Mef2>Dp-RNAi animals survived up to late pupal stage. We think that this is likely due to the change in fly food provider. Since most cg>DpRNAi animals die at early pupal stage (>75% animals, Figure 1-figure supplement 1), pharate is not a good representative developmental stage to examine phenotypes. Therefore, panels were removed.

      Text was revised accordingly (page 6).

      4) In lines 376-380, the authors make the argument that muscle-specific knockdown can impair the ability of the fat body to regulate storage, but evidence for this is not robust. While the authors refer to a decrease in lipid droplet size in figure S4E this is not a statistically significant decrease. In order to make this case, the authors would want to consider performing a triglyceride (TAG) assay, which is routinely performed in flies.

      Our conclusions were revised and adjusted to match our data. The paragraph was reworded to highlight the outcome of the triglyceride assay, which was previously done. We realized the reference to Figure 6H that shows the triglyceride (TAG) assay was missing on page 17. Please see page 17 and page 21 of discussion.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We are grateful to the reviewers for their thoughtful comments and propose the following experiments or clarifications listed below (blue) in a revised manuscript.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors use a combination of Dsn1-Flag kinetochore purification from yeast extracts and laser trapping experiments (as in a number of previous studies), to study the effect of Mps1-dependent phosphorylation on reconstituted kinetochore-microtubule attachments in vitro. They complement this analysis with genetic experiments characterizing the effects of non-Mps1 phosphorylatable mutants on checkpoint activity and chromosome segregation in yeast.

      The authors had previously shown that Mps1 is the major kinase activity that copurifies with Dsn1-Flag in their purification scheme. They now investigate the effect of adding ATP and thereby allowing Mps1 phosphorylation in the reconstituted system. They show that addition of ATP decreases the rupture force of kinetochore-microtubule attachments, meaning it weakens the strength of the attachment. This effect can be negated either by inhibiting Mps1 with reversine, or by providing kinetochores in which the Mps1 phosphorylation sites on Ndc80 (most of them in the N-terminal tail) have been mutated to alanine. Thus, like the activity of Ipl1, Mps1 phosphorylation of the Ndc80 N-tail (which is known to be important for full MT affinity) weakens kinetochore-microtubule attachments.

      Cellular experiments demonstrate that non-Mps1 phosphorylatable Ndc80 14-A mutants have a functional mitotic checkpoint (contrary to previous claims by Kemmler et al., 2009), but show synthetic sickness with stu2 alleles that are involved in error correction.

      **Major points:**

      Within the framework of this experimental setting, the study as presented is logical and clear. The conclusions regarding the effect of Mps1 in this reconstituted system are overall well supported by the data. I have a couple of major and some minor points that can further improve data interpretation and should therefore be considered:

      1. In previous publications (e.g. Gutierrez et al., Current Biology 2020), the authors have reported that the Dam1 complex, an established Mps1 substrate, is required for full attachment strength in this system. Are the effects of Mps1-dependent Ndc80 phosphorylation and Dam1 independent from one another? For example would dad1-1 or non Cdk1 phosphorylatable Dam1 complex further reduce the rupture force in ATP? Or does Mps1 phosphorylation affect, for example, Dam1 binding to Ndc80?

      Response: To better understand the effects of ATP treatment, we analyzed the levels of Dam1 on the kinetochores after ATP treatment and did not see any change. We will add this data to a supplemental figure. Dam1 clearly makes a major contribution to the strength of the kinetochores because their strength even after ATP-treatment is higher than the rupture force of kinetochores purified from a dad1-1 mutant strain. However, as we report in the paper, blocking the eight Mps1 target sites in the tail of Ndc80 was sufficient to block the effect of ATP, so it is unlikely that phosphorylation of the Dam1 complex by Mps1 makes a major contribution to the ATP-dependent kinetochore weakening in vitro. We think Dam1 phosphorylation by Aurora B probably contributes independently to error correction, because the dam1-3D mutant, carrying phospho-mimetic substitutions in three Aurora B sites, is synthetically lethal when combined with the ndc80-8D phospho-mimetic mutant in eight Mps1 sites. We will add this genetic interaction data to the revised manuscript to provide additional information about the pathways.

      What is the effect of ATP on initial binding events? Are there differences in the fraction of beads that spontaneously attach laterally at the start of the experiment? This may allow to draw conclusions whether any kind of binding or specifically force-generating end-on attachments are affected by ATP.

      Response: We did measure a reduction in the fraction of free kinetochore-decorated beads capable of binding microtubules upon exposure to ATP (from 20% binding in the absence of adenosine to 11% in the presence of ATP). This observation suggests that the microtubule-binding activity of the kinetochores, like their rupture strength, is reduced upon exposure to ATP, as reported in the methods, in the "rupture force measurements" section. However, because we worked with a low density of kinetochores on the beads, the initial number of beads that spontaneously attached was quite low and free beads capable of binding to microtubules were relatively rare. In addition, when we find a bead already attached to the lattice, we cannot distinguish whether it bound initially to the lattice or instead bound to a tip that then grew beyond the bead. For these reasons, we feel it would be very difficult using our current approach to draw statistically significant conclusions about whether there were ATP-dependent changes in the relative affinities of the kinetochores for lateral versus tip attachments.

      Ndc80-8D has low attachment strength, consistent with lowered MT affinity of the phospho-mimetic Ndc80 tail. Interestingly, Supplementary Figure S4B shows that the amount of Cse4 in the pull-down western appears substantially reduced in 8D vs 8A or wt. Is the amount of co-purified inner kinetochore affected in this mutant? This may be an alternative explanation for decreased attachment strength, for example if the fraction of "full" or "complete" kinetochores may be reduced. Could this also happen upon inclusion of ATP?

      Response: The reviewer is correct that the level of Cse4 and other inner kinetochore components is slightly reduced in the Ndc80-8D kinetochores, for reasons that are not clear to us. However, the incubation of wild type kinetochores with ATP does not affect the levels of these proteins, suggesting that the weakened rupture strength is not due to reduced levels of these inner kinetochore proteins. We will add the data showing that ATP does not affect levels of inner kinetochore proteins into a supplemental figure to clarify this point.

      **Minor points:**

      page 13 (heading): "Weakening occurs via phosphorylation...". Probably good to mention what is weakened ("Weakening of kinetochore-microtubule attachments occurs via phosphorylation...").

      Response: We will alter the heading as suggested.

      page 14/Figure5C: Median Rupture Force for Ndc80-8D is 4.8 pN according to the text. In the graph it looks like >5 pN.

      Response: We thank the reviewer for noticing this mistake and will correct the median rupture force to 5.6 pN.

      page 23: comma missing between T21 S37 and T47 (should be T21, S37 and T47)

      Response: We thank the reviewer for noticing this omission and will correct it.

      page 24/25: different spelling of G1 (sometimes with subscript)

      Response: We thank the reviewer for noticing this inconsistency and will correct all to be G1.

      page 24/25: ug instead of µg

      Response: Thanks. We will fix this mistake.

      page 28: Figure 5B instead of Figure 5A

      Response: Thanks for noticing this mistake. We will correct this.

      Figure 6A: Lambda-Phosphatase treatment for 20 minutes according to figure legend and 30 minutes according to Material and Methods section.

      Response: The material and methods section specified a 20-minute incubation with phosphatase, in agreement with the figure legend. We believe the reviewer might have accidentally confused the time value with the temperature, which was 30 degrees.

      Figure 6E: One should not draw any conclusions from the anti-phospho T47 blot here, the quality is simply too poor to allow a statement regarding an mps1-1 effect

      Response: While the immunoblots with the T74 phospho-specific antibody are not as clean as many standard antibodies, we have reproduced the results multiple times and therefore feel comfortable concluding that there is a decrease in signal that is Mps1-dependent.

      Figure 6: Labelling T47P misleading (Proline substitution?, use pT47 instead)

      Response: We will change the labeling on this figure, as suggested, from T74P to pT74. To be consistent, we will also change this nomenclature in the text.

      Figure 6F: Make clear in the labelling that a stu2-AID background is used here, makes it easier to understand why Auxin is used here.

      Response: We will change the labeling, as suggested, to include the genotype of stu2-AID in the figure.

      how specific is reversine for yeast Mps1? I have not seen any data on this in previous publications.

      Response: Reversine is not necessarily specific for Mps1. However, the only kinase activity that co-purifies with the isolated kinetochores is from Mps1, so reversine should inhibit only Mps1 in our in vitro experiments. Nevertheless, to further address this concern, we will include optical trapping results using mps1-1 mutant kinetochores in the revised manuscript. We have already performed these additional experiments and found that mps1-1 kinetochores do not undergo ATP-dependent weakening, strongly reinforcing our conclusion that Mps1 is the major kinase involved.

      additional genetic interactions might be informative, if Ndc80-8D has weakened attachments, it may have synthetic effects with other mutants (dam1?), conversely, ndc80-8A may show genetic interactions with ipl1 alleles, for example.

      Response: We agree that the ndc80 phospho-mutant alleles might have genetic interactions with other mutants. Consistent with this prediction, we have found that ndc80-8D is synthetically lethal when combined with the dam1-3D mutant in three Ipl1 sites. As mentioned above, we will add this data into the revised text. We will also perform additional genetic interaction experiments with ipl1 and mps1 alleles and add any additional interactions we discover into the revised text.

      Reviewer #1 (Significance (Required)):

      The study adds to the characterization of the effects of Mps1 kinase on kinetochore-microtubule attachments and characterizes the cellular phenotypes of non-Mps1 phosphorylatable Ndc80 mutants. The major conceptual point that Mps1 phosphorylation can weaken kinetochore-microtubule interactions and thereby contributes to error correction in a manner similar to Ipl1 has previously been made in the literature. Maure et al., (Tanaka lab, 2007, Current Biology) have characterized the effects of mps1 mutant alleles on biorientation of authentic chromosomes and on replicated/unreplicated mini-chromosomes. In particular the experiments with unreplicated mini-chromosomes have revealed less frequent detachment in mps1 mutants, demonstrating that Mps1 activity is required to release attachments that are not under tension.

      Another benefit of this study is that it puts the Kemmler 2009 EMBO J. paper into perspective and corrects some of its claims, in particular the notion of sustained checkpoint activation in the Mps1 phospho-mimetic Ndc80-14D mutant, whose lethality was claimed to be rescued by checkpoint deletion. It is confirmed here that the allele is lethal but cannot be alleviated by simultaneous checkpoint deletion. Conversely, the Ndc80-14A mutant is shown to have a functional checkpoint. One could argue that since the publication of the Kemmler paper, the idea of a requirement of Mps1 phosphorylation on Ndc80 for checkpoint activity has not gained any traction in the field, but it's still useful for the field to put some of these earlier claims into perspective. The paper will therefore be interesting to researchers working on mechanisms of chromosome segregation and error correction.

      From my background I cannot comment on technical details of the biophysical force spectroscopy experiments (laser trapping), but I have no reason to doubt that the authors accurately report their findings.

      Response: We sincerely thank the reviewer for their careful reading, helpful comments, and enthusiasm for our manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This paper focusses on the mechanisms underlying chromosome biorientation in mitosis, an essential process that warrants equal chromosome segregation to the dividing cells. Correction of improper kinetochore-microtubule attachments relies on two conserved protein kinases, Aurora B and Mps1, that detach kinetochores that are not under tension in order to provide them with a second opportunity to establish bipolar connections. In vivo, Aurora B and Mps1 have intertwined functions and share some common targets. For this reason, despite the large body of literature on the subject, their precise roles in chromosome biorientation have been difficult to tease apart.

      The authors take advantage of an in vitro reconstitution assay that they previously published (Akiyoshi et al., 2010) to identify the critical target(s) of Mps1 in weakening kinetochore-microtubule connections. The assay uses kinetochore particles purified from budding yeast cells that bear Mps1 but are notably deprived of Aurora B. Upon addition of ATP to activate the co-purified kinases (e.g. Mps1), kinetochores are added to coverslip-anchored microtubules to which they attach laterally. Through a laser trap, kinetochores are brought to the microtubule plus-end and pulled with increasing force until the kinetochore detaches, which allows measurements of the average rupture forces that reflect the strength of the attachments. The approach is straightforward and potentially very powerful, first because it provides a simplified experimental set-up in comparison to the cellular context, and second because it directly measures the impact of protein phosphorylation on the strength of attachments.

      The authors convincingly show that Mps1-dependent phosphorylation of the N-terminal part of Ndc80 significantly weakens the strength of kinetochore-microtubule attachments in vitro, while phosphorylation of other known Mps1 targets, such as Spc105, does not seem to have an effect. Eight phosphorylation sites in Ndc80, which were previously identified as Mps1-dependent phosphorylation sites (Kemmler et al., 2009), are shown to be critical to destabilise kinetochore-microtubule attachments in the in vitro reconstitution assays. The authors also present evidence for a moderate involvement of Ndc80 phosphorylation by Mps1 in correcting improper attachments in vivo, suggesting that additional mechanisms are physiologically relevant for error correction.

      The experiments are mostly well designed, and the data are solid and support the main conclusions. However, in my opinion additional experiments could be performed, as outlined below, to strengthen the physiological relevance of the main findings and corroborate some of the conclusions.

      **Major points:**

      1. Given the partially overlapping function of Mps1 and Ipl1 (Aurora B) in error correction, the ndc80-8A mutant should display synthetic growth and chromosome mis-segregation defects with ipl1 temperature-sensitive alleles. Conversely, the ndc80-8D mutant should suppress the lethality at high temperatures of mps1-3 mutant cells, which were recently shown to be defective in chromosome biorientation (Benzi et al., 2020). Finally, chromosome mono-orientation could become apparent in ndc80-8A cells upon a transient treatment with microtubule-depolymerising drugs, which should amplify the cellular need for error correction.

      Response: We agree that further exploration of the possible genetic interactions might help to reinforce the physiological relevance of our main findings. Toward this goal, we will obtain the mps1-3 mutant to determine whether ndc80-8D can suppress its lethality and will add this to the revised manuscript if there is a positive result. As mentioned in response to Reviewer 1, we will add a synthetic lethal interaction between ndc80-8D and a dam1-3D mutant where the Aurora B sites are altered to the revised text. We will also perform additional genetic interactions with ipl1 and mps1 mutants and add any we find into the revision. As requested, we will perform a nocodazole wash out experiment, to determine if ndc80-8A cells show a defect in error correction and add this data to the revision if there is a defect.

      The authors show that Mps1-dependent phosphorylation of Ndc80 is not involved in the spindle assembly checkpoint, a conclusion that contradicts a previous report (Kemmler et al., 2009). They also find, in contrast with the same report, that the lethal phenotype of the ndc80-14D phospho-mimetic mutant cannot be rescued by disabling the spindle checkpoint. In my opinion, Kemmler et al. convincingly showed, through a number of different experimental approaches, that ndc80-14D cells die because of spindle checkpoint hyperactivation. Not only was deletion of checkpoint genes shown to rescue the lethality, but re-introduction of a wild type copy of the deleted checkpoint gene reinstated lethality. Thus, the explanation invoked here that spontaneous suppressing mutations could underlie the viability of ndc80-14D SAC-deficient mutants is not consistent with the published observations. A thorough examination by the authors of the phenotype of ndc80-14D cells in their hands should be carried out to support these conflicting conclusions. If the authors find that ndc80-14D cells actually die because of chromosome mono-orientation, then this would highlight an important function for some or all of the six additional phosphorylation sites, relative to the ndc80-8D mutant, for chromosome biorientation in vivo.

      Response: We were unable to reproduce the data that deletion of the spindle checkpoint suppresses lethality of the ndc80-14D mutant, so it remains unclear why our results differ from those of the Kemmler paper. However, we note that re-introducing a wild-type checkpoint gene via transformation and restoring lethality to the ndc80-14D cells does not necessarily mean there were no suppressors. While that is one possible interpretation, another possibility is that there was a suppressor mutation in the viable ndc80-14D cells that also required the lack of the checkpoint to live. Kemmler and co-workers selected for viability on FOA media and never backcrossed those viable strains to show that they could regenerate the double mutant through a cross with the expected segregation pattern of two mutations, which would have been a more rigorous demonstration that the viability was specifically due to ndc80-14D and the checkpoint mutation. Instead, they transformed a wild-type copy of the checkpoint gene back into the strain that was selected for growth on FOA and showed that it reverted the phenotype. This approach cannot rule out a suppressor mutation that fails to suppress in the presence of an active checkpoint. Therefore, in our opinion, the Kemmler paper does not make an entirely convincing case that the ndc80-14D cells die because of spindle checkpoint hyperactivation.

      To further analyze the phenotype of ndc80-14D cells, we have constructed an Ndc80-AID ndc80-14D strain and added auxin, to deplete the wild-type copy of Ndc80. In agreement with the findings of Kemmler et al., this did trigger the spindle assembly checkpoint. However, when we made an Ndc80-AID ndc80-14D mad2 strain and analyzed segregation, we found that chromosome 8 missegregated in 28% of the cells compared to 2% of control cells. This observation suggests that there is a kinetochore defect in these cells that may have triggered the checkpoint and is inconsistent with the mutant solely activating the checkpoint in the absence of any other kinetochore defect. In addition, the levels of Ndc80-14D as well as Mps1 were altered on the mutant kinetochores. The combination of these defects strongly suggests that the ndc80-14D mutant alters kinetochore function in addition to leading to constitutive checkpoint signaling. Because our manuscript is mainly focused on phosphorylation of the Mps1 target sites within the N-terminal tail, we do not plan to add this data involving many additional sites, including Ipl1 target sites and sites on the CH domains of Ndc80, into the current manuscript. We will further pursue the other phosphorylation sites in the future.

      The conclusion that Spc105 phosphorylation by Mps1 is not required for the Mps1-mediated weakening of kinetochore attachments in vitro is based on the comparison between kinetochore particles bearing wild type, untagged Spc105 and particles bearing non-phosphorylatable Spc105-6A tagged at the C-terminus with twelve myc epitopes. Thus, the presence of the tag could obliterate the effects of the mutations in the phosphorylation sites by destabilising kinetochore-microtubule attachments in the presence of ATP. Consistent with this conclusion, Spc105-6A-12myc-bearing kinetochores withstand lower rupture forces than Spc105-bearing kinetochores upon ATP addition. Furthermore, Spc105-6A-12myc kinetochore particles show an interacting protein at MW above 150 KD that is not present in wild type particles (Fig. S2A), suggesting that either the tag or the mutations might affect kinetochore composition. Thus, this set of experiments should be repeated using Spc105-6A kinetochore particles lacking the tag.

      Response: If we understand correctly, the reviewer is suggesting that the myc tag on Spc105-6A could cause an ATP-dependent effect on kinetochore strength. While this is formally possible, it seems highly unlikely to us, for two reasons: First, a myc tag is not expected to bind nucleotides, and while it can sometimes have a general effect on protein stability or interfere with protein-protein interactions, we are not aware of any evidence for a myc tag directly causing an ATP-dependent effect in vitro. Second, when we measured Spc105-6A kinetochores in control experiments, without adenosine or with ADP, their rupture strengths were high like wild-type kinetochores. The strength of ADP-treated Spc105-6A kinetochores (8.7 pN), for example, was statistically indistinguishable from that of ADP-treated wild-type kinetochores (8.7 pN, p = 0.27 based on a log-rank test). The wild-type-like behavior of untreated and mock-treated Spc105-6A kinetochores indicates that their composition is not affected in a manner that significantly impacts kinetochore-microtubule strength.

      In general, it would have been informative to complement the data presented here with a mass spec analysis of the composition of kinetochore particles, at least for the experiments that are most relevant to the conclusions. For instance, the composition of the Ndc80-8A kinetochore particles is assumed to be similar to that of wild type kinetochores based on gel silver staining (Fig. S4A; note also that ndc80-8A particles are compared to ndc80-8D particles and not to wild type particles). However, the authors previously showed that kinetochore particles purified from dad1-1 mutant cells (affecting the Dam1 complex) have an apparently identical composition to particles purified from wild type cells by silver staining, yet they display significantly lower resistance to the rupture strength in vitro (Akiyoshi et al., 2010). What is the status of the Dam1 complex (or other kinetochore subunits) in kinetochores purified from ndc80-8A/-8D or spc105-6A cells relative to wild type kinetochore particles?

      Response: We agree that further characterization of the kinetochore particle composition would be valuable and propose to further analyze the composition by purifying wild-type, Ndc80-8A, Ndc80-8D and Spc105-6A kinetochores and performing immunoblotting against the Dam1 complex. In addition, we will analyze the Ndc80-8A and Ndc80-8D kinetochores by mass spectrometry and report a qualitative analysis of the relative amounts of each kinetochore subcomplex in the revised manuscript supplementary data.

      **Minor comment:**

      I believe that the right reference for the sentence in the Discussion "If Aurora B is defective, for example, the opposing phosphatase PP1 prematurely localizes to kinetochores" is Liu et al. 2010.

      Response: We had cited the reference showing this effect in yeast, since our work was performed in yeast. We will also add the Liu et al paper, which showed the same result in human cells.

      Reviewer #2 (Significance (Required)):

      Although the experiments are well designed and the conclusions are mainly supported by the data, the question arises as to what extent the in vitro assays recapitulate, at least partly, what happens in vivo. An emblematic example is the involvement of Spc105 in the error correction pathway. The Biggins lab previously showed that Spc105 phosphorylation by Mps1 and subsequent Bub1 recruitment is not only essential for the spindle assembly checkpoint, but is also crucial for chromosome segregation in vivo, as shown by slow-growth phenotype and aneuploidy of the spc105-6A non-phosphorylatable mutant (London et al., 2012). Additionally, a recent paper showed that Spc105 is a crucial Mps1 target in chromosome biorientation (Benzi et al., 2020).

      In sharp contrast, the ndc80-8A mutant, which in vitro completely erases the ability of Mps1 to destabilise kinetochore-microtubule attachments, displays no growth defects in otherwise wild type cells and only modestly enhances chromosome mis-segregation in a mutant affecting an intrinsic correction pathway (stu2ccΔ). The N-terminal part of Ndc80 (aa 1-116) containing the aforementioned eight phosphorylation sites can even be deleted altogether without any consequence on cell viability (Kemmler et al., 2009). Thus, although the in vitro assays presented here produced clear-cut and reproducible results, their physiological relevance in vivo remains unclear.

      This criticism aside, the manuscript has several merits outlined above and will be of interest for people working in the fields of chromosome segregation, kinetochore assembly, spindle assembly checkpoint, etc.

      Expertise of this reviewer: mitosis and related checkpoints

      Response: We are grateful to the reviewer for carefully reading our manuscript and detailing their concerns. We agree that it can be challenging to establish the physiological relevance of experiments performed in vitro. However, our in vitro approach allowed the effects of Mps1 specifically on kinetochore-microtubule attachment strength to be disentangled from its numerous other effects in vivo. In our view, the relatively mild phenotypes associated with mutants in the Mps1 phosphorylation sites on the Ndc80 tail are consistent with similarly mild phenotypes of mutants in the Aurora B phosphorylation sites on the Ndc80 tail. In both cases, this appears to be due to additional error correction pathways that compensate in vivo.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Sarangapani, Koch, Nelson et al. applied a combination of in vitro biophysical assays with purified kinetochore particles and in vivo analyses to investigate the contribution of Mps1 kinase to kinetochore-microtubule (KT-MT) attachment stability and error correction.

      The manuscript is well written and the authors nicely highlight the facts that 1) the focus of the field has long been on the contribution of Aurora kinases (Ipl1 in budding yeast) to attachment stability and error correction, and 2) it has been difficult to assess the relative contributions of Aurora versus Mps1 kinases in cell-based experiments. The authors note that their KT particle assay is uniquely positioned to address this gap in our understanding and to specifically isolate the contribution of Mps1 to attachment stability in vitro. The findings are well-presented and quite convincing although I have several comments that should be addressed to strengthen the central conclusion that this work has isolated the contribution of Mps1 in their assays.

      **Major points:**

      1) I think it is important to note that reversine is not specific for Mps1 kinase - although it is typically presented as such in the field. It was initially identified as an Aurora kinase inhibitor (IC50: ~25nM (Aurora B) - 900nM (Aurora A)) that turned out to be an even more potent Mps1 inhibitor (IC50 ~6nM). I have concerns that the in vitro assays were done with 5 uM reversine - a concentration so high that it could certainly inhibit any Ipl1 that is present (see comment 3 below) and possibly even inhibit Bub1 activity as Santaguida et al. (JCB, 2010) measured an IC50 >1uM for Bub1 inhibition. It is important to complement/confirm the chemical inhibitor experiment by repeating the rupture assays +/- ATP in KT particles purified from the mps1-1 strain (shown in Figure 6).
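
      The concentration argument above can be made concrete with a simple one-site inhibition model. The sketch below is an illustration only: it assumes a Hill coefficient of 1 and that the quoted IC50 values (measured for the human kinases under specific assay conditions) carry over unchanged, which may not hold for the yeast enzymes or at other ATP concentrations. The function name is hypothetical.

```python
def fractional_inhibition(inhibitor_nm, ic50_nm):
    """Fraction of kinase activity inhibited under a one-site model (Hill coefficient 1)."""
    return inhibitor_nm / (inhibitor_nm + ic50_nm)


REVERSINE_NM = 5000.0  # 5 uM, the concentration used in the in vitro assays

# IC50 values quoted in the comment above, in nM.
# Bub1 was reported only as IC50 > 1 uM, so 1000 nM gives an upper bound on inhibition.
IC50_NM = {
    "Mps1": 6.0,
    "Aurora B": 25.0,
    "Aurora A": 900.0,
    "Bub1 (upper bound)": 1000.0,
}

for kinase, ic50 in IC50_NM.items():
    fraction = fractional_inhibition(REVERSINE_NM, ic50)
    print(f"{kinase}: {fraction:.1%} inhibited at 5 uM")
```

      Under these assumptions, 5 uM reversine would inhibit Aurora B almost as completely as Mps1, which is why the genetic control with mps1-1 particles is informative.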

      Response: We agree that reversine is not necessarily specific for Mps1 and this concern was also brought up by Reviewer 1. Because Mps1 is the only kinase activity that co-purifies with the isolated kinetochore particles, we expect reversine to inhibit only Mps1 in our in vitro assays. However, to further address this point, we will add rupture force assays using kinetochores purified from mps1-1 mutant cells to the revised manuscript. We have already performed these experiments and they confirm that kinetochores lacking Mps1 do not undergo ATP-dependent weakening. We did not put this data into the original submission because the experiment needs to be performed differently due to altered Dam1 levels. But we will clarify the changes in the materials and methods and add the data to a supplementary figure.

      2) If the ATP-mediated reduction on rupture force is lost in the mps1-1 KT particles, which will also lack Bub1 kinase, then preserving the ATP-dependent reduction in rupture force from KT particles purified from the Bub1delta mutant strain would be strong evidence that the contribution of Mps1 kinase has been disentangled from other kinases in this assay.

      Response: Although Mps1 recruits Bub1, we think it is unlikely that we are assaying Bub1 kinase activity in our in vitro experiments. We cannot detect Bub1 activity on the purified kinetochores using a sensitive radioactive kinase assay (London et al, Curr Bio 2011), and the levels of Bub1 in our kinetochore purifications are very low (for example, see Akiyoshi et al, Nature, 2010). However, we agree with the reviewer that this caveat should be mentioned and will add this point to the revised text for clarity.

      3) Recent work has shown that Sli15-Ipl1 interacts with and is recruited to KTs by the COMA complex (Rodriguez et al., Curr Biol, 2019 and Fischbock-Halwachs et al., eLife 2019) and that this population of Ipl1 is important for accurate chromosome segregation as also shown 10 years prior by Knockleby and Vogel (Cell Cycle, 2009). I realize that this group previously showed (London et al., Curr Biol, 2012) that phosphorylation of KT particles was not affected when purified from the ipl1-321 mutants, but in light of the recent findings how sure are the authors that there is not any Sli15-Ipl1 in the preparations? I think commenting on this would be worthwhile.

      Response: We have not detected Ipl1 or Sli15 in the numerous mass spectrometry experiments we have performed on the kinetochore purifications. In addition, we have been separately assaying the effects of Ipl1 phosphorylation on kinetochores for another project (de Regt, https://doi.org/10.1101/415992), which independently confirmed that the only detectable kinase activity in our kinetochore purifications is Mps1. We will add this additional reference to the manuscript.

      4) Since the interplay between Mps1 and Aurora B is central to this story, the authors should expand upon the sentence on page 5 reading "While there is some evidence that Mps1 regulates Aurora B activity (Jelluma et al., 2010; Saurin et al., 2011; Tighe et al., 2008), significant data suggests it has an independent role in error correction and acts downstream of Aurora B (Hewitt et al., 2010; Maciejowski et al., 2010; Maure et al., 2007; Meyer et al., 2013; Santaguida et al., 2010)." I am not entirely convinced that the in vivo experiments presented here differentiate as to whether Mps1 is upstream from Ipl1 or whether they are acting independently. For example, phosphorylation of T74 looks to be completely lost in figure 6E (although it's difficult to tell since the blot for T74P is very smeary). If they are acting independently in error correction then Ipl1 should still be able to phosphorylate T74 in this condition. However, if the P-T74 really is lost completely in the mcd1-1 cells then this suggests to me that Ipl1 is downstream of Mps1 in this live cell error correction assay.

      Response: We thank the reviewer for bringing this to our attention. We did not mean to imply that Mps1 is downstream from Aurora B in budding yeast and were intending only to summarize findings from the literature regarding other organisms. We will revise this section of the text to make that point clearer, and we agree that the order of events remains unresolved. In addition, we will note that Mps1 does not eliminate the phosphorylation detected by the T74 antibody in the revision, to avoid misconceptions about the order of events.

      **Other points:**

      1) On p.8 "a median strength of 7.5 pN, similar to untreated and ADP-treated kinetochores". Similar is vague so I'm curious as to whether there is a statistically significant difference between this and the 9.8 pN and 8.7 pN measured in the other conditions. If so this could be explained by partial dephosphorylation with the phosphatase.

      Response: The quoted phrase refers to the 7.5-pN strength measured when λ-phosphatase was included together with ATP (data from Fig. 1D and Supp. Fig. S1B). P-values computed from comparisons of survival plots using the log-rank test show that this strength was not significantly different from the ADP-treated wild-type (8.7 pN, p = 0.06), nor was it significantly different from the ADP- and MnCl2-treated wild-type (8.1 pN, p = 0.35). However, it was barely significantly different from MnCl2-treated wild-type (8.6 pN, p = 0.03), and it was more significantly different from untreated wild-type (9.8 pN, p = 0.0007). With the revised manuscript, we will include a supplemental table with p-values computed from log-rank tests for all the key statistical comparisons, including those mentioned here.
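
      For readers unfamiliar with the test, a two-group log-rank comparison of rupture-force survival curves can be sketched as follows. This is a minimal illustration, not the analysis code used for the manuscript: the function name and force values are invented, and every rupture is treated as an observed event (no censoring), which is an assumption.

```python
import math


def logrank_p(forces_a, forces_b):
    """Two-group log-rank test on rupture forces; returns the p-value (1 degree of freedom)."""
    observed_minus_expected = 0.0
    variance = 0.0
    for force in sorted(set(forces_a) | set(forces_b)):
        # Attachments still intact ("at risk") just before this force level.
        at_risk_a = sum(1 for f in forces_a if f >= force)
        at_risk_b = sum(1 for f in forces_b if f >= force)
        # Ruptures occurring exactly at this force level.
        ruptures_a = sum(1 for f in forces_a if f == force)
        ruptures_b = sum(1 for f in forces_b if f == force)
        n = at_risk_a + at_risk_b
        d = ruptures_a + ruptures_b
        # Expected ruptures in group A if both groups shared one survival curve.
        observed_minus_expected += ruptures_a - d * at_risk_a / n
        if n > 1:
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi_square / 2.0))


# Invented rupture forces in pN, for illustration only.
weak = [1.0, 2.0, 3.0, 4.0, 5.0]
strong = [6.0, 7.0, 8.0, 9.0, 10.0]
print(f"p = {logrank_p(weak, strong):.4f}")
```

      Because the test compares whole survival curves rather than medians, two conditions with similar median strengths can still differ significantly, and vice versa.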

      2) On p.19 the authors note that Aurora A phosphorylates Ndc80 tail during mitosis. Ye et al. (Curr Biol, 2015) also showed that Aurora A can phosphorylate Aurora B sites and that this activity "converges" at the tail to weaken attachments during error correction.

      Response: We will add the reference and thank the reviewer for pointing out this omission.

      3) Optional: I am curious as to whether the addition of ATP to the Ndc80-8D particles further reduces the rupture force. If so then other sites may also be in play.

      Response: We agree this is an interesting question but we have not yet performed those assays and agree it might be worthwhile for a future study.

      4) Please comment on why MnCl2 is used in the rupture assays in Figure S1. I saw no mention of this in the main text.

      Response: We include MnCl2 in the assay because it is required for phosphatase activity and will add this point to the legend of supplementary Figure S1.

      5) Consider moving S2 A and B to Figure 3 C and D. This is an interesting result and would go well in the main figure next to the significantly reduced rupture force measurements for the 6A mutant so the reader doesn't have to dig into the supplemental for the data providing this reasonable explanation for the rupture force result.

      Response: We thank the reviewer for this suggestion and will move S2A and S2B into Figure 3.

      Reviewer #3 (Significance (Required)):

      The significance of this relates to focusing on an important phenomenon - error correction - and in looking beyond the traditional focus of the field on Aurora kinases to Mps1 kinase, which is largely implicated in checkpoint signaling. Disentangling the contributions of these two players is an important advance.

      The work will be of interest to audiences interested in: kinases, cell division, checkpoints, kinetochore biology, biophysics

      The above areas of interest overlap with my expertise.

      Response: We thank the reviewer for their enthusiasm for our experiments that help distinguish kinase activities and thus contribute to understanding the process of error correction.

    1. 1. A solution which has become increasingly popular for dealing with resistance to change is to get the people involved to “participate” in making the change. But as a practical matter “participation” as a device is not a good way for management to think about the problem. In fact, it may lead to trouble.

      Kunal - This is what we are trying to do. They have proposed a solution a few paragraphs below.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors have studied mutations in the K13 gene that is linked to Artemisinin resistance in a range of African parasites. They show that these mutations can confer resistance in an in vitro survival assay but that they are often linked to reduced fitness. The authors also show that different parasites have less of an impact on fitness when the K13 mutations are introduced in line with the suggestion that the overall genetic background is critical for transmission of K13 mutations. The paper also shows evidence that genes potentially contributing to the genetic background are not involved.

      The overall work involves a significant amount of work to generate a wide range of different parasite lines that allow a detailed assessment of how different mutations interact with the genetic background of the parasite. This provides a significant amount of new insights. A key conclusion the authors draw from this work relates to the relationship between fitness and resistance and by inference on why artemisinin resistance has occurred in SE Asia. While this indeed would be a striking conclusion I think the data at this stage is not strong enough to make this claim. The claim is mainly based on Figure 3 E and F as well as 5 C and D. While indeed, initially it looks like RSA has much less of a survival impact in Dd2 there is some concern that the data is generated using different baselines (isogenic WT parasite in Figure 3 and Dd2eGFP in Figure 5 D). This is noteworthy as in Figure 5C the Dd2wt parasite is used and the fitness cost appears to be different.

      Please see our reply below to Reviewer 1 Comment #2.

      A striking finding is that the UG659C560Y line appears to have a relatively small fitness cost - especially if looked at for the whole 40 generations rather than the somewhat arbitrarily picked 38 days. This data could suggest that there are parasites in Africa that have the capacity to acquire resistance with minimal cost to fitness.

      We thank the Reviewer for this suggestion and have now recalculated our fitness data using a 36-day period, which we have adopted as a standardized timeline and which allows us to compare across all prior and newly acquired fitness assays. We note that this is already relatively lengthy compared to a number of other reports in the literature. For example, Baragana et al. (2015, Nature) measured competitive growth rates over a 14-day period. Gabryszewski et al. (2016, Mol Biol Evol) used 20-day assays. Siddiqui et al. (2020, mBio) used longer 48-day assays. We agree with the Reviewer that our data suggest that some African strains can achieve in vitro ART resistance with a minimal cost to fitness. In support of this, our new data presented in the revised Figure 3 provide evidence for the R561H mutation having little to no fitness cost in 3D7 parasites that are closely related to Rwandan isolates (see our response above to Comment #2 from the Editors).

      As pointed out above, we now include new fitness data on the R561H variant in African parasites, based on competition assays with an eGFP reporter line. To standardize our fitness data, we now have analyzed our data to day 36 across assays, as follows:

      Methods lines 538-539: “Cultures were maintained in 12-well plates and monitored every four days over a period of 36 days (18 generations) by harvesting at each time point a fraction of each co-culture for saponin lysis.”

      Figure 3 Legend lines 920-921: “K13 mutant clones were co-cultured at 1:1 starting ratios with isogenic K13 wild-type controls over a period of 36 days.”

      The selective sweep to C560Y in SE Asia is something that has been known for a while. It is striking that it has been selected as based on the data presented here P563L has a similar fitness and RSA profile. The authors could explore this further.

      The Reviewer highlights the important point that RSA values and fitness were comparable for C580Y and P553L, yet only the former swept across Southeast Asia. This would argue for additional factors that contribute to the successful dissemination of C580Y. These could include favorable genetic backgrounds that help propagate C580Y mutant parasites, or increased transmission rates, relative to P553L. To date, reasons for C580Y’s success beyond its moderate resistance and relatively minor fitness cost have not been firmly established. One possibility might be related to piperaquine pressure that selected for amplification in plasmepsins II and III as well as novel mutations in PfCRT, which emerged in parasites harboring K13 C580Y and which have been shown to spread as a series of genetically closely related sublineages (referred to as KEL1/PLA1; Hamilton et al. 2019, Lancet Infect Dis; Imwong et al. 2020, Lancet Infect Dis). These points are discussed as follows:

      Discussion lines 361-369: “Our studies into the impact of K13 mutations on in vitro growth in Asian Dd2 parasites provide evidence that the C580Y mutation generally exerts less of a fitness cost relative to other K13 variants, as measured in K13-edited parasites co-cultured with an eGFP reporter line. A notable exception was P553L, which compared with C580Y was similarly fitness neutral and showed similar RSA values. P553L has nonetheless proven far less successful in its regional dissemination compared with C580Y (Menard et al., 2016). These data suggest that additional factors have contributed to the success of C580Y in sweeping across SE Asia. These might include specific genetic backgrounds that have favored the dissemination of C580Y parasites, possibly resulting in enhanced transmission potential (Witmer et al., 2020), or ACT use that favored the selection of partner drug resistance in these parasite backgrounds (van der Pluijm et al., 2019).”

      Overall, the main conclusion that there are K13 mutations that can confer resistance to Art in the context of African parasites is clearly presented and convincing and this highlights the risk that exists for public health officials in African nations. What would be interesting from a reader's perspective is how likely it is that this loss of fitness hurdle is going to be overcome in Africa and whether the risk of resistance development will increase as transmission rates drop.

      We appreciate this suggestion from the Reviewer. Our revised manuscript now addresses this topic as follows:

      Discussion lines 393-399: “It is nonetheless possible that secondary determinants will allow some African strains to offset fitness costs associated with mutant K13, or otherwise augment K13-mediated ART resistance. Identifying such determinants could be possible using genome-wide association studies or genetic crosses between ART-resistant and sensitive African parasites in the human liver-chimeric mouse model of P. falciparum infection (Vaughan et al., 2015; Amambua-Ngwa et al., 2019). Reduced transmission rates in areas of Africa where malaria is declining, leading to lower levels of immunity, may also benefit the emergence and dissemination of mutant K13 (Conrad and Rosenthal, 2019).”

      Reviewer #2 (Public Review):

      In this paper, the investigators performed two large-scale surveys of the propeller domain mutations in the K13 gene, a marker of artemisinin (ART) resistance, in African (3299 samples) and Cambodian (3327 samples) Plasmodium falciparum populations. In the African parasite population, they identified the K13 R561H variant in Rwanda, while parasites from other areas had the wild-type K13. In Cambodia, however, they documented a hard genetic sweep of C580Y mutation that occurred rapidly. They generated the C580Y and M579I mutations in four different parasite strains with different genetic backgrounds and found that these mutations conferred varying degrees of in vitro ART resistance. They further edited the SE Asian parasite strains Dd2 and Cam3.II with 7 K13 mutations and found that all the propeller domain mutations conferred ART resistance in the Dd2 parasite, whereas three of the mutations did so in the Cam3.II background. The R561H and C580Y mutations were also evaluated in several parasites collected from Thailand. In vitro growth competition analysis showed that K13 mutations caused substantial fitness costs in the African parasite background, but much less fitness costs in the SE Asian parasites. This study demonstrated the potential emergence of ART resistance in African parasite populations and offered insights into the importance of the parasite's genetic background in the emergence of ART resistance.

      We thank the Reviewer for this thorough summary and favorable assessment of our work.

      Reviewer #3 (Public Review):

      Stokes et al address the question: Why have mutations in the K13 gene spread rapidly across South East Asia and led to widespread treatment failure with artemisinin-based antimalarials? In contrast, why do K13 mutations remain quite rare in Africa, and artemisinin-based antimalarials remain effective?

      The work combines a number of different studies on different parasites of different origins. Gene editing has been used to assess the effects of K13 mutations in different parasite backgrounds, leading to a very complex view of the competing factors of level of resistance conferred and fitness cost.

      The authors put forward the hypothesis that fitness costs associated with K13 mutations select against their dissemination in the high malaria transmission settings in Africa. However, the complexity of the genetic backgrounds of the parasites makes it difficult to tease out the contributing factors.

      We agree that these are complex and multifactorial areas of investigation and appreciate the Reviewer’s summary.

    1. Author Response:

      Reviewer #1 (Public Review):

      This work described a novel approach, host-associated microbe PCR (hamPCR), to both quantify microbial load compared to the host and describe interkingdom microbial community composition with the same amplicon library preparation. The authors used the host single (low-copy) genes as PCR targets to set the host reference for microbial amplicons. To handle the problem that in many cases, the host DNA is excessive compared to the microbiome DNA, the authors adjusted the host-to-microbe amplicon ratio before sequencing. To prove the concept, hamPCR was tested with the synthetic communities, was compared to the shotgun metagenomics results, was applied in the biological systems involving the interkingdom microbial communities (oomycetes and bacteria), or diverse hosts, or crop hosts with large genomes. Substantial data from diverse biological systems confirmed the hamPCR approach is accurate, versatile, easy-to-setup, low-in-cost, improving the sample capacity and revealing the invisible phenomena using regular microbial amplicon sequencing approaches.

      Since the amplification of host genes would be the key step for this hamPCR approach, the authors might also include more strategy discussions about the selection of single (low copy) genes for a specific host and the primer design for the host genes to guarantee the hamPCR usage in the biological systems other than those mentioned in the manuscript.

      A deeper discussion about the design of suitable host primers has been added to the Supplementary Information as Supplementary Discussion 3, and is now mentioned in the main text in the first section of the Methods.

      Reviewer #2 (Public Review):

      Lundberg and colleagues provide a detailed set of data showing the utility of host-associated microbe PCR. By simultaneously amplifying microbial community and host DNA, hamPCR provides an opportunity to measure the microbial load of a sample. I was largely convinced about the robustness of this approach after seeing the many different optimization datasets that were presented in the paper. I also appreciated the various applications of hamPCR that were demonstrated and compared to other standard approaches (CFU counting and shotgun metagenomics, for example). As clearly illustrated in Figure 6f, hamPCR could dramatically improve our understanding of interactions within microbiomes as it helps remove issues of relative abundance data.

      One challenge about the approach presented is that it cannot be quickly adapted to a new system. Unlike most primers for 'standard' microbial amplicon sequencing, considerable time will be required to determine which host gene to target, how to make that host gene size larger than the size of the microbial amplicon, etc. This may limit wide adoption of hamPCR in the field. I do appreciate the authors providing some details in the Supplement on how they developed hamPCR for the several different systems described in this paper. The helpful tips may make it easier for others to develop hamPCR for their own systems.

      Additional strategy of primer design was addressed in the response to Reviewer #1 Public Review.

      An issue that repeatedly came up is that at high and low ends of host:microbe ratios, inaccurate estimates can occur. For example, with high levels of microbial infection, the authors note that hamPCR has reduced accuracy. The authors propose three solutions to this problem (1. altering host:microbe amplicon ratio, 2. using a host gene with higher copy number, and 3. adjusting concentrations of host primers), but only present data for #1 and 3. Do they have any data to show that #2 would actually work?

      One instance of potential unreliable load that sticks out in the paper is in Figure 5b. The authors note that this is likely due to unreliable load calculation. Is this just one of 4 replicates? What are other potential reasons this would be an outlier and how can the authors rule this out? Did they repeat the hamPCR for this outlier to confirm the striking difference from the other three samples in the eds1-1 Hpa + Pto sample?

      Both qPCR and amplicon sequencing can be used to detect copy number variation in genomes [1]. Because amplicon-based methods are known to be sensitive to small differences in gene copy number, we are confident, without generating additional data on the topic, that #2 would work.

      Furthermore, bacterial genomes from different taxa are known to vary in their copy number of 16S rDNA, usually between 1 and about 15 copies [2]. These variations are reflected in sequence counts from amplicon sequencing, biasing the counts towards taxa with more 16S rDNA gene copies [2, 3, 4]. This phenomenon has been well documented, distorts the accurate description of microbial communities, and therefore has led to some efforts to correct 16S rDNA gene amplicon data by dividing the counts from each taxon by the (estimated) 16S rDNA copy number of that taxon, so that the counts better reflect the numbers of bacterial cells.
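
      The correction described above (dividing each taxon's counts by its estimated 16S rDNA copy number) can be illustrated with a minimal sketch. The taxon names, read counts, and copy numbers below are hypothetical values chosen only for illustration, and the function names are our own, not part of any published pipeline.

```python
def correct_for_copy_number(counts, copy_numbers):
    """Divide each taxon's read count by its (estimated) 16S rDNA copy number."""
    corrected = {}
    for taxon, reads in counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        corrected[taxon] = reads / copies
    return corrected


def relative_abundance(values):
    """Convert per-taxon values into fractions that sum to 1."""
    total = sum(values.values())
    return {taxon: value / total for taxon, value in values.items()}


# Hypothetical example: two taxa with equal read counts but different copy numbers.
counts = {"taxon_A": 1000, "taxon_B": 1000}
copy_numbers = {"taxon_A": 1, "taxon_B": 10}

corrected = correct_for_copy_number(counts, copy_numbers)
fractions = relative_abundance(corrected)
```

      In this toy case the uncorrected data would suggest equal abundance, whereas the corrected values attribute most cells to the taxon with a single 16S rDNA copy. As references [3] and [4] point out, the reliability of such corrections in practice depends on how well the copy numbers are known.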

      Because amplicon methods are sensitive to copy number variation (whether those copies are from inside the same cell, or coming from different cells), we reasoned that choosing a host gene with a higher copy number, similar to the effects of copy number variation on 16S rDNA gene counts, will increase the representation of that host amplicon in the final library (because there will be more template host DNA molecules available to amplify). We did not test this explicitly - we think the evidence from literature is strong support on its own. We have added to the paper a statement that now references the Kembel 2012 paper, which we hope adequately supports our claim:

      “Second, a host gene with a higher copy number could be chosen for HM-tagging throughout the entire project, which would increase host representation by a factor of that copy number (Kembel et al., 2012).”
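
      As an illustration of this reasoning only, the following sketch shows how host representation in the library would scale with host gene copy number under an idealised assumption: every template molecule, host or microbial, is tagged and amplified with equal efficiency. The function names and the numbers are hypothetical and were not measured in this study.

```python
def expected_host_fraction(host_genomes, microbial_templates, host_copy_number=1):
    """Expected fraction of host amplicons, assuming equal amplification efficiency."""
    host_templates = host_genomes * host_copy_number
    return host_templates / (host_templates + microbial_templates)


def load_per_host_genome(microbial_reads, host_reads, host_copy_number=1):
    """Microbial reads per host genome, rescaled for the host gene copy number."""
    if host_reads <= 0:
        raise ValueError("host_reads must be positive")
    return (microbial_reads / host_reads) * host_copy_number


# Hypothetical heavily colonised sample: 100 host genomes, 9900 microbial templates.
single_copy = expected_host_fraction(100, 9900, host_copy_number=1)
four_copy = expected_host_fraction(100, 9900, host_copy_number=4)
```

      With a four-copy host gene the host templates increase fourfold, so the host amplicon is less likely to be undersampled; the load estimate is then multiplied by the copy number to remain expressed per host genome.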

      1) Martins, W.F.S., Subramaniam, K., Steen, K. et al. Detection and quantitation of copy number variation in the voltage-gated sodium channel gene of the mosquito Culex quinquefasciatus. Sci Rep 7, 5821 (2017). https://doi.org/10.1038/s41598-017-06080-8

      2) Kembel, S. W., Wu, M., Eisen, J. A., & Green, J. L. (2012). Incorporating 16S gene copy number information improves estimates of microbial diversity and abundance. PLoS Computational Biology, 8(10), e1002743. https://doi.org/10.1371/journal.pcbi.1002743

      3) Starke, R., Pylro, V. S., & Morais, D. K. (2021). 16S rRNA Gene Copy Number Normalization Does Not Provide More Reliable Conclusions in Metataxonomic Surveys. Microbial Ecology, 81(2), 535–539. https://doi.org/10.1007/s00248-020-01586-7

      4) Louca, S., Doebeli, M., & Parfrey, L. W. (2018). Correcting for 16S rRNA gene copy numbers in microbiome surveys remains an unsolved problem. Microbiome, 6(1), 41. https://doi.org/10.1186/s40168-018-0420-9

      Could the DNA extraction method used cause biases in hamPCR for/against either the host or the microbiome? If two different labs study the same system (let's say bacterial communities growing on Arabidopsis leaves) but use different DNA extraction approaches, would we expect them to obtain different answers using hamPCR? Did the authors try several different DNA extraction methods to see if this is an issue? Or has another team of researchers considered this and addressed it in a separate paper? I would appreciate seeing either data to address this or a discussion paragraph that reasons through this.

      Differences in DNA extraction method will certainly change the results, not only of the microbe-to-plant ratio, but also in the representation of microbes, because microbes differ in their sensitivity to different lysis methods. This is a well-documented concern in microbiome studies and has been demonstrated by using different methods on the same mock community in papers such as the following:

      Yuan, S., Cohen, D. B., Ravel, J., Abdo, Z., & Forney, L. J. (2012). Evaluation of methods for the extraction and purification of DNA from the human microbiome. PloS One, 7(3), e33865. https://doi.org/10.1371/journal.pone.0033865

      Albertsen, M., Karst, S. M., Ziegler, A. S., Kirkegaard, R. H., & Nielsen, P. H. (2015). Back to Basics--The Influence of DNA Extraction and Primer Choice on Phylogenetic Analysis of Activated Sludge Communities. PloS One, 10(7), e0132783. https://doi.org/10.1371/journal.pone.0132783

      In short, if the DNA is not extracted because plant or microbial cells are not lysed, it cannot be amplified in PCR. However, there is a good overall strategy to minimize the problem, as also proposed in the above papers, and that is to err on the side of a harsher lysis (using strong bead beating, as we have done), since this will leave fewer cells unlysed (and thus less information will be hidden). We note that similar concerns about lysis methods changing results also apply to DNA extraction for qPCR and live bacterial isolation for CFU counting (for which too harsh a lysis will kill bacteria, but too gentle a lysis will not release them from host tissue).

      We addressed this in two places. First, in the results section we mention briefly the following:

      “All DNA preps employed heavy bead beating to ensure thorough lysis of both host and microbes, as an incomplete DNA extraction can lead to underrepresentation of hard-to-lyse cells (Albertsen et al., 2015; Yuan et al., 2012).”

      Second, we added a paragraph to the discussion about sample selection and DNA extraction as follows:

      “Because hamPCR can only quantify the DNA available in the template, choice of sample and appropriate DNA extraction methods are very important. In particular, the sample must in the first place include a meaningful quantity of host DNA. For example, although there is some host DNA in mammalian fecal samples or in plant rhizosphere soil samples, this host DNA does not accurately represent the sample volume, and therefore relating microbial abundance to host abundance probably has less value in these cases. Further, the DNA extraction method chosen must lyse both the host and microbial cell types. An enzymatic lysis suitable for DNA extraction from pure cultures of E. coli may not lyse host cells or even other microbes. Appropriate DNA preparation methods for metagenomics have been thoroughly evaluated elsewhere (Albertsen et al., 2015; Yuan et al., 2012), and a common point of agreement is that strong bead-beating increases the yield and completeness of the DNA extraction, but comes at the cost of some DNA fragmentation. Especially for short reads, as we have used here, this fragmentation is not a problem, and we recommend to err on the side of a harsher lysis, using strong bead beating potentially preceded by grinding steps using a mortar and pestle as necessary for tougher tissue.”

      One emerging theme in microbiome science is to have consistent methodologies that are used across studies/labs to allow direct comparisons of microbiome datasets. Standardization of approaches may make microbiome science more robust in the long-term. Given much of the nuance in developing hamPCR for different systems, my impression is that this method is best for comparing samples within a particular host-microbe system and not across systems. For example, it may be challenging to directly compare my bacterial load hamPCR data from Arabidopsis to another lab's if we used different Arabidopsis host genes or if we used different 16S gene regions. Can the authors unpack this a bit in a discussion paragraph? If it is widely adopted, is there a way to standardize hamPCR so that it can be consistently used and compared across datasets? Or should that not be the goal?

      There appears to be considerable non-specific amplification or dimers in the gels presented throughout the manuscript. Could this non-specific amplification vary across host-microbe primer combinations? Would this impact quantification of host and microbial amplicons?

      Non-specific amplification / dimers do vary across host-microbe primer combinations. Indeed, they also vary between common 16S rRNA primer pairs used on their own (not shown). Fortunately, non-specific amplicons amplified during the exponential PCR step do not, at least with our method, seem to impact quantification of host and microbial amplicons.

      One reason is that non-specific amplicons can be recognized by their sequence and ignored. After the sequences of the amplicons have been extracted from the short read data, only those that match expected length and sequence patterns of the targeted amplicons need to be counted. Non-specific amplicons are certainly a nuisance because they represent wasted sequencing resources, but they can be excluded bioinformatically and therefore do not change the accuracy of the microbial load measurement. This is in contrast to ddPCR/qPCR, for which any off-target amplicons are also quantified!
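
      The bioinformatic exclusion described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the paper: the prefixes standing in for primer-adjacent sequence and the length windows are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical expectations for each targeted amplicon: a sequence prefix
# and an allowed length window. Real values depend on the primers used.
EXPECTED = {
    "microbe": {"prefix": "ACGTAC", "length_range": (20, 24)},
    "host": {"prefix": "TTGCAA", "length_range": (30, 34)},
}


def classify(sequence):
    """Return the matching amplicon name, or 'off_target' if nothing matches."""
    for name, rule in EXPECTED.items():
        shortest, longest = rule["length_range"]
        if sequence.startswith(rule["prefix"]) and shortest <= len(sequence) <= longest:
            return name
    return "off_target"


def count_amplicons(sequences):
    """Count sequences per class; off-target sequences are tallied separately."""
    return Counter(classify(sequence) for sequence in sequences)


def microbe_to_host_ratio(counts):
    """Load estimate from on-target counts only; off-target reads are ignored."""
    if counts["host"] == 0:
        raise ValueError("no host amplicons found")
    return counts["microbe"] / counts["host"]


reads = [
    "ACGTAC" + "A" * 16,  # microbial amplicon, length 22
    "ACGTAC" + "G" * 16,  # microbial amplicon, length 22
    "TTGCAA" + "C" * 26,  # host amplicon, length 32
    "ACGTAC" + "A" * 40,  # right prefix but wrong length: off-target
    "GGGGGG",             # primer dimer-like fragment: off-target
]
counts = count_amplicons(reads)
ratio = microbe_to_host_ratio(counts)
```

      Because the ratio is computed from on-target counts only, the two off-target reads cost sequencing depth but do not change the load estimate.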

      A second reason is that the sensitive exponential amplicon step of hamPCR is done with a single primer pair. Off-target sequences do squander PCR reagents including primers and dNTPs, such that they become limiting at earlier cycles than without off-target sequences, but because the exponential PCR step is done with a single primer pair, such inferior amplification conditions are shared by all molecules, and therefore do not differentially affect the host or microbial amplicon. Any off-target binding occurring in the initial tagging reaction (before the PCR step) would certainly be a concern if the reaction was carried on long enough, because for example the microbial primer pair might become limiting at an earlier cycle number, leading to underestimates of microbial load. However, limiting the tagging reaction to a low number of cycles ensures that – should primers targeting a particular host or microbial amplicon be non-specific – the fraction still available to bind the correct sequence remains in excess.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      General comments:

      We thank the reviewers for their constructive critique and are pleased they see the results as interesting and of general relevance. We also acknowledge their concerns on the issue of whether all claims are supported by sufficiently strong data. Our careful reading and analysis of the points that are raised suggest there are different reasons for the different cases that are brought up:

      1. Misunderstandings, due to lack of clarity on our side. Example: When talking about ‘reduced actin’, our wording focussed on the endosome-associated actin (partly out of consideration for the fact the actual measurements we show come from the area around the endosomes, so we did not want to make any stronger claims), even though we should have made it clear that other areas of the cell tip are also affected. This will be addressed by clearer explanations.

      2. Ill-advised wording we chose that is or can be seen as overinterpretation.

      Example: ‘anchoring’ of actin at endosomes. We had not intended to infer anything about specific anchoring sites or mechanisms. We should have used a more neutral term, such as ‘associate with’ or ‘accumulate around’ for the description. This and other cases can also be resolved by rewriting and better wording.

      3. Anecdotal evidence or insufficient data.

      Example: Images of phalloidin stainings depicting how actin is organized around late endosomes in control embryos. These and other cases will be addressed by adding further examples and additional quantification.

      Finally, one suggestion was made for obtaining additional experimental data, which would involve laser ablation. While the experiment would provide an interesting extension of our findings, we will sadly not be in a position to carry it out in the foreseeable future, as explained below. We hope the referees will agree that our now extended discussion addresses the point in question sufficiently to support the conclusions from the experiments we do present.

      These and all other points are addressed individually below. We highlighted the corresponding text changes in the manuscript file for the reviewers to identify them more easily.

      Detailed responses:

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary**

      Rios-Barrera and Leptin investigate the formation and guidance of the subcellular tube that forms in the terminal cell of the dorsal branches of the Drosophila tracheal system. In previous work the authors documented the presence of late endosomes at the tip of the growing terminal cell, ahead of the forming subcellular tube, which are involved in membrane recycling (Mathew et al., 2020). In this present work they analyze the organization of the actin cytoskeleton in the tip terminal cell, in relation to the late endosomes, and assess the guidance of the subcellular tube. They find that the presence and localization of late endosomes play a role in tube guidance. They also find that late endosomes recruit actin around them, mediated by the activity of the actin polymerization regulator Wash, which is recruited to maturing late endosomes. When Wash activity is decreased, actin around late endosomes is decreased and tube guidance is compromised. Based on this observation, laser ablation experiments of actin ahead of the tube and actin staining at the tip of the terminal cell, the authors propose an exciting model: late endosomes recruit actin, which connects the actin pool of the basal membrane and the actin pool of the apical (subcellular tube) membrane thereby directing tube growth and guidance.

      The manuscript is well-written and well-presented, the images and movies are of high quality and the experimental data, which is technically challenging, is very good and sufficiently replicated.

      **Major comments:**

      1. A critical point in the model that the authors put forward (which is also contained in the title and abstract) is that actin organized at late endosomes anchors apical and basal actin cortices. However, there is no clear and conclusive evidence for this. Clear evidence in this direction should be provided to propose it as a mechanism (as it is in the text, particularly in the first sentences of the discussion) and imply it in the title. The authors show endogenous actin around late endosomes and actin fibers at the tip of the terminal branch. However, at the level of resolution presented (Fig 3A,B), it is not possible to determine whether the different actin populations are actually "anchored". I suggest to present stronger data supporting this important conclusion.

      In the same direction, it would be critical to show that this anchoring of actin fibers is disturbed when actin enrichment at the late endosome is perturbed (see also point 5).

      Actually, the authors show that when Vha or Wash activity is downregulated, actin accumulation around the CD4 vesicles decreases. However, this experiment has a few inconveniences. First, it is difficult to determine levels of a construct that is overexpressed (UAS-utr::GFP). Could the authors use phalloidin or an actin antibody to confirm the result?

      Second, I find the result difficult to interpret. In the images provided I see a general decrease of actin (UtrGFP) at the tip, not only around the CD4 vesicles (Fig 6D,F). Are these mutant conditions also affecting the rest of actin pools? If this is the case, can the authors attribute the defects exclusively to the abnormal recruitment of actin to the late endosomes?

      Most importantly, the authors should analyze the pattern of actin distribution (labelling endogenous actin) and determine a possible loss of "anchoring" of fibers when late endosome maturation is perturbed.

      We understand the referee addresses three issues here, to which we will respond in turn below:

      • As already mentioned above, the referee interprets the term ‘anchoring’ in a more specific meaning than we had intended it to have. We obviously have to rephrase.
      • A technical critique of the use of an overexpressed construct to visualize actin, which in turn has two sub-points: potential physiological effects on actin, and potentially inaccurate localisation. Both are valid points, but in our view do not undermine our conclusions. We will raise and discuss these concerns in our revised text.
      • The specificity of the effects of reducing Vha and Wash function on actin associated with endosomes versus throughout the growth cone of the cell – a very good point, about which we should have been much clearer and now will be.

      (a) Considering the use of the term ‘anchoring’, and the referee’s concern over whether we provide the appropriate evidence, gave us a good starting point to re-think what we actually show and how it can be interpreted.

      Put in neutral terms, what [we felt] we had shown was an accumulation or enrichment of actin around endosomes that was dependent on proper functioning of Vha and Wash.

      We agree that the term ‘anchoring’ cannot be justified by the description of actin localisation alone. The term implies a physical (and perhaps strong or long term) interaction between the endosome and the surrounding actin.

      We see a strong enrichment of actin around endosomes, including in experiments in which we use phalloidin to visualize actin (Fig. 3B). The resolution of our images is approximately 200nm so they are able to reveal the very close association. The question is what the mechanistic basis for this closeness is. It is unlikely to be random, as shown by a quantification we have now included (Figure S3C). It is difficult to imagine how it could persist without at least transient physical interaction between the two components. The association is indeed highly dynamic and is constantly being re-established. This must mean that something ‘attracts’ actin to endosomes, most likely a molecule that is itself associated with endosomes. The presence or accessibility of such a molecule depends on the proper maturation of endosomes, as shown by the results of reducing Vha activity. And the ability of actin to associate depends on Wash. Together these findings suggest to us the existence of a (dynamic) molecular link between the endosomes and the actin network.

      In order not to give the impression that we are claiming a permanent ‘anchor’, we now use more general terms such as ‘associates’ or ‘accumulates’, but also include the clarification on our thinking in the text. Furthermore, to illustrate a representative range of cases, we will add more examples of late endosomes and the actin meshwork surrounding them (Figure S3A, B). These images should give a broader reflection of the actin populations and their dynamism during tube growth.

      (b) A major reason why we use live imaging with actin reporters is that the distribution of actin around late endosomes and the tip compartment in general is very dynamic, so capturing cells at the right time point can be challenging from fixed samples. This problem is exacerbated by a technical limitation: For the actin cytoskeleton to be well preserved during fixation, embryos have to be manually dechorionated which limits the throughput of the experiment. We therefore found that analysing cells over time is more informative than analysing cells fixed at a given time point.

      As the reviewer points out, using an overexpressed reporter can have drawbacks. With regard to the problem of not representing the endogenous distribution faithfully, this can be the case when making statements about the absolute distribution. However, what we are looking at here is not absolute quantities of actin but relative changes in the area of interest with respect to other, unaffected regions of the cells, and then comparing these between mutant conditions and the control. We do this by normalizing the signal to the levels seen in the subcellular tube, using it as an internal control that allows us to adjust for variation in expression levels.
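
      The normalization described above can be illustrated with a minimal sketch. The intensity values are hypothetical and the function name is our own; the sketch only shows why dividing by the signal in the subcellular tube makes cells with different reporter expression levels comparable.

```python
from statistics import mean


def normalized_enrichment(roi_intensities, tube_intensities):
    """Mean reporter signal in the region of interest relative to the tube signal."""
    tube_mean = mean(tube_intensities)
    if tube_mean <= 0:
        raise ValueError("tube reference signal must be positive")
    return mean(roi_intensities) / tube_mean


# Hypothetical pixel intensities from two cells that express the reporter at
# tenfold different levels but have the same relative distribution of actin.
high_expresser = normalized_enrichment([200, 220], [100, 110])
low_expresser = normalized_enrichment([20, 22], [10, 11])
```

      Both cells yield the same normalized value, so differences between genotypes in this ratio reflect changes in the relative distribution of the reporter rather than differences in expression level.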

      There is one case where such a normalization could be problematic, and that is when comparing actin levels in cells expressing bitesize RNAi, because Bitesize is itself involved in organizing the actin cytoskeleton in the tube membrane (JayaNandanan et al., 2014). However, in this experiment, the analysis still shows that actin levels at late endosomes do not correlate with the tube misguidance phenotype.

      With regard to potential physiological effects of an over-expressed construct, some of the commonly used actin reporters have subtle effects on actin physiology, whereas Utr-ABD has negligible or no effects on the actin cytoskeleton, and it also reproduces actin dynamics faithfully (Spracklen et al., 2014). It is therefore generally considered the most reliable tool for live imaging of actin in Drosophila.

      We have adapted the text and commented on these issues, and hope we have achieved more clarity.

      (c) We agree that when Vha or Wash are downregulated, actin levels are overall reduced in the growth cone of the cells, while this is not the case in other regions, for example at the base of the cell. Although we had not explicitly stated this (but now will), this is a further indication that the different actin populations in the growing tip interact with each other.

      For the downregulation of Wash, this could potentially have been due to a direct effect of Wash on the apical and basal actin, but then we would have expected a similar result in other parts of the cell, including the cell body and the proximal part of the branch, but we do not see that. Even more importantly, the expression of Vha100-DN has the same effect and this cannot be easily explained by a direct action on actin. Together, these findings therefore indicate that depletion of actin around endosomes has a knock-on effect on the basal and apical actin cortex in the vicinity. We have included this reasoning in the paper now.

      Another critical point in the model put forward by the authors is that late endosomes drive tube guidance. To test this point the authors use an elegant system to mislocalize Rab7 late endosomes.

      However, the effects are not strong (1G), and only a proportion of branches show misguided tubes. Do the cases with a ventrally-guided tube in the experiment Rab7:YFP+/+ (Fig. 1G) have a CD4 endosome (with Rab7YFP) at the tip? This would help to explain the weak effect.

      This is an excellent point, and it is indeed what we observe: all cells with ventrally guided tubes have a late endosome that is positive for the YRab7-nanobody-membrane complex at the tip of the cell (n = 42), whereas only 2/3 of misguided tubes do (n = 12), and those always have the additional endosome at the tip of the misguided tube. As the reviewer suggests, this provides an obvious explanation for why these cells do not have a tube misguidance phenotype. We have added a representative image of this condition (Rab7::YFP+/+, ventrally-guided tube) in Figure 2 to illustrate the phenotype.

      What is the cause that preventing proper endosome maturation and acidification leads to misguided tubes (rather than missing ones)?

      A complete loss of late endosome activity would indeed result in the absence of the subcellular tube. However, we and others have shown that partial loss of function (as caused by RNAi) can have more subtle effects. For instance, fully blocking endocytosis using the shibire^ts line completely prevents proper tube extension (Mathew et al., 2020), but expression of a shibire RNAi still allows tube formation to proceed, albeit in a defective manner (Schottenfeld-Roames et al., 2014). Similarly, the misguidance phenotypes resulting from Vha downregulation likely reflect a weaker loss of late endosome function. These perturbations would allow initial tube growth to proceed, but later on they would uncover this later function of the endocytic pathway in regulating tube guidance.

      We believe that what we see as this weaker defect is an uncoupling of direction from growth per se. The cells still receive their growth-inducing signals from the FGF-receptor, and this leads to directed cell growth in the direction of the chemotactic signal. The normal trafficking of membrane material from the apical to the basal domain is also not disrupted. Thus, membrane keeps being added to both domains and both the tube and the basal domain continue to grow. However, the growing tube has been disconnected from its guiding structure at the tip of the cell (our speculation: because failed endosome maturation no longer allows proper actin coordination) and therefore follows a random path. We had not been sufficiently clear about this but have now hopefully remedied this in the text.

      The authors indicate that downregulating Vha activity leads to defects in acidification, but late endosome-MVB normally form. It is intriguing to see extra CD4 vesicles (like in 1C or 6C).

      Wouldn't we expect to see "normal" tip accumulation of CD4 vesicles only, and not extra ones? How relevant are these extra CD4 vesicles?

      Wouldn't we expect to see "non functional" CD4 vesicles, unable to recruit actin and lead intracellular tube formation (i.e. no tube) rather than misguidance? (1D shows a higher proportion of misguided tubes than no tubes)

      Similarly, is Wash-RNAi producing extra CD4 vesicles (as observed in movie 5, fig 6E)?

      We do not postulate that the late endosomes are morphologically normal – there are vesicles carrying the CD4 marker (which is only a membrane marker, not specific for endosomes), but the literature indicates that the endosomes do not undergo their normal maturation, and we would have no reason to claim otherwise. So we agree that the ones we see in the Vha-downregulated cells are not fully functional, and this is indeed confirmed by their inability to recruit actin.

      With regard to the number of large CD4 vesicles at the tip, terminal cells can normally have from 1 to 3 in the growth cone, and the fact that the experimental cells we showed were at the upper range whereas the control at the lower end was pure chance. We have now quantified the number of vesicles in the abnormal conditions and see that there is no increase (Figure S5F).

      Actin recruitment to late endosomes was already documented, where it plays a role in cargo trafficking.

      The authors propose that Wash is recruited to late endosomes upon acidification where it would prime actin nucleation around the endosome. The authors indicate a decrease in Wash accumulation upon expression of Vha dominant negative. However, this decrease is not quantified. In addition, it is difficult to determine levels of a construct when this is overexpressed (UAS-Wash::GFP). It would be desirable to use antibodies against the endogenous protein (Wash in this case) to claim differences in accumulation in mutant conditions.

      We have quantified the amount of Wash::GFP in CD4 vesicles. As mentioned, the vesicles are very dynamic, and so is their recruitment of Wash::GFP, and doing the analysis in the live cells is therefore more meaningful than extracting information from fixed samples, but we will also try to obtain the antibody for confirmation in fixed material. We appreciate that, as discussed above for actin, results using overexpressed constructs have to be interpreted with care, but here again we guard against misinterpretation by assessing relative changes rather than absolute amounts and by normalizing the signal to the one seen in the cytoplasm.

      The results presented do not rule out a requirement of Wash in terminal branching which is not associated with the enrichment in the late endosomes. The genetic interaction observed with Shrub is also compatible with both proteins acting on terminal branching but in different/parallel mechanisms.

      While the fact that downregulation of Vha has the same effect cannot be explained in this manner, we agree with the reviewer and will rephrase this section in the paper.

      Laser ablation experiments

      The laser ablation experiments are difficult to interpret.

      First, it is unclear to me what the results exactly indicate. What does the recoil observed suggest? Does it fit with the expected tension exerted by a link of the actin cytoskeleton relayed by late endosomes?

      The observed recoil suggests that there was tension across the ablated area. The laser ablation experiments were one way to evaluate whether the actin cytoskeleton within the tip of the cell was continuous between the subcellular tube and the leading edge of the cell. Tension along this axis would support such a model. We assumed that if the actin cytoskeleton at the tip is continuous with both membrane compartments it was likely to be under tension, and our laser ablation experiments showed that is indeed the case. We have rewritten this section to make it clearer.

      From the text and figure I don't understand how is the recoil calculated: retraction of the subcellular tube backwards? "enlargement" of the bleached area?

      Briefly, we had used three measuring points: the backward displacement of (i) the subcellular tube and (ii) of the plasma membrane adjacent to the ablated area, which both retract towards the cell body, and we also measured (iii) the forward displacement of plasma membrane on the other side of the ablated area. We then calculated the average of these for each experiment.
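
      The averaging of the three measuring points can be written out as a minimal sketch. The displacement values are hypothetical, and we assume here that the magnitudes of the displacements are averaged irrespective of direction; this is an illustration of the arithmetic only, not the analysis code used for the figures.

```python
def mean_recoil(tube_backward, membrane_backward, membrane_forward):
    """Average the magnitudes of the three displacements measured after ablation."""
    displacements = [
        abs(tube_backward),      # (i) subcellular tube, towards the cell body
        abs(membrane_backward),  # (ii) adjacent plasma membrane, towards the cell body
        abs(membrane_forward),   # (iii) plasma membrane on the far side, moving forward
    ]
    return sum(displacements) / len(displacements)


# Hypothetical displacements in micrometres for a single ablation experiment.
recoil = mean_recoil(tube_backward=1.2, membrane_backward=0.9, membrane_forward=0.6)
```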

      However, we have now redone the evaluations of these experiments using PIV, an established method that is commonly used to calculate initial recoil after ablation and have explained this in the text.

      Second, it is unclear to me what laser ablation actually ablates. Does it only affect actin? Or are also CD4-late endosomes and other tip structures affected?

      The laser ablations with the conditions we use have in the past been shown to temporarily disrupt the actin cytoskeleton without otherwise damaging the cell (Rauzzi et al., 2015).

      The ablations were done in cells that express the actin reporter Utr::GFP together with the membrane marker CD4::mIFP but we have no reason to believe that CD4 containing structures were damaged. For example, upon ablation, the CD4 vesicles in the ablated area are bleached, but in the recovery phase, we observe actin puncta in the positions where CD4 vesicles were originally located, suggesting that the vesicles themselves persist. Our interpretation of these observations is that the bleached CD4 vesicles do not recover their fluorescence (CD4::mIFP is an integral membrane protein and cannot simply be re-inserted within short periods), but they are still capable of recruiting actin. We have added a representative image of this to better describe the experiment (Fig. S4).

      Third, is the recovery observed after ablation correlated with new actin recruitment around old or new late endosomes?

      Actin rapidly reappears in the bleached area and the region that recoiled, where it is first seen in the basal cortex and filopodia. The tube re-extends towards the ablated area, and actin reassembles around the tube within seconds. During further recovery, actin reappears in puncta ahead of the tube and we assume that this is partly de novo assembly around the existing vesicles (Fig. S4A, B). At the same time, we also see new CD4 vesicles reaching the tip, so it is likely that both populations (old and new vesicles) mediate the recovery phase. We have added images of additional examples that illustrate these points.

      Fourth, I find the experiments in cells with secondary subcellular tubes very confusing and the explanations very speculative.

      The data on cuts in cells with tube duplications are indeed difficult to interpret, and because the emergence of secondary branches is unpredictable, it is not easy to obtain large numbers of observations. Figure S4 is another example of the response of these cells to the laser cut, and we will make clear that our interpretations are merely speculative.

      Finally, and most importantly, I think that performing laser ablation experiments in mutant conditions that affect actin recruitment (VhaDN and Wash RNAi,....) would be very informative. One would expect to find a decrease in recoil. If this was the case, it would validate, on the one hand, that in control conditions there is a tension that depends (at least in part) on actin organization, and on the other hand it would show that when actin recruitment is affected tension decreases, supporting the "anchoring" model. I understand that laser ablation experiments are not easy to perform, but I think this would be a useful experiment.

      To my understanding, as it stands, the laser ablation experiments "....support the notion that adequate cytoskeletal organization at the tip is required for tube guidance and stability" as the authors acknowledge, but they do not convincingly support their "anchoring" model

      Laser cuts on cells that express Vha100-DN or wash-RNAi would be a nice addition that would take the work to the next level. But sadly, these are among the experiments that right now are impossible to carry out because of all the logistical and other problems resulting from the Covid pandemic, as explained in the cover letter.

      **Other comments:**

      • From the images presented, it is often difficult to figure out where the subcellular tube forms, the presence of vesicles, the cell morphologies,... and to determine the correlation between the CD4 vesicles and tube guidance.

      This is the result of a frustrating technical limitation. In experiments in the past we have used markers for the outline of the cell, as we do here, too. Thus, where CD4 is expressed under the btl-gal4 driver it marks the entire outline of the cell against a completely negative background. Even for other markers, if expressed under btl-gal4, the outline of the cell is visible against the dark background. However, for endogenously marked proteins that are expressed ubiquitously, this is no longer true, and as we add more markers to follow different structures, we run out of fluorescent colours for everything we would like to highlight (and genetically, out of chromosomes to accommodate the necessary transgenic or endogenously modified constructs). We will provide tracings of the outlines of the cells to make the images clearer.

      For instance, in Fig 1H and 1J, is there a "lateral" CD4 vesicle? Why does it not generate a misguided tube?

      Yes, there are also CD4 vesicles closer to the proximal part of the cell. They are enriched at but not restricted to the tip of the cell. As we have shown previously (Mathew et al., 2020), they emerge along the subcellular tube, and most are transported towards the tip (also seen in Fig. 1A, for example). Why the remaining ones do not affect the guidance of the tube is unclear, but it is almost certain that the growth of the tip of the cells towards the chemotactic FGF signal plays a role: the basal membrane is constantly moving away from the tip of the tube at this location, but not at the sides further down the branch.

      Fig 1I, are there 2 subcellular tubes? Can the authors mark them? I cannot really visualize them with the CD4 marker, they seem stalled or short or missing.

      In Fig 1I, the tube is curled up inside of the cell, a phenotype often seen in larval terminal cells with excessive FGF signaling (for instance see Ukken et al., 2014). We added diagrams that explain the morphology of the tubes in this figure.

      Fig 1L: what do the authors mean by "corrected" tube sprouts?

      This is not well phrased, and we will also improve the figure to make the point clearer.

      Panels 1K-M (now 1H-J) show snapshots from a movie of a cell that originally had only a misguided tube (at the top left) and is here in the process of forming its ‘correct’ tube growing in the ventral direction. In 1L (now 1I) this second tube is showing the first signs of emerging; in 1M (now 1J) it is clearly visible. We have changed the wording in the figure, added an explanation in the legend, and added a second example of this process in Figure S1.

      It is difficult to identify the cell in Fig 2D-F

      We added a dotted line in one of the channels showing the general morphology of the cell.

      • Movie S3: I find it difficult to spot the association of CD4 and utrGFP that the authors point out. Can the authors label the vesicles and the association in the movie?

      We added pauses in the movie and arrows to the frames where actin is seen surrounding late endosomes.

      • The results with the Rab7 downregulation and upregulation are not very clear.

      Does the downregulation of Rab 7 (Rab7 DN construct) have any effect on tube guidance?

      Does it decrease or eliminate actin association with CD4 vesicles in the embryo? The authors show that in the larvae expression of Rab7 DN leads to loss of actin enrichment in Rab7 vesicles. Does this have an effect on terminal branching?

      Rab7DN is not visible in the embryo, so we did not pursue further experiments at those stages; we previously showed that loss of Rab7 does not affect branching in larvae (Best and Leptin 2019). However, as the reviewer rightly pointed out, expression of Rab7DN prevents actin nucleation at late endosomes in larval stages, so knowing the phenotypic consequence of this experiment would be informative, and we are grateful for the observation. We had done the experiment, and we found no difference in the number of branches compared to controls. This suggests either that at larval stages actin recruitment at late endosomes is no longer required for branching or that there are redundant mechanisms that can balance the lack of actin nucleation. We favour the second model, because it has been shown that microtubules also play a role in tube branching and in coordinating the actin cytoskeleton (Araujo lab, 2021), so it is possible that actin nucleation can be bypassed. This is also consistent with the fact that the phenotypes we describe are not all fully penetrant, again pointing to redundant mechanisms ensuring consistent directed growth.

      We added the data regarding Rab7DN to the manuscript (Figure S2).

      The Rab7 active construct produces effects at larval stages but not in the embryo. Is terminal cell branching in the larvae also dependent on late endosomes? Can the authors show an "excess" of late endosomes in the larvae that leads to extra terminal branches? Even though the authors indicate that they cannot detect Rab7Q67L, can they find any effect at embryonic stages (e.g. presence and position of CD4 vesicles, other unrelated effects,...)?

      Expression of Rab7CA in the embryo generates similar defects as nanobody-mediated mislocalisation of Rab7. We include below an example for the reviewer, but we did not feel comfortable including these data in the paper because some technical complications made them impossible to document and interpret with the certainty that we would wish. Most importantly, the YFP fusion protein is not detectable at embryonic stages, even with the most sensitive microscopes and detectors available to us. This means that we cannot correlate the observed phenotypes with the presence or absence of Rab7CA, which in our view makes them too weak for publication. At face value, these results suggest that Rab7CA begins to trigger branching during embryonic development, which eventually leads to the excess number of branches we see in the larva, but alas, we think this is too speculative to include in the paper.

      • In some examples in the movies there seems to be a correlation between CD4 vesicle presence/positioning and basal lamellipodia/filopodia or actin enrichment, and also in -btl experiments. Have the authors explored this? They may want to comment on this in the discussion section.

      That is a very pertinent point, and we should indeed have commented on it. If we assume the reviewer is looking at examples such as the one in Fig. 1I (currently S1C), then the explanation is the following. The terminal cells in the embryo often form transient side-branches, presumably in response to low-level FGF signals from the environment. In those cases, the basal actin cytoskeleton rearranges in the branching area to form the filopodia that lead the outgrowth of the branch, and what the reviewer observed is that this transient branch also forms the late endosome structure that we see in the main or proper growth cone. Thus, the guiding FGF signal leads to a reorganisation of the entire actin cytoskeleton in the growth cone, and the formation of the actin-covered endosome is part of that process. We have included this in the discussion.

      Reviewer #2 (Significance (Required)):

      This work is relevant for the morphogenesis field and deals with the important issue of how the cytoskeleton regulates shape and cellular events. The work represents a deep analysis of a specific issue in the specialized field of tracheal development, but the results may be relevant for other types of cells forming subcellular tubes. Describing a function of trafficking vesicles (late endosome in this case) in cell morphogenesis (in addition to cargo trafficking) in an in vivo system is also relevant to advance in the cell biology field.

      **Referees cross-commenting**

      I agree with the comments of reviewer #1. I find relevant the points raised in "major comments number 2 and 4".

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The authors investigate the role of late endosomes in the context of actin organization during cell morphogenesis. As an experimental model they use the polarized terminal cells of the Drosophila tracheal system, which form a subcellular projection containing a tube. The authors show that disruption of the subcellular localization or maturation of late endosomes leads to an increased proportion of terminal cells with misguided tubes. Their analysis indicates that endosomal F-actin recruitment is crucial for the directionality of tube growth. The authors propose a model in which late endosomes control a coordinated crosstalk between endosomal and cortical actin pools to drive subcellular tube guidance.

      **Major comments:**

      1. The conclusion about how WASH functions in tube guidance is not clearly shown and should be better explained and documented. It is known that loss of function of WASH leads to dysregulation of endosomal tubulation, inducing enlarged endosomes, which in turn affects the endosome-to-plasma-membrane recycling of various cargos (including luminal cargos like Serp) (Gomez, et al., Mol. Biol. Cell, 2012; Dong et al, Nat. Comm. 2013). The authors should clarify whether there is a defect in the integrity of endosomes located close to the cell tip in the btl>WASHIR knockdown. In the cartoon panel C' (Figure 7), the endosomes in the cell tip are shown intact (including their relative position to the tip), but no experimental data support this conclusion.

      Given the fact that Wash contributes to proper late endosome morphology we do not necessarily expect the endosomes to look normal. We had not shown this in the diagram because our own data do not directly address this point, but the literature is of course clear enough about this, so we have modified our diagrams so that they better reflect the expected phenotypes and included a reference to the relevant literature.

      We and others have shown the important role of late endosomes in plasma membrane and luminal cargo delivery, and as elaborated in the response to referee 2’s point 3, complete loss of endosomal function blocks these processes. Here, at reduced but not abolished function plasma membrane delivery is clearly still functional.

      -The functional analysis of WASH was based on RNAi knock down. The authors express a single RNAi construct against WASH. The expression of this RNAi line gave a low penetrance phenotype. A well-known caveat of RNAi is off-targeting. Hence, phenotypic analysis needs to include a verification by a second independent RNAi construct or a rescue of the RNAi phenotype with an overexpressed cDNA of WASH. Ideally, the null wash mutant (Nagel et al. 2017) can be used to confirm the phenotype.

      Analysing wash mutants would provide a welcome additional confirmation of the knockdown results, and it is true in general that poorly characterised RNAi lines can have off target effects. However, this is a well validated line: Nagel et al. (2017) showed that the same RNAi line that we used fully recapitulates the phenotype seen in wash mutants: In both cases, actin fails to localize to late endosomes, and this is what we also found in terminal cells.

      Whereas we believe therefore that the experiment is not essential to support our conclusions, we agree it is desirable and have ordered these flies. However, progress is being hampered by import restrictions at the first author’s lab: the necessary paperwork for flies to be imported for his work is still under revision by officials. The experiment thus cannot be done at the moment.

      The authors claim a role of the late endosomes in subcellular tube growth and guidance, but show no data on lumen formation to prove tube presence in the tracheal terminal cells of V100R755A, btl>WASHIR, shrb mutants or in GrabFP-Bint treated terminal cells. The interpretation and quantification of the phenotypic classes "miss-tube-guided" and "ventrally-tube-guided" are based on membrane markers and not on luminal markers. The presented data with the provided resolution do not prove whether the mCD4-mIFP or PH-GFP markers define apical membrane protrusions/extensions or tubular structures. Therefore, the classifications of the tube-guidance phenotype and the quantification of "distance from tube to tip" may be suggestive. The authors need to provide additional confocal data of co-stainings of the endosomal compartments with luminal antigens (i.e. GASP or Serp or Verm).

      We are very unsure as to what this would add and in what context it would be necessary. Membrane and actin markers have been widely used to follow the formation of the subcellular tube by all groups working in this field. There is ample documentation in the literature that the subcellular tube, as defined by luminal content (Serp, Verm, Gasp, CBP-GFP, ANF-GFP), is surrounded by apical plasma membrane which carries apical transmembrane proteins (Crb, Uif) and their associated apical cytoplasmic complexes (Par3, Par6, aPKC, pMoesin), and apical phospholipids which can be visualized by specific PIP-binding markers, e.g. the PLC-d PH-domain that binds to PIP2 (see, e.g., Kato et al., 2004; Oshima et al., 2006; Okenve-Ramos & Llimargas, 2014 (here they use both luminal and actin reporters); Ochoa-Espinosa et al., 2017; and from our lab: JayaNandanan et al., 2014; Mathew et al., 2020). Therefore, all labs in this field have used these markers interchangeably to visualize the subcellular tube and we are not aware of a single case where luminal shape and apical membrane shape were not exactly congruent.

      We have in fact used luminal markers in some experiments here, but we believe there is no reason to assume that luminal markers would have a different distribution compared to membrane, apical or actin reporters in any of the experiments described here. Finally, the focus of the paper is on the behaviour of the early out-growing membrane rather than the mature tube, and on how membrane is remodelled in this process by modifications in the actin cytoskeleton. Including confirmation of the presence of luminal material would not add to the paper.

      page 8 line 248, the authors interpret that reducing the dose of Shrb by half strongly enhances the wash-RNAi phenotype and suggest that WASH and Shrb act in the same pathway. Shrub is a subunit of the ESCRT-III complex involved in inward membrane budding of endosomes and WASH functions in outward endosomal membrane budding.

      The Shrb and WASH form discrete molecular complexes in endosomes. The authors should consider that Shrb and WASH may well act in parallel to control directional tube growth.

      This is a good point and we will rephrase our conclusions from this experiment.

      The authors use a nanobody-based GFP trap construct to investigate the effect of Rab7YFP localization. This is an excellent way to provide novel information on protein mislocalization in vivo. Using this method the authors concluded that ... "the correct distribution of late endosomes is required for proper tube guidance" (page 5, lines 157-158). The authors evidently consider that the GrabFP-B-Int construct affected the distribution of late endosomes. However, this is unclear and additional control experiments are needed to support the authors' claims. For instance, did expression of GrabFP-B-Int target the Rab7-YFP protein or the Rab7-associated endosomes? With the presented data, it is not clear whether the Rab7-YFP positive vesicles are endosomes or aggregates formed by the trapped Rab7-YFP protein. Co-stainings using GFP in Rab7-YFP terminal cells with other endosomal markers, i.e. Avl or hrs, should be provided. It is also not clear whether endocytosis of apical/basal membrane or luminal cargos was affected in GrabFP-B-Int treated terminal cells. The loss of endocytic components has been associated with defects in subcellular tube shape and morphology (Schottenfeld-Roames et al, Cur Biol. 2014). The authors should clarify these issues.

      The nanobody would of course bind both to free Rab7::YFP (if there is any available) and to endosome-associated Rab7::YFP. However, in addition to Rab7::YFP we also assayed the distribution of CD4::mIFP, a membrane-associated protein that is seen at very low levels in all membranes (Mathew et al., 2020), but highly enriched in cytoplasmic vesicles, which we showed by co-expressed markers to correspond to endosomes (Mathew et al., 2020). If the nanobody sequestered free Rab7::YFP, we would expect little overlap between Rab7::YFP and CD4::mIFP puncta. Instead, we see that the large Rab7::YFP/nanobody puncta have membrane associated with them (63% of vesicles are triple positive, vs 8% of Rab7::YFP-GrabFP vesicles) indicating that they are not merely Rab7 aggregates. We will include a quantification of the degree of overlap between these components.

      Regarding the question of whether endocytosis is affected, we believe this is unlikely, or if it is at all, only to a minimal extent, since growth of the outer membrane, which crucially depends on endocytosis, continues in these cells. We have added a comment to this effect in the text. The cells look very different from cells in which endocytosis has been inhibited.

      In the legends of Figure 7 (C'), the authors stated that.... "lack of actin regulators at the basal cortex prevents the connection of the actin meshwork at the tip to the basal plasma membrane".... by depicting the singed mutant phenotype. singed mutant analysis is not shown in the manuscript.

      Singed/Fascin has previously been shown to be required for actin organization in filopodia (Okenve-Ramos & Llimargas, 2014). We have now included new data that show that cells expressing singed RNAi also have reduced amounts of actin at late endosomes, and that reduced actin correlates strongly with tube misguidance. This shows that an actin bundling protein that has previously been shown to be needed for actin bundles in filopodia also affects actin around endosomes, providing another illustration that these compartments interact with each other.

      Our quantifications of actin around late endosomes show that interfering with endosome maturation, with actin nucleation via Wash, or with basal/filopodial actin all lead to loss of actin around endosomes, and the misguidance phenotype correlates with actin loss (Figure 6J). By contrast, disruption of the apical actin cortex does not affect endosomal actin but does lead to misguidance. This establishes a hierarchy of actin organisation in the tip of the cell: basal actin affects endosomal actin, loss of endosomal actin affects both apical and basal actin, but apical actin does not feed back on endosomal actin. All three pools are nevertheless required for tube guidance.

      The authors consider that the late endosomes nucleate actin ahead of the tube (i.e. page 3, lines 87-88; page 9, line 285). This is not very convincing from the presented data. The authors should provide some quantitative data showing that lack of WASH (and of the endosomal F-actin network) affects the apical and basal F-actin pools in the tip of the cell.

      If we understand the reviewer correctly, there are two comments included in this point: (i) whether actin is nucleated at late endosomes, and (ii), whether reducing endosomal F-actin affects apical-basal actin pools in the tip of the cell.

      (i) As stated above in the response to reviewer #2, we have added quantitative data illustrating actin recruitment at late endosomes with phalloidin stainings. Actin association with endosomes is also confirmed by the Rab7 stainings in larval terminal cells in Fig. 3G-H that show actin puncta associated with endosomes.

      (ii) Again, as mentioned in the response to reviewer #2, we do think that all actin pools in the growth cone are affected. We are glad that the reviewers encouraged us to make this more explicit and will now discuss more clearly how endosomal F-actin could affect apical and basal F-actin pools.

      **Minor concerns:**

      1. The authors concluded (page 9, line 285) that "endosomes serve as actin nucleating centres that propagate forces within the cell by physically linking different subcellular compartments".

      We agree with the reviewer, this is a good way of phrasing it, and we will rewrite this conclusion accordingly.

      The authors should depict in the panels the Ventral/Dorsal axis.

      All images are positioned in the same orientation, but we will ensure that the D/V axis orientation is stated in the manuscript.

      Numerous omissions need to be corrected. Labeling is missing in the panels J-M' (Figure 1). The statistical significance and the p-value levels are not indicated in Figure 2 (G). The panel Figure 5 (D) is mislabelled. The panels C-C' in Figure 7 are not very informative. They do not reflect the general model of the study. How the prevention of actin nucleation at late endosomes, or at the apical or basal cortex, affects tube directionality is not graphically shown.

      We thank the reviewer for noticing these omissions; we will fix them for resubmission. Having added more discussion about the general organization of actin at the tip of the cell, we think the inclusion of panels 7C is justified.

      In the section "Crosstalk between cytoskeletal compartments" (lines 359-400, discussion) the argument about the involvement of microtubules in tube guidance is a likely scenario, but I found this argument over-extended. WASH interacts with tubulin (Derivery et al., Dev. Cell 2009), and WASH activity balances the endosomal and cortical F-actin networks during epithelial tube maturation in multicellular tracheal tubes (Tsarouhas et al., Nat. Comm. 2019). These results should be considered in the discussion section.

      We will incorporate these references into the discussion; they will certainly enrich it.

      Reviewer #1 (Significance (Required)):

      The important role of actin cytoskeleton in the initiation of endocytosis is well established. Actin structures in the plasma membrane are dynamically organized to assist the remodeling of the cell surface and to facilitate the inward movement of vesicles. Similarly actin networks in endosomes are critical for endosomal fusion and fission. In this work, the authors identified an opposing but interesting scenario. They propose a role for the late endocytic pathway in organizing actin networks for proper cell morphogenesis and point out an intracellular crosstalk and coordination between distinct cytoskeletal pools within a cell.

      Although the mechanism by which the separate F-actin pools communicate is not shown, the paper is interesting and makes an original contribution in the area of cell morphogenesis. In addition, it represents a conceptual advance as it proposes a mechanism through which the actin cytoskeleton is coordinated to regulate tube morphogenesis. The proposed mechanism may be relevant for tracheal terminal cells, but could represent a general mechanism in the field of cell biology. The methodology is appropriate and the text flow is well organized. However, as explained, there are a few inconsistencies in the manuscript. I believe the above additions would strengthen the conclusion of the paper.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      Rios-Barrera and Leptin investigate the formation and guidance of the subcellular tube that forms in the terminal cell of the dorsal branches of the Drosophila tracheal system. In previous work the authors documented the presence of late endosomes at the tip of the growing terminal cell, ahead of the forming subcellular tube, which are involved in membrane recycling (Mathew et al., 2020). In the present work they analyze the organization of the actin cytoskeleton in the tip of the terminal cell, in relation to the late endosomes, and assess the guidance of the subcellular tube. They find that the presence and localization of late endosomes play a role in tube guidance. They also find that late endosomes recruit actin around them, mediated by the activity of the actin polymerization regulator Wash, which is recruited to maturing late endosomes. When Wash activity is decreased, actin around late endosomes is decreased and tube guidance is compromised. Based on this observation, laser ablation experiments of actin ahead of the tube and actin staining at the tip of the terminal cell, the authors propose an exciting model: late endosomes recruit actin, which connects the actin pool of the basal membrane and the actin pool of the apical (subcellular tube) membrane, thereby directing tube growth and guidance. The manuscript is well-written and well-presented, the images and movies are of high quality and the experimental data, which is technically challenging, is very good and sufficiently replicated.

      Major comments:

      1. A critical point in the model that the authors put forward (which is also contained in the title and abstract) is that actin organized at late endosomes anchors apical and basal actin cortices. However, there is no clear and conclusive evidence for this. Clear evidence in this direction should be provided to propose it as a mechanism (as it is in the text, particularly in the first sentences of the discussion) and imply it in the title.

      The authors show endogenous actin around late endosomes and actin fibers at the tip of the terminal branch. However, at the level of resolution presented (Fig 3A,B), it is not possible to determine whether the different actin populations are actually "anchored". I suggest presenting stronger data supporting this important conclusion.

      In the same direction, it would be critical to show that this anchoring of actin fibers is disturbed when actin enrichment at the late endosome is perturbed (see also point 5). Actually, the authors show that when Vha or Wash activity is downregulated, actin accumulation around the CD4 vesicles decreases. However, this experiment has a few drawbacks. First, it is difficult to determine levels of a construct that is overexpressed (UAS-utr::GFP). Could the authors use phalloidin or an actin antibody to confirm the result? Second, I find the result difficult to interpret. In the images provided I see a general decrease of actin (UtrGFP) at the tip, not only around the CD4 vesicles (Fig 6D,F). Are these mutant conditions also affecting the other actin pools? If this is the case, can the authors attribute the defects exclusively to the abnormal recruitment of actin to the late endosomes? Most importantly, the authors should analyze the pattern of actin distribution (labelling endogenous actin) and determine a possible loss of "anchoring" of fibers when late endosome maturation is perturbed.

      2. Another critical point in the model put forward by the authors is that late endosomes drive tube guidance. To test this point the authors use an elegant system to mislocalize Rab7 late endosomes. However, the effects are not strong (1G), and only a proportion of branches show misguided tubes. Do the cases with a ventrally-guided tube in the experiment Rab7:YFP+/+ (Fig. 1G) have a CD4 endosome (with Rab7YFP) at the tip? This would help to explain the weak effect.
      3. Why does preventing proper endosome maturation and acidification lead to misguided tubes (rather than missing ones)? The authors indicate that downregulating Vha activity leads to defects in acidification, but that late endosome-MVBs form normally. It is intriguing to see extra CD4 vesicles (like in 1C or 6C). Wouldn't we expect to see "normal" tip accumulation of CD4 vesicles only, and not extra ones? How relevant are these extra CD4 vesicles? Wouldn't we expect to see "non functional" CD4 vesicles, unable to recruit actin and lead intracellular tube formation (i.e. no tube), rather than misguidance? (1D shows a higher proportion of misguided tubes than no tubes.) Similarly, is Wash-RNAi producing extra CD4 vesicles (as observed in movie 5, fig 6E)?
      4. Actin recruitment to late endosomes was already documented, where it plays a role in cargo trafficking. The authors propose that Wash is recruited to late endosomes upon acidification, where it would prime actin nucleation around the endosome. The authors indicate a decrease in Wash accumulation upon expression of Vha dominant negative. However, this decrease is not quantified. In addition, it is difficult to determine levels of a construct when this is overexpressed (UAS-Wash::GFP). It would be desirable to use antibodies against the endogenous protein (Wash in this case) to claim differences in accumulation in mutant conditions.

      The results presented do not rule out a requirement of Wash in terminal branching which is not associated with the enrichment in the late endosomes. The genetic interaction observed with Shrub is also compatible with both proteins acting on terminal branching but in different/parallel mechanisms.

      5. Laser ablation experiments: The laser ablation experiments are difficult to interpret. First, it is unclear to me what the results exactly indicate. What does the recoil observed suggest? Does it fit with the expected tension exerted by a link of the actin cytoskeleton relayed by late endosomes? From the text and figure I don't understand how the recoil is calculated: retraction of the subcellular tube backwards? "enlargement" of the bleached area? Second, it is unclear to me what laser ablation actually ablates. Does it only affect actin? Or are also CD4-late endosomes and other tip structures affected? Third, is the recovery observed after ablation correlated with new actin recruitment around old or new late endosomes? Fourth, I find the experiments in cells with secondary subcellular tubes very confusing and the explanations very speculative. Finally, and most importantly, I think that performing laser ablation experiments in mutant conditions that affect actin recruitment (VhaDN and Wash RNAi,....) would be very informative. One would expect to find a decrease in recoil. If this was the case, it would validate, on the one hand, that in control conditions there is a tension that depends (at least in part) on actin organization, and on the other hand it would show that when actin recruitment is affected tension decreases, supporting the "anchoring" model. I understand that laser ablation experiments are not easy to perform, but I think this would be a useful experiment. To my understanding, as it stands, the laser ablation experiments "....support the notion that adequate cytoskeletal organization at the tip is required for tube guidance and stability" as the authors acknowledge, but they do not convincingly support their "anchoring" model.

      Other comments:

      • From the images presented, it is often difficult to figure out where the subcellular tube forms, the presence of vesicles, the cell morphologies, ... and to determine the correlation between the CD4 vesicles and tube guidance. For instance, in Fig 1H and 1J, is there a "lateral" CD4 vesicle? Why does it not generate a misguided tube? Fig 1I, are there 2 subcellular tubes? Can the authors mark them? I cannot really visualize them with the CD4 marker; they seem stalled or short or missing. Fig 1L: what do the authors mean by "corrected" tube sprouts? It is difficult to identify the cell in Fig 2D-F.
      • Movie S3: I find it difficult to spot the association of CD4 and utrGFP that the authors point. Can the authors label in the movie the vesicles and the association?
      • The results with the Rab7 downregulation and upregulation are not very clear. Does the downregulation of Rab7 (Rab7 DN construct) have any effect on tube guidance? Does it decrease or eliminate actin association with CD4 vesicles in the embryo? The authors show that in the larvae expression of Rab7 DN leads to loss of actin enrichment in Rab7 vesicles. Does this have an effect on terminal branching? The Rab7 active construct produces effects at larval stages but not in the embryo. Is terminal cell branching in the larvae also dependent on late endosomes? Can the authors show an "excess" of late endosomes in the larvae that leads to extra terminal branches? Even though the authors indicate that they cannot detect Rab7Q67L, can they find any effect at embryonic stages (e.g. presence and position of CD4 vesicles, other unrelated effects, ...)?
      • In some examples in the movies there seems to be a correlation between CD4 vesicle presence/positioning and basal lamellipodia/filopodia or actin enrichment, and also in -btl experiments. Have the authors explored this? They may want to comment on this in the discussion section.

      Significance

      This work is relevant for the morphogenesis field and deals with the important issue of how the cytoskeleton regulates shape and cellular events. The work represents a deep analysis of a specific issue in the specialized field of tracheal development, but the results may be relevant for other types of cells forming subcellular tubes. Describing a function of trafficking vesicles (late endosome in this case) in cell morphogenesis (in addition to cargo trafficking) in an in vivo system is also relevant to advance in the cell biology field.

      Referees cross-commenting

      I agree with the comments of reviewer #3. I find relevant the points raised in "major comments number 2 and 4".

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Very high evidence and clarity. Excellent scientific rigor. The findings are important and reported clearly. The experiments are conducted in a rigorous way by numerous participating laboratories.

      Reviewer #1 (Significance (Required)): Very high significance, from both a molecular biology and a clinical standpoint. This is an important manuscript that challenges the findings and conclusions of a prior high-profile paper in Science by Mao et al 2016, claiming that LAG3 is a receptor for aggregation-prone species of alpha-synuclein and that deletion of LAG3 results in reduced cell to cell propagation of alpha-synuclein aggregates. The experiments in this paper are numerous and employ a variety of techniques. The overall conclusions are that LAG3 is not expressed by the relevant neurons and that LAG3 is not a receptor for alpha-synuclein fibrils (of different sizes). Therefore, the authors conclude that LAG3 is unlikely to play a role in the spread of alpha-synuclein pathology in Parkinson's disease and related disorders. There are, however, some weaknesses. For example, the Introduction contains passages that are not written in a stringent way:

      1. "Histologically, PD is characterized by α-synuclein aggregates known as Lewy Bodies in neurons of the substantia nigra," That is not a good description of PD neuropathology. Lewy pathology is present in numerous areas of the CNS and PNS, and is not restricted to the substantia nigra.

      We have added a more detailed account:

      “Histologically, PD is characterized by α-synuclein inclusions known as Lewy Bodies whose accumulation is associated with neurodegeneration (Dickson, 2012; Mullin and Schapira, 2015; Corbillé et al., 2016). These inclusions affect the Substantia nigra and other mesencephalic regions as well as, in some cases, the amygdala and neocortex (Dickson, 2018).”

      2. "Growing evidence suggests that α-synuclein fibrils spread from cell to cell". While alpha-synuclein pathology can spread from cell to cell, it is not known if the fibrils are the species (alone or combined with other conformers) that cause the spreading of the pathology in a seeding fashion, or if smaller alpha-synuclein assemblies play that role.

      We have reformulated the sentence to credit the fact that we do not know which synuclein species is the one that is transmitted:

      “Growing evidence suggests that α-synuclein aggregates spread from cell to cell (Volpicelli-Daley et al., 2011; Volpicelli-Daley, Luk and Lee, 2014)… “

      3. "...by a "prionoid" process of templated conversion (Aguzzi, 2009; Aguzzi and Lakkaraju, 2016; Jucker and Walker, 2018; Kara, Marks and Aguzzi, 2018; Scheckel and Aguzzi, 2018; Uemura et al., 2020)." This sentence gives the impression that the corresponding author has led the field when it comes to alpha-synuclein's prionid properties. That is not really the case, and it would be appropriate to cite the literature in a more scholarly fashion that reflects how this part of the alpha-synuclein research field developed.

      I cannot disagree, and in fact I suspect that the present paper may be my second and possibly last experimental contribution to the synuclein field! However, I do claim intellectual parenthood of the prionoid (not “prionid”) concept, which I first expounded in a 2009 Nature paper. Anyway, we now provide a more balanced citation:

      “…by a “prionoid” process of templated conversion (Aguzzi, 2009; Jucker and Walker, 2018; Kara, Marks and Aguzzi, 2018; Henderson, Trojanowski and Lee, 2019; Karpowicz, Trojanowski and Lee, 2019; Uemura et al., 2020; Kara et al., 2021).“

      4. "Interrupting transmission of a-synuclein may slow down or abrogate the disease course." This is a bold statement and far from certain. While one might propose that this is the case, it is still just a hypothesis and the Introduction should reflect that.

      We have rewritten the sentence in a more subdued manner:

      “It is thought that interrupting transmission of a-synuclein may slow down or abrogate the disease course.”

      **Referee Cross-commenting** I concur with reviewers 2 and 3, and the new comment from reviewer 2. This paper should be published as soon as possible.

      *********************************************

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): This study conclusively shows that LAG3 is not the receptor for a-synuclein that underlies the spread of synucleinopathic damage in various PD-related conditions. The paper is done extremely carefully and comprehensively. My only suggestion is to indicate the significance level in Figure 5a, as it may turn out that LAG3 is actually protective.

      We have added the significance level in Fig. 5A, in the legend: “The survivals of ASYNA53T LAG3-/-, LAG3+/- and LAG3+/+ mice were similar (Mantel-Cox log-rank test, p-value = 0.165).”

      Reviewer #2 (Significance (Required)): This study is of extremely high significance - we need mechanisms to deal with spectacular results in the literature that should not have been published because they were uncompelling to begin with, but were published for various sociological/political reasons. Science won't progress if we don't find correction mechanisms for wrong conclusions. **Referee Cross-commenting** I agree with reviewers 1 and 3, especially with the suggestions made by reviewer 1, which should be instituted. I think we all concur that the paper should be published without new experiments. I believe testing a-synuclein propagation in vivo in LAG3 KO mice would be useful, but given the complete lack of replication of LAG3 expression in brain and of a-synuclein binding to LAG3, this is not necessary.

      We considered running experiments in addition to those performed in vivo in ASYNA53T transgenic mice (including LAG3 KO) and ex vivo in organotypic slices, the latter using pre-formed fibrils. However, the outcome of these experiments, along with the absence of LAG3 expression in neurons and its unclear binding, convinced us that the usage of further animals and reagents would be unwarranted.

      *****************************************

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): It was proposed that LAG3 is important in the treatment of PD and related disorders, because it functions as a receptor of pathogenic α-synuclein and the treatment with anti-LAG3 antibodies attenuated the spread of pathological α-synuclein and drastically lowered the aggregation in vitro (Mao et al, Science 2016). In this study, the authors characterized 8 antibodies to LAG3 and tested cultured cell lines, NSC-derived neural cultures, and organ homogenates for the presence of human or murine LAG3, but it was not detected in any of the neuronal samples tested. In addition, single cell (sc) RNAseq yielded only minimal counts for the LAG3 transcript in neurons, astrocytes, and mixed glial cells, and a single-nucleus (sn) RNAseq human brain dataset examined for LAG3 expression across different cell types confirmed no LAG3 signals for any of 34 identified cell clusters, including 13 clusters of excitatory and 11 subtypes of inhibitory neurons, oligodendrocytes, oligodendrocyte precursor cells, microglia, astrocytes, and endothelial cells. The authors also analyzed the binding of LAG3 with α-synuclein in mouse and human model systems, and concluded that the affinity of LAG3 for α-synuclein fibrils, if any, is micromolar or less. Furthermore, the authors studied the propagation of pre-formed fibrils (PFFs) of α-synuclein in neural stem cell (NSC)-derived neural cultures in the presence or absence of LAG3, and the impact of LAG3 on survival in ASYNA53T transgenic mice expressing wild-type LAG3 as well as hemizygous or homozygous deletions thereof. However, they were unable to see any significant role for LAG3 in these in vitro and in vivo models of α-synucleinopathies. In this connection, the reviewer would like to ask one question: Have you conducted any experiments on the propagation of PFFs of α-synuclein in LAG3-KO mice? If so, what were the results?

      We did consider the possibility of replicating the experiments using PFFs in LAG3 KO mice. However, as stated above, we felt that our experiments – including the survival study in vivo in ASYNA53T transgenic mice – were unambiguous. After critical consideration, we remained unconvinced that this additional experiment would change the weight of our evidence in a substantial manner that would justify the inoculation of other animals and the utilisation of more resources.

      **Minor point** On page 10, I think there is a typo: ASYYN mice must be ASYN mice.

      Thank you for pointing this out. We corrected it.

      Reviewer #3 (Significance (Required)): These negative findings about LAG3 in α-synucleinopathies shown in this manuscript do not provide any new insight into the mechanisms of α-synuclein propagation. However, it is clear that LAG3 is not expressed in neuronal cells and the binding of LAG3 to α-synuclein fibrils appears limited. Overexpression of LAG3 in cultured human neural cells did not cause any worsening of α-synuclein pathology ex vivo. The overall survival of A53T α-synuclein transgenic mice was unaffected by LAG3 depletion and the seeded induction of α-synuclein lesions in hippocampal slice cultures was unaffected by LAG3 knockout. The data shown in this manuscript are convincing and the information is very important in terms of correcting the direction of disease treatment and research. **Referee Cross-commenting** I agree with reviewers 1 and 2. This paper should be published as soon as possible.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      This study conclusively shows that LAG3 is not the receptor for a-synuclein that underlies the spread of synucleinopathic damage in various PD-related conditions. The paper is done extremely carefully and comprehensively. My only suggestion is to indicate the significance level in Figure 5a, as it may turn out that LAG3 is actually protective.

      Significance

      This study is of extremely high significance - we need mechanisms to deal with spectacular results in the literature that should not have been published because they were uncompelling to begin with, but were published for various sociological/political reasons. Science won't progress if we don't find correction mechanisms for wrong conclusions.

      Referee Cross-commenting

      I agree with reviewers 1 and 3, especially with the suggestions made by reviewer 1, which should be instituted. I think we all concur that the paper should be published without new experiments. I believe testing a-synuclein propagation in vivo in LAG3 KO mice would be useful, but given the complete lack of replication of LAG3 expression in brain and of a-synuclein binding to LAG3, this is not necessary.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Rebuttal letter – Response to Reviewers

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This study focused on P. vivax, which is an important neglected human malaria killer. The reported evidence will have a significant impact on diagnosing infectious diseases. The language in the manuscript is very good. However, some typos were reported. Some paragraphs might need particular attention to punctuation. Overall, the work is very good. The statistics are straight forward. However, there are a couple of major points that must be addressed before publication. Some of my comments are just recommendations to clarify some sections of the text.

      **Major comments:**

      The statistical methods can be improved by using generalised mixed models (GLMM).

      1- PCA graphs need to be organised in more descriptive ways. Dim1 and Dim2 in each axis need to be defined clearly in the figures. PCA in Fig2 c is very difficult to follow, and it needs to be organised.

      Answer: Figures have been amended to be more self-explanatory and clearer to the reader.

      2- In this study, patients were male and female, and we already know that male and female haematological parameters are hugely different, especially Hb level, and so on. My question is how the sex variable is treated in this study. Was your control group drawn from both sexes? Sex could be treated as a random variable in all studies if GLMM models were used.

      Answer: Information on how the sex variable was treated in the study has been added to the methods section. In our cross-sectional study with uncomplicated P. vivax malaria patients seen at FMT-HVD in Manaus, Brazil, patients and healthy donors (controls) were matched by age and sex. In both groups, the frequency of female individuals was 30% and of male individuals 70%.

      We think sex is better fitted as a fixed effect, since only two levels are possible for this factor. Thus, we used linear models with age and sex as fixed variables for statistical testing, to ensure that the differences observed between P. vivax-infected patients and controls, as well as between the clusters, were due only to disease status. This analysis showed that red blood cell count, hemoglobin, hematocrit, MXD and neutrophil counts (the last parameter only when comparing the clusters) needed to be corrected, and only for the influence of sex. For these parameters, estimates of predicted sex influence were subtracted from the raw parameter values and residuals were used for statistical testing. We have added this information to the Methods section as indicated below:

      Page 6, line 128: Patients and healthy donors were age and sex-matched, with a frequency of 30% female and 70% male individuals in both groups.

      Page 14, line 336:

      To ensure that differences observed between P. vivax-infected patients and controls, as well as between the clusters, were due to disease status and not confounded by age or sex, the clinical parameters were fitted as response variables in a linear model with sex and/or age fitted as explanatory variables. Age and sex were included in the model if their coefficients were estimated as different from zero with p-value The residuals from the linear model were then used as age and/or sex corrected parameters in subsequent analyses.
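
      The residual-based correction described above can be illustrated with a minimal sketch. It assumes a single binary covariate (sex coded 0 = female, 1 = male) and an ordinary least-squares fit with an intercept; the function name `residualize` and the haemoglobin values are hypothetical and are not taken from the study. The step in which a covariate is kept only if its coefficient differs significantly from zero is left out.

```python
from statistics import fmean


def residualize(values, covariate):
    """Return residuals of `values` after a least-squares fit on one covariate.

    Fits values ~ intercept + slope * covariate and subtracts the fitted
    values, so the residuals carry no linear effect of the covariate.
    """
    x_mean = fmean(covariate)
    y_mean = fmean(values)
    sxx = sum((x - x_mean) ** 2 for x in covariate)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]


# Hypothetical haemoglobin values (g/dL); sex coded 0 = female, 1 = male.
sex = [0, 0, 0, 1, 1, 1, 1]
hb = [12.1, 12.8, 11.9, 14.6, 15.2, 13.9, 14.8]

# Sex-corrected values that would be passed on to group comparisons.
hb_corrected = residualize(hb, sex)
```

      With a binary covariate this amounts to subtracting each sex's mean from its members, so the corrected values average zero within each sex and any remaining group difference cannot be attributed to a linear sex effect.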

      3- Why were 6h and 18h used for the HUVEC evaluation?

      Answer: We ran several optimization experiments with individual plasma samples, in which we observed maximal mRNA expression changes after 6h of stimulation. For experiments detecting protein expression (IFA and flow cytometry), we increased the stimulation time to 18h. Preliminary experiments suggested this to be the optimal duration without compromising cellular viability.

      4- It is mentioned that only neutrophils were enriched in this study. If myelopoiesis is affected, why did the other granulocytes not show a significant enhancement?

      Answer: Our data reveal no change in the number of circulating neutrophils in the different clusters of individuals. However, mixed cell count (MXD), a parameter representing monocyte, basophil and eosinophil numbers, was significantly reduced in Vivaxhigh patients. As a result, there was a significant enrichment of neutrophils in the leukocyte fraction in the blood of Vivaxhigh patients as well as a higher Neutrophil:Lymphocyte count ratio (NLCR) (Figure 4). In hematopoietic progenitors, stochastic changes in each factor’s concentration could result in one factor becoming more abundant and committing a hematopoietic progenitor to a particular lineage. To generate each mature granulocyte population (e.g. basophils, eosinophils and neutrophils), common myeloid precursor cells (CMPs) and later precursors for granulocytic and monocytic lineages (GMPs) follow different lineage commitment programs in the BM, tightly regulated or instructed by a specific set of soluble factors, cell-cell interactions and transcription factors that define cell fate decisions and lineage restrictions. For instance, differential PU.1 activity can specify different cell fates during haematopoiesis, regulating monocyte and neutrophil differentiation. Genetic and biochemical analyses have shown that G-CSF can direct granulocyte differentiation by changing the ratio of C/EBPα to PU.1 (Zhu et al., Oncogene 2002; Friedman Oncogene 2002; Dahl et al., Nat Immunol 2003). High expression levels of PU.1 and C/EBPα, stimulated by G-CSF, promote GMP differentiation to neutrophils and inhibit monocyte differentiation, while only PU.1 expression, IRF-8 and lower expression/activity of C/EBPs induce GMP differentiation to monocytes (Zhu et al., Oncogene 2002; Friedman Oncogene 2002; Dahl et al., Nat Immunol 2003). Meanwhile, a combination of PU.1, C/EBPβ and low levels of GATA-1 differentiates GMPs to the eosinophil lineage (Kulessa et al., 1995; McDevitt et al., 1997; Yamaguchi et al., 1999) and PU.1 must also cooperate with GATA2 to direct mast cell differentiation (Walsh et al., Immunity 2002). In addition, eosinophil and basophil differentiation are induced by a different set of cytokines, usually produced in a prevalent T-helper 2 response, such as IL-5, which should be inhibited in the strong Th1 environment evidenced by our and previous Luminex data in Pv patients. The enrichment of activated neutrophils in the peripheral circulation of P. vivax patients could be due to a response that specifically enhances neutrophil production and release from the bone marrow (BM). This hypothesis is supported by emerging evidence for enrichment of P. vivax parasites in the hematopoietic niche of the BM, our Luminex data showing a significant increase in pro-inflammatory cytokines associated with emergency myelopoiesis (e.g., TNF-a, IL-1a, IL-1b, IL-6, IL-8), and increased circulating levels of G-CSF, the major inducer of neutrophil production in the BM. Likewise, increased activation-induced cell death (AICD) in T cells, splenic T-cell and platelet accumulation or decreased lymphopoiesis due to myeloid-biased HSC differentiation induced by inflammatory cytokines and EC activation in the BM (refs 36,37,39) might explain the neutrophil enrichment in vivax patients.

      5- I would also ask the authors to speculate a bit on what the mechanism behind the different function of P. vivax compared to P. falciparum could be. From an evolutionary perspective, the parasite should rather become softer and keep the host alive for its own benefit.

      Answer: One of the characteristics of P. vivax that could play an important role in immunity is its restriction to invading immature reticulocytes. For example, the infected reticulocyte could play a role in the presentation of parasite antigens, as reticulocytes (but not mature RBCs) express MHC-I and are capable of processing and presenting antigens on their surface for recognition by T cells. Indeed, it has been shown that reticulocytes act directly as antigen-presenting cells, emphasizing the importance of erythrocyte surface antigens both in the induction and as the target of a protective immune response (Burel et al 2016, Junqueira et al 2018). Recent investigations comparing P. vivax and P. falciparum controlled human infection models (CHMIs) also revealed marked differences in the immune profiles generated following infection with the two species and postulated that protective immune responses to Plasmodium are species-specific. It has been hypothesized that this difference is due to strict P. vivax tropism for MHC-I-expressing reticulocytes that, unlike mature red blood cells, can present antigen directly to CD8+ T cells. Specifically, P. vivax but not P. falciparum infection led to the expansion of a specific subset of CD38+CD8+ T cells which were associated with an activated phenotype and cytotoxic potential. Corroborating the findings of Burel et al in the CHMI model, Junqueira et al showed that P. vivax-infected reticulocytes express HLA-I. In P. vivax-infected patients, CD8+ T cells in the peripheral blood express high levels of cytotoxic proteins, and recognize and form immunological synapses with P. vivax-infected reticulocytes in an HLA-dependent manner. Next, it was shown that P. vivax-specific CD8+ T cells release their cytotoxic granules to kill both host cell and intracellular parasite, which prevented reinvasion (Junqueira et al 2018). Although these data indicate a protective role of cytotoxic CD8+ T cells during P. vivax blood-stage malaria, it is not clear whether these lymphocytes would always be beneficial because they might contribute to anemia, inflammation or other pathological sequelae of infection, which needs to be further investigated.

      **Minor comments:**

      • It is important to have a reference, version, and date for the R software, packages and GraphPad.

      Answer: We have added version and date for the R and GraphPad software.


      2- In Fig 5, panel E is not reported. This figure could be better organised. It is very hard to read and follow.

      Answer: There is no E in Figure 5. We will organize the figure to make it easier to read and follow.

      Reviewer #2 (Significance (Required)):

      P. vivax remains endemic in 51 countries across Central and South America, the Horn of Africa, Asia and the Pacific islands. In most areas it is co-endemic with P. falciparum, which has been the priority species to address for national malaria control programmes. Malaria-related deaths are mostly attributable to the more pathogenic P. falciparum; over the last decade these have declined, but there has been a consistent rise in the proportion of malaria cases due to P. vivax. However, because it is difficult to diagnose resistant strains, strategies to detect and track drug-resistant P. vivax are limited. In this context it is vital to develop better tools to assess diagnostic, antimalarial efficacy and drug susceptibility so that emerging drug resistance can be tracked, and novel treatment strategies explored. From my viewpoint, despite some statistical problems in understanding the complex nature of the data (mixed interactions among multiple variables), these findings seem to be very interesting and (after a major revision) worth publishing. As said before, the story told by the authors could become interesting.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript titled: "Total parasite biomass but not peripheral parasitaemia is associated with endothelial and haematological perturbations in Plasmodium vivax patients" by Silva-Filho et al., reinforces the original observation and data by the group of Nicholas Anstey and coworkers, who first proposed the use of plasma parasite lactate dehydrogenase and PvLDH as a marker of parasite biomass. In that work, it was already demonstrated that P. vivax biomass is related to plasma concentration of LDH levels. As such, the present work cannot be considered of high novelty. Yet, through a meticulous approach including clinical data, computational approaches, machine learning, LDH measurement, multiplex analysis and quantitative RT-PCR, the authors here have extended the original observations that a large biomass of P. vivax parasites is out of blood circulation. In contrast, unlike the original observations of Anstey's group, a correlation between total parasite biomass and systemic levels of markers of endothelial cell activation was observed. The manuscript is very well written and the discussion brings new knowledge in this key topic for elimination of malaria. This manuscript is therefore recommended for publication after the following comments are addressed.

      **Major comments:**

      1. The vascular endothelium plays a pivotal role in malaria. Therefore, to test whether cell and/or parasite factors affect the vascular endothelium, HUVEC cells were used in this study. This is of major concern, as endothelial cells from the bone marrow, where most hematological disturbances, notably thrombocytopenia, occur, were not used instead. HUVEC cells seem to be the only endothelial cells that do not express ABO blood group antigens, thus suggesting that surface expression on these cells is highly altered (O'Donnell et al., 2000 J Vasc Res). Moreover, significant functional differences between HUVEC cells and adult vascular endothelium have been reported (Chan et al., 2004). Together, this indicates that results obtained with HUVEC cells might not reflect responses of the bone marrow vascular endothelium. As one of the corresponding authors has ample experience working with human bone marrow endothelial cells (Mantel et al., 2016 Nat Comm), it is suggested to perform some experiments with these cells to assure extrapolation of the results obtained with HUVEC cells.

      Answer: We agree with the reviewer that performing ex vivo assays with primary human bone marrow endothelial cells would be an excellent alternative. However, we would like to argue that HUVECs are also suitable for our purposes. HUVECs are widely used to study endothelial barrier function, for example in the context of angiogenesis and inflammatory responses/barrier disruption. To emphasise this point, we have now referenced examples where HUVECs were used in the context of endothelial barrier biology and in different inflammatory conditions (see also lists a, b, c below).

      a. Papers showing the use of HUVECs in studies yielding important insights about endothelial barrier function
      • Krispin S et al. Growth Differentiation Factor 6 Promotes Vascular Stability by Restraining Vascular Endothelial Growth Factor Signaling. Arterioscler Thromb Vasc Biol. 2018.
      • Aranda JF et al. MYADM controls endothelial barrier function through ERM-dependent regulation of ICAM-1 expression. Mol Biol Cell. 2013.
      • Orsenigo F et al. Phosphorylation of VE-cadherin is modulated by haemodynamic forces and contributes to the regulation of vascular permeability in vivo. Nat Commun. 2012.

      b. Papers that used HUVECs in studies about endothelial barrier function in inflammatory conditions
      • Dickinson CM et al. Leukadherin-1 ameliorates endothelial barrier damage mediated by neutrophils from critically ill patients. J Intensive Care. 2018.
      • Kuck JL et al. Ascorbic acid attenuates endothelial permeability triggered by cell-free hemoglobin. Biochem Biophys Res Commun. 2018.
      • Tramontini Gomes de Sousa Cardozo F et al. Serum from dengue virus-infected patients with and without plasma leakage differentially affects endothelial cells barrier function in vitro. PLoS One. 2017.
      • Fox ED et al. Neutrophils from critically ill septic patients mediate profound loss of endothelial barrier integrity. Crit Care. 2013.
      • Rahbar E et al. Endothelial glycocalyx shedding and vascular permeability in severely injured trauma patients. J Transl Med. 2015.

      c. Papers showing that HUVECs behave similarly to other endothelial cell types in regard to barrier function, except when the comparison is with blood brain barrier models
      • Totani L et al. Mechanisms of endothelial cell dysfunction in cystic fibrosis. Biochim Biophys Acta Mol Basis Dis. 2017, Dec;1863(12):3243-3253.
      • Gündüz D et al. Effect of ticagrelor on endothelial calcium signalling and barrier function. Thromb Haemost. 2017 Jan 26;117(2):371-381.
      • Deitch EA et al. Mesenteric lymph from rats subjected to trauma-hemorrhagic shock are injurious to rat pulmonary microvascular endothelial cells as well as human umbilical vein endothelial cells. 2001 Oct;16(4):290-3.

      Importantly, we were able to reproduce in the HUVEC ex vivo assays a phenotype of endothelial perturbations that is inferred based on the in vivo Luminex data using the same plasma sample. These data also support our hypothesis that patients with higher parasite biomass present higher endothelial cell perturbations, corroborating the associations between parasite accumulation in deep tissues (total parasite biomass represented by PvLDH levels) and endothelial cell activation as demonstrated in Figure 6.

      Strikingly, the authors stated that "P. vivax infection results in different ranges of EC alterations without massive cytoadhesion". This statement has no data supporting it. In fact, their own flow cytometry data convincingly demonstrated that exposure of HUVEC cells to plasma of vivax-high patients significantly increased the surface expression of ICAM-1 and VCAM. ICAM-1 is a well-known receptor for cytoadhesion in malaria, and Dr. Costa first demonstrated the importance of this receptor in cytoadherence of P. vivax (Carvalho et al., 2010). Moreover, these data are in some contradiction with the original observations of Anstey and collaborators, who demonstrated that parasite LDH concentration did not correlate with markers of endothelial activation (Barber et al., 2015 PLoS Path). Therefore, this sentence should be modified to accommodate the alternative possibility of cytoadherence or deleted from the manuscript, or binding functional assays should be performed to sustain it.

      Answer: We agree with the reviewer and have removed this statement.

      Page 22, line 543: The association between endothelial activation, Syndecan-1 and parasite biomass (PvLDH) indicates a positive feedback loop between glycocalyx breakdown, activation of endothelial receptors such as ICAM-1 and VCAM-1 and parasite accumulation in deep tissues9,12.

      Extracellular vesicles are key players in the pathology of malaria, including P. vivax, where concentrations of circulating microparticles were associated with acute infections (Campos et al., 2010 Mal J). Moreover, Dr. Marti has pioneered this field since the original manuscript describing the role of EVs in malaria as intercellular communicators (Mantel et al., 2013 Cell). More recently, his group also demonstrated that interaction of EVs with bone marrow endothelial cells induces expression of IL-6 and IL-1, as well as vascular endothelium perturbations in trans-endothelial electrical resistance experiments (Mantel et al., 2016 Nat Comm). Furthermore, another recent report showed the physiological role of EVs in vivax malaria by demonstrating that EV uptake by human spleen fibroblasts induced nuclear translocation of the NF-kB transcription factor, concomitant with surface expression of ICAM-1, thus facilitating cytoadherence of infected reticulocytes from P. vivax patients (Toda et al., 2020 Nat Comm). This growing evidence indicates that plasma circulating EVs are key communicators in malaria infections, potentially explaining some of the findings reported in this work. Neglecting the importance of EVs in the discussion of this article is not reasonable and weakens this manuscript. Including a paragraph on EVs and accurate references in the discussion is thus strongly recommended.


      Answer: We agree with the reviewer that extracellular vesicles are key communicators in malaria infection. We have not measured them in our study, however, and therefore can only speculate about their impact on our observations. We have added a phrase in the discussion:

      Page 27, line 661: It is likely that other circulating factors that we have not directly measured in our study are also contributing to EC activation and vascular permeability. In particular, extracellular vesicles (EVs) originating from ECs, platelets, and RBCs are present during malaria infection and are known to modulate the host immune response to the parasite54-56. In P. falciparum, infected RBCs release EVs containing immunogenic parasite antigens that activate macrophages, induce neutrophil migration and alter endothelial barrier function54,55. In P. vivax, plasma-derived EVs from iRBCs are taken up by human spleen fibroblasts (hSFs). This event signals NF-kB translocation and upregulation of ICAM-1 expression, facilitating cytoadherence of P. vivax-infected reticulocytes56.

      **Minor comments:**

      1. The lack of a group including severe vivax malaria patients is a drawback of this article as this group would have firmly validated the predictor of severe disease.

      Answer: This study investigated a cohort of uncomplicated P. vivax malaria patients compared to controls. We agree that it will be important to extend our analysis to severe vivax malaria in future studies.

      In the selection criteria of the patients to be included in the study, no information on other co-infections was mentioned. Is this information available? If so, this should be mentioned.

      Answer: As described in the Methods section, Page 6, line 132, mono-infection by P. vivax was confirmed by analysis of blood smears and quantitative PCR (qPCR) for both P. vivax and P. falciparum. We agree that excluding other coinfections could have been of interest. However, the differential diagnosis for an acute febrile illness is very broad and it would be impractical to track all other possible diseases. In addition, the patients included in the present work had mild disease, and therefore were discharged from hospital after a positive malaria diagnosis. No further investigation of other infections was done.

      The main coinfection to be considered for an acute febrile illness with no localizing signs in our context is Dengue Fever. Although Dengue coinfection in our cohort is possible, the incidence at the Hospital is only 2.8% (P. vivax/Dengue coinfection) (Magalhães et al, Plos NTD 2014). Thus, it is unlikely that such a coinfection would have a major impact on our findings.

      This work determined the levels of PvLDH in a cohort of uncomplicated P. vivax patients as well as healthy volunteers using a double-sandwich ELISA assay: (i) are the clones to determine PvLDH values freely available to facilitate similar studies by independent groups? (ii) How was the cut-off of positivity defined? This is not evident, neither in the materials and methods, nor in the results.

      Answer: Clones are commercially available and were purchased from Vista Diagnostics International LLC, WA, USA. Information has been amended to the text in the Methods section.

      Page 8, line 186: “Cut-off of positivity was defined by correcting absorbance values generated in the plasma samples from healthy donors (controls) by blank values (plate controls), with both values being in the same range. Absorbance values higher than controls were considered positive. In parallel, we used schizont extracts to perform standard curves and the lowest absorbance values were in the range of O.D. = 0.03-0.04. All positive patient samples gave O.D. values equal to or higher than 0.05.” This information has also been added in the Methods section.
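      The cut-off rule quoted above can be illustrated with a minimal sketch. It assumes that "higher than controls" means higher than the largest blank-corrected control reading; the rebuttal does not state the exact rule, and the function names and example readings below are hypothetical, not study data.

```python
from statistics import mean


def blank_corrected(od_values, blank_values):
    """Subtract the mean plate-blank absorbance from each reading."""
    blank = mean(blank_values)
    return [od - blank for od in od_values]


def classify_samples(sample_ods, control_ods, blank_values):
    """Flag samples whose corrected absorbance exceeds every control.

    Assumption: the cut-off is the highest blank-corrected control value.
    """
    cutoff = max(blank_corrected(control_ods, blank_values))
    return [od > cutoff for od in blank_corrected(sample_ods, blank_values)]


# Hypothetical readings for illustration only.
blanks = [0.04, 0.04]
controls = [0.05, 0.06]
samples = [0.055, 0.20]
calls = classify_samples(samples, controls, blanks)
```

      Other conventions, such as the control mean plus a multiple of its standard deviation, are also common for ELISA cut-offs and would only change the `cutoff` line.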

      It is not clear why varying percentages of pooled plasma (30% for imaging and flow cytometry, and 20% for impedance changes) from the different clusters were used for the functional EC assays. Moreover, no information about the concentration of plasma used for transcriptional analysis is available. Please clarify.

      Answer: The concentration of 30% pooled plasma was also used for transcriptional analysis, as indicated in the Methods section, page 11, line 250. This information was also added in the legend of Figure 5B. We had run several optimisation time-course and titration experiments with individual plasma samples, testing concentrations of plasma varying from 10% up to 30% v/v and we did not observe differences in mRNA expression between 20% and 30% v/v plasma conditions.

      As for the ECIS, our collaborators (Erich V de Paula group) have optimised this assay and they use a range of 15 to 20% (Santaterra et al 2020). Higher concentrations of plasma reduce reproducibility, probably due to fibrin formation.

      Reference 9 is a nonhuman primate study where no LDH is used. Please remove it.

      Answer: Reference 9 has been removed following the reviewer's suggestion.

      Reference 39 is a review on the subject and cannot be included in the sentence on line 556, "In agreement with a previous study8,39", where reference 8 is accurate. Please remove reference 39 from here.

      Answer: The text has been amended as suggested.

      Reviewer #3 (Significance (Required)):

      This paper further contributes to explain the conundrum of low peripheral blood parasitemia and clinical severity in P. vivax. Moreover, by including new human markers and solidly applying computational tools, this paper further contributes to advance clinical research in P. vivax.

      Clinical diagnoses of hematological disorders, including anemia, lymphopenia and thrombocytopenia, are routinely obtained from a complete blood count. Therefore, I believe the major significance of this work is to raise public health awareness of including the determination of PvLDH levels in these clinical examinations. As suggested by the authors, this might lead to better diagnosis and treatment of P. vivax.

      My main expertise is the biology of host-pathogen interactions with a focus on P. vivax.


      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      The study evaluates P. vivax biomass (serum LDH) versus peripheral parasitemia with multiple variables. Based on biomass (Vivax high vs. Vivax low), the authors compare multiple measurements in patients with uncomplicated P. vivax. This raises questions about disease and the presence of parasites in various organs. The question is whether P. vivax sequesters, and the answer is yes, in the bone marrow and spleen. Does it sequester like P. falciparum, which causes disease by binding endothelium in various organs? That is less clear. As P. vivax is rarely fatal, sequestration has not been studied. Parasites have been found in the bone marrow and liver of P. vivax-infected splenectomized squirrel and Aotus monkeys (note: in splenectomized monkeys parasitemia can rise to higher levels than in non-splenectomized monkeys). Studies of binding of schizont-infected red cells to lung endothelium in vitro do not answer the question of whether sequestration occurs in vivo.

      The most important complication of P. vivax is generally anemia. This did not correlate with vivax biomass, but this raises the question of the length of infection and the possibility that parasite biomass may vary at different times of infection. Anemia was seen in P. vivax infected patients, but it did not relate to biomass at the time of study. Note the caveat mentioned in the previous sentence on long term effects of infection on anemia.

      The association of biomass with reduced platelet counts and endothelial effects may be related to a serum factor and not sequestration. This is the main limitation of the paper, besides the unknown long-term effect of infection. If one could identify an effect of P. vivax-infected human serum, it may be worth studying in the future what in the serum is causing the effects.

      Reviewer #4 (Significance (Required)):

      This study is unique with the caveats mentioned above. It has a good review of the literature.

      Answer: We appreciate the reviewer's comments. In our cohort, the frequency of anaemia was not as high or severe as the frequency of thrombocytopenia and lymphopenia. However, we still find that the endothelial cell activation marker Ang-2 and the pro-inflammatory cytokine IL-1 are negatively associated with several markers of anaemia, such as haemoglobin, haematocrit and RBC numbers. Although we did not further investigate this association, it may indicate indirect effects of parasite biomass on anaemia mediated by inflammation and EC activation, which will be further investigated in other current longitudinal cohort studies.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Response/revision plan

      (Point-by-point response)


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Pennauer et al is the first to systematically investigate the role of class I and II Arfs using a knockout approach. It builds on earlier work by the Kahn lab, who used an RNAi approach (Volpicelli-Daley et al. 2005), and is complementary to the overexpression approach used by the Hauri lab (Ben-Tekaya et al, 2010). The work is elegant and the data are strong. I am strongly in favor of publishing this work; my comments are technical in nature (2-5), plus a request for some text changes (1). I have the following comments for improvement:

      1- When it comes to evaluating the role of depletions of Arfs on cell fitness, it would be better to use a non-transformed cell line. I am not asking the authors to go through the painstaking process of generating knockout cell lines in RPE1 cells for instance. Rather, I suggest that the authors make the reader aware that conclusions about cell survival have to be taken with care due to the use of a transformed cell line.

      We will add this valid point to the Discussion.

      2- Why do Arf1 and Arf4 ko cells grow more slowly? Is it a higher rate of cell death? Is it a block in a certain phase of the cell cycle? Given the link of the Golgi to G2-M entry, I think that an analysis of the cell cycle distribution would add more depth to these data. If the cell cycle distribution is unaffected, then I would conclude that the differences in doubling time are due to reduced cell survival. If there is an effect on the cell cycle distribution, then the conclusion of the authors that no single Arf is required for survival is safe.

      We plan to analyze cell cycle distribution.

      3- It is not clear to me how many cells were quantified in Figure 2D-F. I suppose that each dot represents a cell. In this case, the number of cells quantified is a bit low. Such a quantification of fluorescence intensities in two channels in the same region is a simple task and I think it should be no problem obtaining at least 100 cells per condition.

      We will add the number of cells analyzed to the figure legends: at least 40 Golgis were quantified in each experiment, thus >100 in total.

      4- Is the drop in the ratio of beta-COP/GM130 in Arf1 depleted cells reflecting reduced recruitment to the Golgi? Because the Golgi is bigger, it might be reflecting a reduced density in the number of coatomer molecules per surface area. If it is due to reduced recruitment, then the ratio of membrane/cytosolic betaCOP should be altered. This of course requires to show that the knockout does not affect total levels of coatomer. I think that such fractionation experiments would be a valuable addition to the manuscript and increase the depth of the data.

      We are currently performing immunoblot analysis to determine bCOP levels.

      In the Figure below, we have plotted the total intensity of GM130 or bCOP per Golgi from our immunofluorescence data. Total intensity of GM130 significantly increased in the cell lines lacking Arf1, consistent with the increase in Golgi volume. The amount of bCOP at the Golgi remained constant, resulting in a reduced bCOP/GM130 ratio. Deletion of Arf1 thus results in a reduced rate of coat recruitment that is compensated by an increase in Golgi mass. In the simplest model, reduced formation of Golgi-exit carriers causes Golgi growth until exit carrier formation allows for the required flux.

      We propose to include this data in the revised manuscript.

      FIGURE
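      The ratio argument above can be made concrete with a small worked example. The intensity values are hypothetical placeholders chosen only to show the arithmetic (constant bCOP, increased GM130); they are not measurements from the study.

```python
def cop_to_gm130_ratio(bcop_total: float, gm130_total: float) -> float:
    """Ratio of total bCOP intensity to total GM130 intensity for one Golgi."""
    return bcop_total / gm130_total


# Hypothetical per-Golgi total intensities (arbitrary units).
wild_type_ratio = cop_to_gm130_ratio(bcop_total=100.0, gm130_total=100.0)
# Same amount of bCOP, but a larger Golgi with more GM130 signal.
arf1_ko_ratio = cop_to_gm130_ratio(bcop_total=100.0, gm130_total=150.0)

# The ratio drops even though the absolute amount of bCOP is unchanged.
ratio_drops = arf1_ko_ratio < wild_type_ratio
```

      Under these assumed numbers the ratio falls from 1.0 to about 0.67 purely because the denominator grows, which is why reporting total bCOP intensity alongside the ratio helps distinguish reduced recruitment from dilution over a larger Golgi.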

      5- The finding that Arf4-ko cells exhibit a defect in retrieval of ER-resident proteins is exciting, and in my opinion, it is the most significant finding in this manuscript. How can this be reconciled with the lack of an Arf4 ko effect on coatomer recruitment to the Golgi? Looking carefully at the data, I see that in 2 out of 3 experiments, Arf4 ko reduced the betaCOP/GM130 ratio. This is why I think it is crucial to perform more experiments and add more cells to increase the confidence in the data. Reduced retrieval of ER chaperones is frequently found in tumors and we still don't understand the reason behind this. Therefore, this finding is of significance beyond the community of cell biologists.

      We plan to repeat quantitation with COPI for better statistical validity.

      6- I find Figure 6A confusing. Why do Arf1 overexpressing parental HeLa cells exhibit less Arf1 than control cells?

      In order not to overload the immunoblot of Arf overexpressing lysates, a smaller aliquot (1/20) was loaded. We will indicate this directly below the blots to make this more obvious in the revised figure.

      7- Why was the following condition not tested: Arf4ko cells with Arf1 overexpression? Given the importance of Arf1 in retrograde (Golgi-to-ER) trafficking, I would expect a partial rescue of the retrieval of ER chaperones.

      We will do this experiment.

      Reviewer #1 (Significance (Required)):

      **Significance of the work:**

      The paper is important because it is the first to examine the role of Arfs using a knockout approach. Another very important finding is that Arf4 depleted cells exhibit problems with retrieval of ER chaperones. This is a very novel finding and to the best of my knowledge

      **Audience:**

      The primary audience is of course the community working on membrane trafficking, organelle biology and proteostasis. However, I think that the data on the role of Arf4 in retrieval of ER chaperones might be of relevance for cancer biologists. Secretion of ER chaperones is frequently found in many tumors and we still do not understand why this is happening and what the significance thereof is.

      **My own expertise:**

      Export from the endoplasmic reticulum; Golgi fragmentation in cancer cell migration; Rho GTPases; Kinase signaling; Pseudoenzymes; Cell migration of breast cancer cells; Proteostasis in multiple myeloma

      **Referee Cross-commenting**

      Just a follow-up comment from my side:

      I agree that it has not been unequivocally established that Arf1 is the main/sole mediator of retrograde transport. However, even less established is the role of Arf4 in this process. The authors show that it is mainly Arf1 depletion that reduces the amount of COPI at the Golgi (ratio of COPI/GM130). Thus, I remain very surprised that it is actually the Arf4 depletion that results in reduced retrieval.

      What is the significance of having less COPI at the Golgi in Arf1-ko cells? Certainly, the Golgi is not more "leaky". Does the level of COPI at the Golgi not reflect the strength of retrograde trafficking? Maybe there is no less COPI at the Golgi, and it only appears to be less, because the Golgi is bigger. This is why a simple fractionation experiment would be good. Something like making a cytosol and a microsome fraction and looking at the ratio of COPI (Cyt/Mem).

      If both reviewers think it is too much, or unlikely to work, then I am happy to drop this point.

      Below are my comments to the evaluations by the other two reviewers:

      1- I agree with most comments that the two other reviewers made. Some of them are actually overlapping with mine (e.g. the use of a cell line other than HeLa).

      2- I am not sure whether the impact of the paper would improve by adding data on Arf6.

      3- To the comment on Golgi polarity. Maybe we could be more specific here and say that it would be sufficient to show that a trans-Golgi protein and a cis-Golgi protein can be separated by fluorescence microscopy, or whether we alternatively want them to actually do it by immunogold labeling for EM (which is more difficult).

      4- I agree with reviewer 2 that the work proposed needs 1-3 months. I think reviewer 3 is a bit too optimistic with 1 month, because her/his comment on using a cell line other than HeLa cannot be addressed in just a month.


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Pennauer uses HeLa cells and CRISPR/Cas9 to delete the 5 members of the class I and class II ARF family of small regulatory GTPases either individually or in combinations. The characterization of the KO cells is excellent and convincingly demonstrates that true KOs were generated. The quality of the data presented is high. Using the KO cells she documents minor alterations in Golgi architecture and the recruitment of vesicular coats in cells deleted of all ARFs except ARF4. In contrast, there is a significant lack of retention/recycling to the ER of KDEL-containing ER proteins in ARF4 KO cells, with numerous ER chaperones now released into the medium (the ARF4 KO secretome). This is a well-done study that showcases the ability of ARF4 alone to sustain cellular life (quite a surprise to this reviewer). Yet, the characterization of the phenotypes is somewhat minimal and the conclusions would be more robustly supported by additional experiments. Specifically:

      1. The authors completely ignore class III ARF6 and this paper would be much more comprehensive and informative if analysis of that ARF was also included (ARF6 has been seen at the Golgi and also mediates endosomal trafficking that intersects with the TGN).

      In agreement with the reviewers' consensus in cross-commenting, we consider Arf6ko to be beyond the scope of this study.

      Although the overall Golgi architecture seems to be largely conserved, it remains essential to test whether Golgi polarity is similarly maintained, and such data would significantly expand the significance of the reported findings.

      We have performed super-resolution microscopy of wild-type and Arf1ko Golgis for GM130 and TGN46 as cis- and trans-Golgi markers, respectively, showing that polarity is still intact for Arf1ko, the morphologically most affected knockout cell line. We plan to include the following Figure in the revised manuscript.

      FIGURE

      Golgi complexes were imaged by superresolution microscopy for GM130 (green) and TGN46 (red), and displayed as maximum intensity projections, or tomographic 2D slices. Scale bar, 3 μm.

      Since there is a defect in retrieval of KDEL-proteins, it would be important to show the intracellular localization of the KDEL-R in the cells (especially in the ARF4 KO cells that don't retrieve KDEL-GFP). Is the receptor degraded, or stuck in some specific place? Knowing that would increase the impact of this study and provide a mechanistic explanation for the observed phenotype.

      We plan to perform immunoblot analysis for KDELR to test for changes in levels in Arf deletion cells, and immunofluorescence microscopy to analyze changes in KDELR localization.

      The rescue experiments in Figure 6 are good as far as they go, but this experiment would be much more informative if in addition to the same class rescue, the other class ARFs (at least one!) were also characterized.

      We will do this experiment.

      This is maybe a little too much to ask, but since the authors propose a mechanistic explanation for the ARF4 KO KDEL phenotype as being due to different effectors recruited by this ARF (in this case different COPI isotypes), this study would increase in impact by actually testing this mechanism, by assessing whether the ARF4Q71L mutant preferentially binds any particular isotype of COPI, or even by trying mass spec to identify relevant effectors for this extremely interesting ARF.

      We also think that this additional analysis is beyond the scope of this study.

      The Discussion is very limited and would be more impactful with some discussion of organismal effects of ARF deletions (many are embryonic lethal while cells seem to live quite happily) or mutations (links to cancer come to mind here), as well as some mention of data from yeast ARF (what is and isn't essential in those cells). As is, the authors miss an opportunity to highlight the importance of their findings as they relate to current knowledge of ARFology.

      We agree to add a discussion of information on embryonic lethality and disease.

      Reviewer #2 (Significance (Required)):

      This is an important paper for the ARF field and people interested in ARF signaling will be glad to read about the findings and perhaps also use the developed KO cell lines - this is a significant advancement. The impact would be even higher if some of the experiments suggested above were incorporated into the manuscript.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This paper describes the application of CRISPR/Cas9 to systematically delete from HeLa cells all four Arf genes, either singly or in combination. The authors find it is possible to generate a number of double deletions (notably one lacking both Class I Arfs), and a triple deletion lacking all but Arf4. The authors characterise the structure of the Golgi in these mutants as well as retention of ER residents. The work is a comprehensive study of an exceptionally high technical standard. There is excellent validation of the deletions, and then the application of a wide range of methods including immunofluorescence, electron microscopy and mass-spectrometry, all with careful and extensive quantitation. The finding that cells can survive without Class I Arfs is interesting and unexpected, as is the fact that Arf4 alone is sufficient. This work will provide an excellent platform for future studies on Arf protein function in human cells. There are of course many questions that arise from these findings, but given the scope and quality of the work they would seem better left for future publications. There is one experiment that could be added, and some additions needed to the text for clarity and minor adjustments to the figures (all listed below), but if these are addressed, this would be a high quality paper of wide interest to a cell biological audience.

      **Specific comments:**

      1) Have the authors tested the levels and/or localisation of the KDEL receptor in the various lines? This is not essential, but if it were easily done, it would add to the work on ER resident secretion.

      We plan to perform immunoblot analysis for KDELR to test for changes in levels in Arf deletion cells, and immunofluorescence microscopy to analyze changes in KDELR localization.

      2) The work is entirely done in HeLa cells. The authors should note that the situation might be different in other cells types and cell lines. For instance, the DepMap CRISPR database suggests that quite a lot of cell lines are strongly affected by loss of Arf1.

      We agree to add a discussion on known effects in other tissues.

      3) Figure 2. Please show single channels as grey scale, and only merge as RGB. This is easier to see, especially for the colour blind. Likewise, Figure 3D would be clearer in greyscale rather than green, and 6B better in grey than in red.

      We will make these changes.

      4) Figure 5C. A brief comment is needed as to why it might be that BiP and calreticulin are not so efficiently secreted when Arf5 is knocked out in addition to Arf4.

      This was a mistake in labeling that lane and will be corrected. It should read "Arf3+5ko", not "Arf4+5ko". Thank you for pointing this out.

      5) Discussion:

      a) The authors should relate these studies to work in other species. For instance, in yeast reduction of Arf levels causes the Golgi to enlarge (PubMed ID 9487133).

      We can discuss this.

      b) Some more discussion is needed of the fact that Arfs may not all act in the same part of the Golgi, which could explain some of the differences observed between the various deletions.

      We can add this point in the discussion.

      Reviewer #3 (Significance (Required)):

      The Arf GTPases have been studied extensively for over 30 years as major regulators of Golgi function. They are essential for the recruitment to Golgi membranes of both COPI and clathrin/AP-1 coats, as well as various other proteins that regulate Golgi function. In addition, they have been reported to have roles in viral replication, and even other cellular processes such as lipid droplet formation and mitochondrial division. In humans there are four Arfs, Arf1 and Arf3 (Class I Arfs), and Arf4 and Arf5 (Class II Arfs). All are present on the Golgi, but their precise individual roles have remained unclear. Attempts have been made to deplete individual Arfs using RNAi, but incomplete knockdowns have made the results hard to interpret.

      **Referee Cross-commenting**

      There is probably no need for a prolonged debate about this, but I agree that the importance of Arf4 is striking, but it reflects the nature of this work that CRISPR has finally allowed these sorts of questions to be addressed unequivocally. COPI is also involved in recycling of Golgi resident enzymes, and it may be that Arf1 acts in this role.

      If the authors check levels of COPI by blotting, and measure the intensity over the Golgi by quantitative IMF, that will reveal whether stability or membrane association is affected, without fractionation, which is probably less reliable.

      If they want to do some extra experiments, then it would be quite easy to check the levels of some Golgi enzymes, or look at lectin binding as a proxy for glycosylation enzyme levels.

      Overall, I agree with the positive comments of Reviewers 1 and 2, and it is good that we all recognise the quality and importance of the work. However, I feel that one or two of their requests go beyond the scope of a single publication, or would add rather little for a lot of additional work. It is of course easy to propose experiments that someone else has to do!

      **[On] Reviewer 1:**

      Point 4. I agree that it would be useful to perform a blot to determine if the levels of coatomer are affected in the various KO lines. I am not sure if Reviewer 1 is also proposing fractionation to determine cytosol vs membrane ratio, but if so, then this would be less useful as peripheral membrane proteins tend to fall off membranes during fractionation and so such analysis is generally questionable. A blot, and clarification of the way the COPI/GM130 ratio is determined, would answer the key points in a relatively straightforward way.

      Point 5. I agree that the defect in retrieval of ER residents in Arf4-KO is striking, but it is a clear effect even if the reviewer does not understand it themselves! It does not seem so surprising to me, given that Arf4 is likely to act on the early Golgi, where such retrieval occurs. However, the experiment suggested by myself and Reviewer 2 of checking the levels and localisation of the KDEL-receptor would seem to me a good first step to addressing possible mechanism, and certainly sufficient for an initial publication.

      Point 7. I am not sure that it has been unequivocally established that Arf1 is important in retrograde traffic. The reality is that many labs have taken Arf1 as being representative of all others and so concentrated biochemical and in vivo studies on this protein. This paper is really important as it highlights the need to investigate both Class I and Class II Arfs, and to bear in mind that their roles in vivo may well be more distinct than their in vitro properties would lead one to suspect. Perhaps, the simplest explanation for this is that the GEFs that activate them have a strong preference for one or the other.

      Follow up Comment 1. I was not suggesting that the authors repeat all this in a cell line other than HeLa cells, as this is clearly impractical. HeLa cells are widely used, and so the findings are useful, and whilst it seems certain that some other cell lines would give different data (and indeed the DepMap data show this), then testing one other line would not change the conclusions much. All I wanted the authors to do is to clearly state in the text that what they see in HeLa cells may well be different in other cell lines. This does not detract from the fact that their HeLa cells will provide an excellent platform for focused studies on the role of individual Arfs.

      Follow up Comment 2. I agree that Arf6 is not relevant to this paper (as discussed in detail below).

      Follow up Comment 3. I agree that a simple IMF experiment would suffice to check polarity and immuno-EM is technically very demanding and would add little in this context. The authors have already shown that the Golgi forms stacks in the KO cell lines, and I cannot see how this could occur without the stack being polarised - it has to form at one end and then mature to the other. In addition, after decades of working on the Golgi I have yet to see a credible report of a change to cells causing a loss of Golgi polarity, but maintaining a stacked structure. If the Golgi is not polarised it could not form a stack.

      Follow up Comment 4. I agree that one month is perhaps too short to look at KDEL-R, COPI levels and checking polarity by IMF. As noted above, I am NOT suggesting that they repeat all this in a different cell line.

      **[On] Reviewer 2.**

      Point 1. I agree with Reviewer 1 that the authors are correct to ignore Arf6. It is a completely different GTPase with a distinct function in a different part of the cell. The family of Arf1-Arf5 arose in metazoans from a single Arf, but Arf6 had already split away from the Arf1-5 family in the last eukaryotic common ancestor, as Arf6 is present in plants and yeasts. There is overwhelming evidence that Arf1-Arf5 are partially redundant and this has hampered their study. Arf6 does not share these roles. The fact that it acts on endosomes and has been reported to be on the Golgi (which is not widely agreed) is also true of many other GTPases. Indeed, other distant relatives of Arf1-5 are actually on the Golgi (Arl1, Arl5 etc), but these are also not relevant as, like Arf6, they do not bind coat proteins and other major effectors of Arf1-5.

      Point 2. As noted above, it is hard to see how polarity could be affected given that a Golgi stack is formed, but, at most, a simple application of IMF would seem sufficient to confirm this.

      Point 3. Agreed.

      Point 5. I agree with the reviewer that this is (much!) too much to ask for an initial publication. Various labs have already reported analysis of the effectors of Class II Arfs and they tend to overlap with Class I. Moreover, it is quite possible that the difference of role in vivo reflects differing interactions with regulators.

      Point 6. Agreed.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this work, Panigrahi et al. develop a powerful deep-learning-based cell segmentation platform (MiSiC) capable of accurately segmenting bacterial cells densely packed within both homogeneous and heterogeneous cell populations. Notably, MiSiC can be easily implemented by a researcher without the need for high computational power. The authors first demonstrate MiSiC's ability to accurately segment cells with a variety of shapes including rods, crescents and long filaments. They then demonstrate that MiSiC is able to segment and classify dividing and non-dividing Myxococcus cells present in a heterogeneous population of E. coli and Myxococcus. Lastly, the authors outline a training workflow with which MiSiC can be trained to identify two different cell types present in a mixed population using Myxococcus and E. coli as examples.

      While we believe that MiSiC is a very powerful and exciting tool that will have a large impact on the bacterial cell biological community, we feel explanations of how to use the algorithm should be more greatly emphasized. To help other scientists use MiSiC to its fullest potential, the range of applications should be clarified. Furthermore, any inherent biases in MiSiC should be discussed so that users can avoid them.

      We thank the reviewer for the positive feedback and for comments that help disseminate MiSiC to the broad bacterial cell biology community, as it is meant to be. As described above, we have largely addressed this comment by writing a comprehensive handbook. As detailed below, we now also provide precise measurements of the MiSiC segmentation accuracy compared to ground truth for the various imaging modalities and bacterial species.

      Major Concerns:

      1) It is unclear to us how a MiSiC user should choose/tune the value for the noise variance parameter. What exactly should be considered when choosing the noise variance parameter? Some possibilities include input image size, cell size (in pixels), cell density, and variance in cell size. Is there a recommended range for the parameter? These questions along with our second minor correction can be addressed with a paragraph in the Discussion section.

      Setting the noise parameters is now detailed in the handbook (section 1.d), which provides a set of rules of thumb and recommendations. In addition, a paragraph explaining the importance of noise addition for images with sparse bacterial cell density has been added in the Results section.

      “Associated Figure S1. Background noise can lead to spurious cell detection by MiSiC. SI images retain the shape/curvature information of the intensities in a raw image through eigenvalues of the Hessian of the image and an arctan function, creating smooth areas corresponding to cell bodies and propagating noisy regions where there is no shape information. Thus, MiSiC segments the cells by discriminating between “smooth” and “rough” regions. In effect, when adjusting the size parameter, scaling smooths out the image noise, leading to background regions that have a smoother SI than in the raw image. Some of these areas could be falsely detected as bacterial cells. This effect is shown here: when an image with uniform and random intensity values is segmented with MiSiC with increasing smoothing (here using a Gaussian blur filter), spurious cell detection becomes apparent. In addition, since the SI keeps the shape information and not the intensity values, background objects that are of relatively low contrast (i.e., dead cells or debris) may be detected as cells. All these artifacts can be mitigated by adding synthetic noise to the scaled images.”
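
      To make the shape-index idea concrete, here is a minimal Python sketch. It is not the MiSiC implementation: it assumes the classical Koenderink shape index computed from the two eigenvalues of the 2×2 image Hessian, and the names `hessian_at` and `shape_index`, as well as the plain finite-difference Hessian without any Gaussian scaling, are illustrative assumptions.

```python
import math


def hessian_at(img, r, c):
    """Central finite-difference Hessian (hxx, hxy, hyy) at an interior pixel.

    `img` is a list of rows; x runs along columns and y along rows.
    No smoothing is applied here (an assumption made for brevity).
    """
    hxx = img[r][c + 1] - 2 * img[r][c] + img[r][c - 1]
    hyy = img[r + 1][c] - 2 * img[r][c] + img[r - 1][c]
    hxy = (img[r + 1][c + 1] - img[r + 1][c - 1]
           - img[r - 1][c + 1] + img[r - 1][c - 1]) / 4.0
    return hxx, hxy, hyy


def shape_index(hxx, hxy, hyy):
    """Koenderink shape index in [-1, 1] from a symmetric 2x2 Hessian."""
    mean = 0.5 * (hxx + hyy)
    radius = math.hypot(0.5 * (hxx - hyy), hxy)
    k1, k2 = mean + radius, mean - radius  # eigenvalues, k1 >= k2
    # atan2 keeps the isotropic case (k1 == k2) well defined.
    return (2.0 / math.pi) * math.atan2(-(k1 + k2), k1 - k2)


# A bright horizontal ridge, loosely like the long axis of a rod-shaped cell.
ridge = [[0, 0, 0],
         [1, 1, 1],
         [0, 0, 0]]
print(shape_index(*hessian_at(ridge, 1, 1)))
```

      Under this convention a bright ridge scores about 0.5, a bright cap about 1 and a saddle 0, independent of absolute intensity, which is consistent with the statement above that the SI keeps shape rather than intensity information.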

      2) Could the authors expand on using algorithms like watershed, conditional random fields, or snake segmentation to segment bacteria when there is not enough edge information to properly separate them? How accurate are these methods at segmenting the cells? Should other MiSiC parameters be tuned to increase the accuracy when implementing these methods?

      We thank the reviewer for raising this point, as it is important to make clear that post-processing algorithms can certainly improve the accuracy of MiSiC masks downstream. To show this specifically, we further processed MiSiC masks of Bacillus subtilis filamentous cells to resolve division septa using the watershed algorithm. This example is now provided as Figure S3. Importantly, there is no particular MiSiC adjustment that needs to be performed prior to running these processing steps, which can be done directly in ImageJ or its bacterial cell analysis plug-in, MicrobeJ. It is worth noting that the post-processing strategy may depend on the scientific question under consideration. In the handbook, we also give an example of post-processing methods that may be used.

      “Associated Figure S3. Refining cell separations with watershed. Watershed methods may be used to obtain a more accurate segmentation of septate filaments such as Bacillus subtilis. In this example applying this method to the MiSiC mask effectively resolves cell boundaries that are not captured in the prediction but are visible by eye (arrows).”

      3) Can the MiSiC's ability to accurately segment phase and brightfield images be quantitatively compared against each other and against fluorescent images for overall accuracy? A figure similar to Fig. 2C, with the three image modalities instead of species would nicely complement Fig. 2A. If the segmentation accuracy varies significantly between image modalities, a researcher might want to consider the segmentation accuracy when planning their experiments. If the accuracy does not vary significantly, that would be equally useful to know.

      This is a very important issue that was also raised by reviewer 3 and which we decided to address in full. For each imaging modality and distinct species, we measured the Jaccard index as a function of the threshold set for the Intersection over Union (IoU). The resulting curves are now provided in two separate Figures 2 and 3 and a supplemental Figure S2; they provide a robust measure of the segmentation for each modality/tested species.

      “Figure 2. MiSiC predictions under various imaging modalities. a) MiSiC masks and corresponding annotated masks of fluorescence, phase contrast and bright field images of a dense E. coli microcolony. b) Jaccard index as a function of IoU threshold for each modality determined by comparing the MiSiC masks to the ground truth (see Methods). The obtained Jaccard score curves are the average of analyses conducted over three biological replicates and n=763, 811, 799 total cells for Fluorescence, Phase Contrast and Bright Field, respectively (bands are the maximum range, the solid line is the median). The fluorescence images were pre-processed using a Laplacian of Gaussian filter to improve MiSiC prediction (see Methods).”

      “Associated Figure S2. MiSiC predictions under various imaging modalities. a) MiSiC masks and corresponding annotated masks of fluorescence, phase contrast and bright field images of a dense M. xanthus microcolony. b) Jaccard index as a function of IoU threshold for each modality determined by comparing the MiSiC masks to the ground truth (see Methods). The obtained curves are the average of analyses conducted over three biological replicates and n=193, 206, 211 total cells for Fluorescence, Phase Contrast and Bright Field, respectively (bands are the maximum range, the solid line is the median). The fluorescence images were pre-processed using a Laplacian of Gaussian filter to improve MiSiC prediction (see Methods). c) A human observer performs slightly less well than MiSiC. The same ground truth as used in Figure 2 (dashed lines) was compared to an independent observer’s annotation (solid lines) and Jaccard score curves were constructed as shown in Figure 2. BF: Bright Field, PC: Phase Contrast, Fluo: Fluorescence.”

      “Figure 3. MiSiC predictions in various bacterial species and shapes. a) MiSiC masks and corresponding annotated masks of phase contrast images of Pseudomonas aeruginosa (rod shape), Caulobacter crescentus (crescent shape) and Bacillus subtilis (filamentous shape). b) Jaccard index as a function of IoU threshold for each species determined by comparing the MiSiC masks to the ground truth (see Methods). The obtained Jaccard score curves are the average of analyses conducted over three biological replicates and n=1149, 101, 216 total cells for P. aeruginosa, B. subtilis and C. crescentus, respectively (bands are the maximum range, the solid line is the median). Note that the B. subtilis filaments are well predicted but edge information is missing for optimal detection of the cell separations.”
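
      For readers unfamiliar with this kind of curve, the sketch below shows one common way to compute a Jaccard index at a given IoU threshold. It is an illustration under stated assumptions, not the procedure from our Methods: objects are represented as sets of (row, column) pixels, predicted and ground-truth objects are matched greedily by decreasing IoU, and the score is TP / (TP + FP + FN). The helper names (`rect`, `iou`, `jaccard_at_threshold`) and the toy masks are hypothetical.

```python
def rect(r0, c0, height, width):
    """Toy object: the set of (row, col) pixels of a filled rectangle."""
    return {(r, c) for r in range(r0, r0 + height) for c in range(c0, c0 + width)}


def iou(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def jaccard_at_threshold(truth, pred, threshold):
    """TP / (TP + FP + FN), counting a match when IoU >= threshold."""
    pairs = sorted(
        ((iou(t, p), i, j) for i, t in enumerate(truth) for j, p in enumerate(pred)),
        reverse=True,
    )
    matched_truth, matched_pred = set(), set()
    for score, i, j in pairs:
        if score < threshold or score == 0.0:
            break  # remaining pairs overlap even less
        if i not in matched_truth and j not in matched_pred:
            matched_truth.add(i)
            matched_pred.add(j)
    tp = len(matched_truth)
    fp = len(pred) - tp   # predicted objects with no ground-truth partner
    fn = len(truth) - tp  # ground-truth cells that were missed
    total = tp + fp + fn
    return tp / total if total else 1.0


truth = [rect(0, 0, 2, 5), rect(4, 0, 2, 5)]
pred = [rect(0, 0, 2, 5), rect(4, 1, 2, 5), rect(8, 0, 1, 2)]
for thr in (0.5, 0.7, 0.9):
    print(thr, jaccard_at_threshold(truth, pred, thr))
```

      Sweeping the threshold from 0.5 towards 1 yields a curve of the kind plotted in panel b: the stricter the required overlap, the fewer predicted cells count as correct.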

      4) The ability of MiSiC to segment dense clusters of cells is an exciting advancement for cell segmentation algorithms. However, is there a minimum cell density required for robust segmentation with MiSiC? The algorithm should be applied to a set of sparsely populated images in a supplemental figure. Is the algorithm less accurate for sparse images (perhaps reflected by an increase in false-positive cell identifications)? Any possible biases related to cell density should be noted.

      In fact, MiSiC performs well with both densely and sparsely populated images. In the case of sparsely populated images, it is however possible that non-cell objects occasionally appear in the MiSiC mask. As mentioned above, inclusion of noise can help remove these objects in the sparsely populated images. This issue is now fully explained in a supplemental Figure S1. Of note, non-cell objects, if they were to remain after noise addition, can be eliminated using additional general morphometric filters or specific models fitting bacterial cells, for example those included in MicrobeJ and Oufti. These points are now clarified in the text.

      “Associated Figure S1. Background noise can lead to spurious cell detection by MiSiC. SI images retain the shape/curvature information of the intensities in a raw image through eigenvalues of the Hessian of the image and an arctan function, creating smooth areas corresponding to cell bodies and propagating noisy regions where there is no shape information. Thus, MiSiC segments the cells by discriminating between “smooth” and “rough” regions. In effect, when adjusting the size parameter, scaling smooths out the image noise, leading to background regions that have a smoother SI than in the raw image. Some of these areas could be falsely detected as bacterial cells. This effect is shown here: when an image with uniform and random intensity values is segmented with MiSiC with increasing smoothing (here using a Gaussian blur filter), spurious cell detection becomes apparent. In addition, since the SI keeps the shape information and not the intensity values, background objects that are of relatively low contrast (i.e., dead cells or debris) may be detected as cells. All these artifacts can be mitigated by adding synthetic noise to the scaled images.”

      and:

      “Along similar lines, non-cell objects can appear in the MiSiC masks and, while some can be removed by the introduction of noise, an easy way to remove them is to apply a post-processing filter, for example using morphometric parameters to discard objects that are not bacteria. This can be easily done using Fiji, MicrobeJ or Oufti.”
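
      As a rough illustration of such a morphometric filter (a sketch only; in practice this would be done in Fiji, MicrobeJ or Oufti), the Python snippet below keeps objects whose pixel area and bounding-box elongation fall within user-chosen limits. The function names and all threshold values are hypothetical and would need to be tuned to the organism and pixel size.

```python
def bounding_box_aspect(pixels):
    """Long side / short side of the axis-aligned bounding box of a pixel set."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return max(height, width) / min(height, width)


def filter_objects(objects, min_area, max_area, max_aspect):
    """Keep objects (sets of (row, col) pixels) that look like single cells."""
    kept = []
    for obj in objects:
        area = len(obj)  # area in pixels
        if min_area <= area <= max_area and bounding_box_aspect(obj) <= max_aspect:
            kept.append(obj)
    return kept


def rect(r0, c0, height, width):
    """Toy object: filled rectangle as a set of pixels."""
    return {(r, c) for r in range(r0, r0 + height) for c in range(c0, c0 + width)}


speck = rect(0, 0, 1, 2)       # debris-sized object
rod = rect(5, 5, 3, 10)        # plausible rod-shaped cell
clump = rect(20, 20, 40, 40)   # large non-cell region
cells = filter_objects([speck, rod, clump], min_area=10, max_area=500, max_aspect=8.0)
print(len(cells))
```

      An axis-aligned bounding box is a crude elongation measure for cells lying diagonally; dedicated tools fit the cell contour instead, which is why this remains a sketch.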

      5) It is exciting to see the ability of MiSiC to segment single cells of M. xanthus and E. coli species in densely packed colonies (Fig. 4b). Although three morphological parameters after segmentation were compared with ground truth, the comparison was conducted at the ensemble level (Fig. 4c). Could the authors use the Mx-GFP and Ec-mCherry fluorescence as a ground truth at the single cell level to verify the results of segmentation? For example, for any Ec cells identified by MiSiC in Fig. 4b, provide an index of whether its fluorescence is red or green. This single-cell level comparison is most important for the community.

      We have now performed this comparison and determined Jaccard indexes for E. coli and Myxococcus detection using the individual fluorescence images as a reference (Figure 5b). Since we were only able to make this comparison in relatively small fields, we also kept the comparison of expected morphometric parameters in large images. Taken together, these data now demonstrate that the semantic classification, as performed, separates Myxococcus cells from E. coli cells well (see more details in our response to reviewer 3).

      Reviewer #2 (Public Review):

      Panigrahi and co-authors introduce a program that can segment a variety of images of rod-shaped bacteria (with somewhat different sizes and imaging modalities) without fine-tuning. Such a program will have a large impact on any project requiring segmentation of a large number of rod-shaped cells, including the large images demonstrated in this manuscript. To my knowledge, training a U-Net to classify an image from the image's shape index maps (SIM) is a new scheme, and the authors show that it performs fairly well despite a small training set including synthetic data that, based on Figure 1, does not closely resemble experimental data other than in shape. The authors discuss extending the method to objects with other shapes and provide an example of labelling two different species - these extensions are particularly promising.

      The authors show that their network can reproduce results of manual segmentation with bright field, phase and fluorescence input. Performance on fluorescence data in Fig. 1, where intensities vary so much, is particularly good and shows the benefits of the SIM transformation. Automated mapping of FtsZ shows that this method can be immediately useful, though the authors note this required post-processing to remove objects with abnormal shapes. The application in mixed samples in Fig. 4 shows good performance. However, no Python workflow or application is provided to reproduce it or train a network to classify mixtures in different experiments.

      We thank the reviewer for the positive comment. As discussed in our answer to reviewer 1, the classification presented in Figure 4 (now Figure 5) is meant to provide an example of how MiSiC can be further used to train networks to classify species in interspecies communities by generating two datasets, one per species of interest, to further train a U-Net. Here, the secondary U-Net was developed to specifically discriminate Myxococcus from E. coli, which is a very specialized application. Hence it was not included in the MiSiC package. Nevertheless the code is accessible at https://github.com/pswapnesh/MyxoColi (which is mentioned in the Methods).

      Performance was compared between SuperSegger with default parameters and MiSiC with tuned parameters for a single data set. Perhaps other SuperSegger parameters would perform better with the addition of noise, and it's unclear that adding Gaussian noise to a phase contrast image is the best way to benchmark performance. An interesting comparison would be between MiSiC and other methods applying neural networks to unprocessed data such as DeepCell and DeLTA, with identical training/test sets and an attempt to optimize free parameters.

      In fact, we believe that it does make sense to test how MiSiC performs in the presence of noise and show that it is robust, making it suitable for use on complex multi-tile images. For this analysis we kept the comparison with SuperSegger, which provides a reference as it is done on a data set optimized for SuperSegger segmentation. Importantly, we keep the parameters constant throughout the analysis because it would not be feasible to tweak parameters tile-by-tile in a multi-tile image. This analysis shows that MiSiC is better suited to this application.

      INSTALLATION: I installed both the command line and GUI versions of MiSiC on a Windows PC in a conda environment following provided instructions. Installation was straightforward for both. MiSiCgui gave one error and required reinstallation of NumPy as described on GitHub. Both give an error regarding AVX2 instructions. MiSiCgui gives a runtime error and does not close properly. These are all fairly small issues. Performance on a stack of images was sufficiently fast for many applications and could be sped up with a GPU implementation.

      We have updated the pip install script available on GitHub for MiSiCgui, which remedies some of these issues: there is no longer a NumPy error, the program closes properly, and there are only warning messages concerning future deprecations in the napari packages. We have tested it on Windows 10, Linux Ubuntu 18, and macOS Catalina. For the moment it seems impossible to install on macOS Big Sur, possibly because of the Python 3.7 requirement. We will work on this problem in the near future. We have removed the command line interface, as we are developing a future version with an easier way to provide MiSiC as a napari or Fiji/ImageJ plugin.

      TESTING: I tested the programs using brightfield data focused at a different plane than data presumably used to train the MiSiC network, so cells are dark on a light background and I used the phase option which inverts the image. With default settings and a reasonable cell width parameter (10 pixels for E. coli cells with 100-nm pixel width; no added noise since this image requires no rescaling) MiSiCgui returned an 8-bit mask that can be thresholded to give segmentation acceptable for some applications. There are some straight-line artifacts that presumably arise from image tiling, and the quality of segmentation is lower than I can achieve with methods tuned to or trained on my data. Tweaking magnification and added noise settings improved the results slightly. The MiSiC command line program output an unusable image with many small, non-cell objects. Looking briefly at the code, it appears that preprocessing differs and it uses a fixed threshold.

      We thank the reviewer for testing the programs. Tiling-related artifacts may now be avoided by excluding a few pixels at the border in the new version of the MiSiC code. This is now implemented in the MiSiC.segment function as segment(im,invert = False,exclude = 16). Without seeing the reviewer's data it is difficult for us to see how the segmentation (which is said to be acceptable) could be further improved. The command line program has now been removed in favor of continuous development on the graphical interface.

      Reviewer #3 (Public Review):

      The authors aimed to develop a 2D image analysis workflow that performs bacterial cell segmentation in densely crowded colonies, for brightfield, fluorescence, and phase contrast images. The resulting workflow achieves this aim and is termed "MiSiC" by the authors.

      I think this tool achieves high-quality single-cell segmentations in dense bacterial colonies for rod-shaped bacteria, based on inspection of the examples that are shown. However, without a quantification of the segmentation accuracy (e.g. Jaccard coefficient vs. intersection over union, false positive detection, false negative detection, etc), it is difficult to pass a final judgement on the quality of the segmentation that is achieved by MiSiC.

      We thank the reviewer for this comment. To address it, we divided the previous Figure 2 into two figures (and associated supplemental figures) separately showing how MiSiC performs (i) in segmenting two very distinct bacterial species, E. coli and Myxococcus, under various imaging modalities, and (ii) in segmenting other bacterial species: rods (P. aeruginosa), filaments (B. subtilis) and crescent shapes (C. crescentus). The results now clearly show both the strengths and limitations of the system.

      A particular strength of the MiSiC workflow arises from the image preprocessing into the "Shape Index Map" images (before the neural network analysis). These shape index maps are similar for images that are obtained by phase contrast, brightfield, and fluorescence microscopy. Therefore, the neural network trained with shape index maps can apparently be used to analyze images acquired with at least the above three imaging modalities. It would be important for the authors to unambiguously state whether really only a single network is used for all three types of image input, and whether MiSiC would perform better if three separate networks would be trained.

      A single network is used, taking a shape-index map rather than the original image as input. As mentioned by the reviewer, this is a major strength of the workflow given that it permits segmentation independent of the imaging modality, the accuracy of which we now measure for each modality.

      As the reviewer hints, three different models, each specific to one modality (PC, Fluorescence and BF), could also be trained, allowing the direct end-to-end segmentation of raw images. In theory, this could improve the segmentation (although this might lead to negligible benefits given the actual segmentation quality).

    1. Author Response:

      Reviewer #1:

      In this paper, Alhussein and Smith set out to determine whether motor planning under uncertainty (when the exact goal is unknown before the start of the movement) results in motor averaging (average between the two possible motor plans) or in performance optimization (one movement that maximizes the probability of successfully reaching to one of the two targets). Extending previous work by Haith et al. with two new, cleanly designed experiments, they show that performance optimization provides a better explanation of motor behaviour under uncertainty than the motor averaging hypothesis.

      We thank the reviewer for the kind words.

      1) The main caveat of experiment 1 is that it rules out one particular extreme version of the movement averaging idea, namely that the motor programs are averaged at the level of muscle commands or dynamics. It is still consistent with the idea that participants first average the kinematic motor plans and then retrieve the associated force field for this motor plan. This idea is ruled out in Experiment 2, but nonetheless I think this is worth adding to the discussion.

      This is a good point, and we have now included it in the paper as suggested – both in motivating the need for Expt 2 in the Results section and when interpreting the results of Expt 1 in the Discussion section.

      2) The logic of the correction for variability between the one-target and two-target trials in Formula 2 is not clear to me. It is likely that some of the variability in the two-target trials arises from the uncertainty in the decision - i.e. based on recent history one target may internally be assigned a higher probability than the other. This is variability the optimal controller should know about and therefore discard in the planning of the safety margin. How big was this correction factor? What is the impact when the correction is dropped?

      Short Answer:

      (1) If decision uncertainty contributed to motor variability on 2-target trials as suggested, 2-target trials should display greater motor variability than 1-target trials. However, 1-target and 2-target trials display levels of motor variability that are essentially equal – with a difference of less than 1% overall, as illustrated in Fig R2, indicating that decision uncertainty, if present, has no clear effect on motor variability in our data.

      (2) The sigma2/sigma1 correction factor is, therefore, very close to 1, with an average value of 1.00 or 1.04 depending on how it’s computed. Thus, dropping it has little impact on the main result as shown in Fig R1.

      Longer, more detailed, answer:

      We agree that it could be reasonable to think that if it were true that motor variability on 2-target trials were consistently higher than that on 1-target trials, then the additional variability seen on 2-target trials might result from uncertainty in the decision which should not affect safety margins if the optimal controller knew about this variability. However, detailed analysis of our data suggests that this is not the case. We present several analyses below that flesh this out.

      We apologize in advance that the response we provide to this seemingly straightforward comment is so lengthy (4+ pages!), especially since capitulating to the reviewer’s assertion that the “correction” for the motor variability differences between 1-target & 2-target trials should be removed from our analysis would make essentially no difference in the main result, as shown in Fig R1 above. Note that the error bars on the data show 95% confidence intervals. However, taking the difference in motor variability (or more specifically, its ratio) between 1-target and 2-target trials into account is crucial for understanding inter-individual differences in motor responses in uncertain conditions. As this reviewer (and reviewer 2) points out below, we did a poor job of presenting the inter-individual differences analysis in the original version of this paper, but we have improved both the approach and the presentation in the current revision, and we think that this analysis is important, despite being secondary to the main result about the group-averaged findings.

      Therefore, we present analyses here showing that it is unlikely that decision uncertainty accounts for the individual-participant variability differences we observe between 1-target and 2-target trials in our experiments (Fig R2). Instead, we show that the variability differences we observe in different conditions for individual participants are due to (largely idiosyncratic) spatial differences in movement direction (Fig R3), which when taken into account, afford a clearly improved ability to predict the size of the safety margins around the obstacles, both in 1-target trials where there is no ‘decision’ to be made (Figs R4-R6) and in 2-target trials (Figs R5-R6).

      Variability is, on average, nearly identical on 1-target & 2-target trials, indicating no measurable decision-related increase in variability on 2-target trials

      At odds with the idea that decision uncertainty is responsible for a meaningful fraction of the 2-target trial variability that we measure, we find that motor variability on 2-target trials is essentially unchanged from that on 1-target trials overall, as shown in Fig R2 (error bars show 95% confidence intervals). This is the case for the data from Expt 2a (6.59±0.42° vs 6.70±0.96°, p > 0.8), for the critical data from Expt 2b that was designed to dissociate the MA hypothesis from the PO hypothesis (4.23±0.17° vs 4.23±0.27°, p > 0.8), and when the data from Expts 2a-b are pooled (4.78±0.24° vs 4.81±0.35°, p > 0.8). Note that the nominal difference in motor variability between 1-target and 2-target trials was just 1.7% in the Expt 2a data, 0.1% in the Expt 2b data, and 0.6% in the pooled data. This suggests little to no overall contribution of decision uncertainty to the motor variability levels we measured in Expt 2.

      Correspondingly, the sigma2/sigma1 ‘correction factor’ (which serves to scale the safety margin observed on 1-target trials up or down based on increased or decreased motor variability on 2-target trials) is close to 1. Specifically, this factor is 1.01±0.13 (mean±SEM) for Expt 2a and 1.04±0.09 for Expt 2b, if measured as mean(sigma2_i/sigma1_i), where sigma1_i and sigma2_i are the SDs of the initial movement directions of participant i on 1-target and 2-target trials. This factor is 1.02 for Expt 2a and 1.00 for Expt 2b, if instead measured as mean(sigma2_i)/mean(sigma1_i), and thus in either case, dropping it has little effect on the main population-averaged results for Expt 2 presented in Fig 4b in the main paper. Fig R1 shows versions of the PO model predictions in Fig 4b computed with or without the sigma2/sigma1 ‘correction factor’ that the reviewer asks about. The with and without versions are quite similar for the results from both Expt 2a and Expt 2b. In particular, the comparison between our experimental data and the population-average-based model predictions for the MA vs the PO hypotheses shows highly significant differences between the abilities of the MA and PO models to explain the experimental data in Expt 2b (Fig R1, right panel), whether or not the sigma2/sigma1 correction is included for the comparison between MA and PO predictions (p<10^-13 whether or not the sigma2/sigma1 term is included, p=4.31×10^-14 with it vs p=4.29×10^-14 without it). Analogously, for Expt 2a (where we did not expect to show meaningful differences between the MA and PO model predictions), we also find highly consistent results when the sigma2/sigma1 term is included vs not (Fig R1, left panel) (p=0.37 for the comparison between PO and MA predictions with the sigma2/sigma1 term included vs 0.38 without it).
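
      The two ways of computing the correction factor differ only in where the averaging happens, as the following Python sketch illustrates. It is not our analysis code: the function name `correction_factors` is hypothetical and the movement directions are made-up numbers chosen so that the two estimates visibly differ.

```python
from statistics import mean, stdev


def correction_factors(one_target, two_target):
    """Return (mean of per-participant ratios, ratio of group means).

    Each argument is a list with one entry per participant; each entry is a
    list of initial movement directions (degrees) for that trial type.
    """
    sigma1 = [stdev(directions) for directions in one_target]
    sigma2 = [stdev(directions) for directions in two_target]
    mean_of_ratios = mean(s2 / s1 for s1, s2 in zip(sigma1, sigma2))
    ratio_of_means = mean(sigma2) / mean(sigma1)
    return mean_of_ratios, ratio_of_means


# Made-up data for two participants with opposite variability patterns.
one_target = [[0.0, 2.0, 4.0], [0.0, 4.0, 8.0]]   # SDs of 2 and 4 degrees
two_target = [[0.0, 4.0, 8.0], [0.0, 2.0, 4.0]]   # SDs of 4 and 2 degrees
print(correction_factors(one_target, two_target))
```

      Because the mean of ratios weights each participant's ratio equally, it is pulled above 1 when individual ratios scatter on both sides of 1, which is why the first estimate tends to come out slightly larger than the second (for example 1.04 vs 1.00 in Expt 2b).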

      Analysis of left-side vs right-side 1-target trial data indicates the existence of participant-specific spatial patterns of variability.

      With the participant-averaged data showing almost identical levels of motor variability on 1-target and 2-target trials, it is not surprising that about half of participants showed nominally greater variability on 1-target trials and about half showed nominally greater variability on 2-target trials. What was somewhat surprising, however, was that 16 of the 26 individual participants in Expt 2b displayed significantly higher variability in one condition or the other at α=0.05 (and 12/26 at α=0.01). Why might this be the case? We found an analogous result when breaking down the 1-target trial data into +30° (right-target) and -30° (left-target) trials that could offer an explanation. Note that the 2-target trial data come from intermediate movements toward the middle of the workspace, whereas the 1-target trial data come from right-side or left-side movements that are directed even more laterally than the +30° or -30° targets themselves (the average movement directions to these obstacle-obstructed lateral targets were +52.8° and -49.0°, respectively, in the Expt 2b data; see Fig 4a in the main paper for an illustration). Given the large separation between 1-target & 2-target trials (~50°) and between left and right 1-target trials (~100°), differences in motor variability would not be surprising. The analyses illustrated in Figs R3-R6 show that these spatial differences indeed have large intra-individual effects on movement variability (Fig R3) and, critically, a large subsequent effect on the ability to predict the safety margin observed in one movement direction from motor variability observed at another (Figs R4-R6).

      Fig R3 shows evidence for intra-individual direction-dependent differences in motor variability, obtained by looking at the similarity between within-participant spatially-matched (e.g. left vs left or right vs right, Fig R3a) compared to spatially-mismatched (left vs right, Fig R3b) motor variability across individuals. To perform this analysis fairly, we separated the 60 left-side obstacle 1-target trial movements for each participant into those from odd-numbered vs even-numbered trials (30 each) to be compared, and we did the same for the 60 right-side obstacle 1-target trial movements. Fig R3a shows that there is a large (r=+0.70) and highly significant (p<10⁻⁶) across-participant correlation between the variability measured in the spatially-matched case, i.e. for the even vs odd trials from same-side movements, indicating that the measurement noise for measuring movement variability using n=30 movements (movement variability was measured by standard deviation) did not overwhelm inter-individual differences in movement variability.

      The strength of this correlation would increase/decrease if we had more/less data from each individual because that would decrease/increase the noise in measuring each individual’s variability. Therefore, to be fair, we maintained the same number of data points for each variability measurement (n=30) for the spatially-mismatched cases shown in Fig R3b and R3c. The strong positive relationship between odd-trial and even-trial variability across individuals that we observed in the spatially-matched case is completely obscured when the target direction is not controlled for (i.e. not maintained) within participants, even though left-target and right-target movements are randomly interspersed. In particular, Fig R3b shows that there remains only a small (r=+0.09) and non-significant (p>0.5) across-participant correlation between the variability measured for the even vs odd trials from opposite-side movements that have movement directions separated by ~100°. This indicates that idiosyncratic intra-individual spatial differences in motor variability are large and can even outweigh inter-individual differences in motor variability seen in Fig R3a. Fig R3c shows that an analogous effect holds between the laterally-directed 1-target trials and the more center-directed 2-target trials that have movement directions separated by ~50°. In this case, the correlation that remains when the target direction is not maintained within participants is also near zero (r=-0.13) and non-significant (p>0.3). It is possible that some other difference between 1-target & 2-target trials might also be at play here, but there is unlikely to be a meaningful effect from decision variability given the essentially equal group-average variability levels (Fig R2).
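
      The odd/even split-half comparison described above can be sketched as follows. This is a minimal illustration under assumptions: the helper names and data layout are hypothetical, the matched comparison uses left-side trials only, and the mismatched comparison pairs left-side odd trials with right-side even trials, which may differ from how the panels in Fig R3 were actually constructed.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_sd(trials):
    """SD of odd-numbered trials and SD of even-numbered trials."""
    odd = trials[0::2]   # trials 1, 3, 5, ...
    even = trials[1::2]  # trials 2, 4, 6, ...
    return stdev(odd), stdev(even)


def matched_vs_mismatched(left, right):
    """left, right: per-participant lists of movement directions (degrees)
    for left-side and right-side 1-target trials (hypothetical layout)."""
    left_odd, left_even = zip(*(split_half_sd(t) for t in left))
    _, right_even = zip(*(split_half_sd(t) for t in right))
    matched = pearson_r(left_odd, left_even)      # same side, odd vs even
    mismatched = pearson_r(left_odd, right_even)  # opposite sides, odd vs even
    return matched, mismatched
```

      Because each half contains the same number of trials in both comparisons, measurement noise in the SD estimates is equated across the matched and mismatched cases, which is the fairness condition the response emphasizes.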

      Analysis of left-side vs right-side 1-target trial data indicates that participant-specific spatial patterns of variability correspond to participant-specific spatial differences in safety margins.

      Critically, dissection of the 1-target trial data also shows that the direction-dependent differences in motor variability discussed above for right-side vs left-side movements predict direction-dependent differences in the safety margins. In particular, comparison of panels a & b in Fig R4 shows that motor variability, if measured on the same side (e.g. the right-side motor variability for the right-side safety margin), strongly predicts interindividual differences in safety margin (r=0.60, p<0.00001, see Fig R4b). However, motor variability, if measured on the other side (e.g. the right-side motor variability for the left-side safety margin), fails to predict interindividual differences in safety margin (r=0.15, p=0.29, see Fig R4a). These data show that taking the direction-specific motor variability into account allows considerably more accurate individual predictions of the safety margins used for these movements. In line with that idea, we also find that interindividual differences in the % difference between the motor variability measured on the left side vs the right side predict inter-individual differences in the % difference between the safety margin measured on the left side vs the right side, as shown in Fig R4c (r=0.52, p=0.006).

      Analyses of both 1-target trial and 2-target trial data indicate that participant-specific spatial patterns of variability correspond to participant-specific spatial differences in safety margins.

      Not surprisingly, the spatial/directional specificity of the ability to predict safety margins from measurements of motor variability observed in the 1-target trial data in Fig R4 is present in the 2-target data as well. Comparison of panels a-d in Fig R5 shows that motor variability from 1-target and 2-target trial data in Expt 2b strongly predicts interindividual differences in 1-target and 2-target trial safety margins (r=0.72, p=3×10⁻⁵ for the 2-target trial data (see Fig R5d), r=0.59, p=1×10⁻³ for the 1-target trial data (see Fig R5a)).

      This is the case even though the 1-target and 2-target trial data display essentially equal population-averaged levels of motor variability. However, in Expt 2b, motor variability measured on 1-target trials fails to predict inter-individual differences in the safety margin on 2-target trials (r=0.18, p=0.39, see Fig R5c), and motor variability measured on 2-target trials fails to predict inter-individual differences in the safety margin on 1-target trials (r=-0.12, p=0.55, see Fig R5b). As an aside, note that Fig R5a is similar to Fig R4b in content, in that 1-target trial safety margins are plotted against motor variability levels in both cases. But in Fig R5a, the left and right-target data are averaged, whereas in Fig R4b the left and right-target data are both plotted, resulting in 2N data points. Also note that the correlations are similar, r=+0.59 vs r=+0.60, indicating that in both cases the amount of motor variability predicts the size of the safety margin.

      A final analysis indicating that the spatial specificity of motor variability rather than the presence of decision variability accounts for the ability to predict safety margins is shown in Fig R6. This analysis makes use of the contrast between Expt 2b (where there is a wide spatial separation (51° on average) between 1-target trials and 2-target trials because participants steer laterally around the Expt 2b 1-target trial obstacles, i.e. away from the center), and Expt 2a (where there is only a narrow spatial separation (10.4° on average) between the movement directions of 1-target trials and 2-target trials because participants steer medially around the Expt 2a 1-target trial obstacles, i.e. toward the center). If the spatial specificity of motor variability drove the ability to predict safety margins (and thus movement direction) on 2-target trials, then such predictions should be noticeably improved in Expt 2a compared to Expt 2b, because the spatial match between 1-target trials and 2-target trials is five-fold better in Expt 2a than in Expt 2b. Fig R6 shows that this is indeed the case. Specifically, comparison of the 3rd and 4th clusters of bars (i.e. the data on the right side of the plot) shows that the ability to predict 2-target trial safety margins from 1-target trial variability and conversely the ability to predict 1-target trial safety margins from 2-target trial variability are both substantially improved in Expt 2a compared to Expt 2b (compare the grey bars in the 4th vs the 3rd clusters of bars).

      Moreover, comparison of the 1st and 2nd clusters of bars (i.e. the data on the left side of the plot), shows that the ability to predict left 1-target trial safety margins from right 1-target trial variability and conversely the ability to predict right 1-target trial safety margins from left 1-target trial variability are also both substantially improved in Expt 2a compared to Expt 2b (compare the grey bars in the 1st vs the 2nd clusters of bars). This corresponds to a spatial separation between the movement directions on left vs right 1-target trials of 20.7° on average in Expt 2a in contrast to a much greater 102° in Expt 2b.

      The analyses illustrated in Figs R4-R6 make it clear that accurate prediction of interindividual differences in safety margins critically depends on spatially-specific information about motor variability, and we have, therefore, included this information for the analyses in the main paper, as it is especially important for the analysis of inter-individual differences in motor planning presented in Fig 5 of the manuscript.

      3) Equation 3 then becomes even more involved and I believe it constitutes somewhat of a distraction from the main story - namely that individual variations in the safety margin in the 1-target obstacle-obstructed movements should lead to opposite correlations under the PO and MA hypotheses with the safety margin observed in the uncertain 2-target movements (see Fig 5e). Given that the logic of the variance-correction factor (pt 2) remains shaky to me, these analyses seem to be quite removed from the main question and of minor interest to the main paper.

      The reviewer makes a good point. We agree that the original presentation made Equation 3 seem overly complex and possibly like a distraction as well. Based on the comment above and a number of comments and suggestions from Reviewer 2, we have now overhauled this content – streamlining it and making it clearer, in both motivation and presentation. Please see section 2.2 in the point-by-point response to reviewer 2 for details.

      Reviewer #2:

      The authors should be commended on the sharing of their data, the extensive experimental work, the experimental design that allows them to get opposite predictions for both hypotheses, and the detailed analyses of their results. Yet, the interpretation of the results should be more cautious as some aspects of the experimental design offer some limitations. A thorough sensitivity analysis is missing from experiment 2 as the safety margin seems to be critical to distinguish between both hypotheses. Finally, the readability of the paper could also be improved by limiting the use of abbreviations and motivating some of the analyses further.

      We thank the reviewer for the kind words and for their help with this manuscript.

      1) The text is difficult to read. This is partially due to the fact that the authors used many abbreviations (MA, PO, IMD). I would get rid of those as much as possible. Sometimes, having informative labels could also help: FFcentral and FFlateral would be better than FFA and FFB.

      We have reduced the number of abbreviations used in the paper from 11 to 4 (Expt, FF, MA, PO), and we thank the reviewer for the nice suggestion about changing FFA and FFB to FFLATERAL and FFCENTER. We agree that the suggested terms are more informative and have incorporated them.

      2) The most difficult section to follow is the one at the end of the result sections where Fig.5 is discussed. This section consists of a series of complicated analyses that are weakly motivated and explained. This section (starting on line 506) appears important to me but is extremely difficult to follow. I believe that it is important as it shows that, at the individual level, PO is also superior to MA to predict the behavior but it is poorly written and even the corresponding panels are difficult to understand as points are superimposed on each other (5b and e). In this section, the authors mention correcting for Mu1b and correcting for Sig2i/Sig1Ai but I don't know what such correction means. Furthermore, the authors used some further analyses (Eq. 3 and 4) without providing any graphical support to follow their arguments. The link between these two equations is also unclear. Why did the authors used these equations on the pooled datasets from 2a and 2b ? Is this really valid ? It is also unclear why Mu1Ai can be written as the product of R1Ai and Sig1Ai. Where does this come from ?

      We agree with the reviewer that this analysis is important, and the previous explanation was not nearly as clear as it could have been. To address this, we have now overhauled the specifics of the content in Figure 5 and the corresponding text – streamlining the text and making it clearer, in both motivation and presentation (see lines 473-545 in the revised manuscript). In addition to the improved text, we have clarified and improved the equations presented for analysis of the ability of the performance optimization (PO) model to explain inter-individual differences in motor planning in uncertain conditions (i.e. on 2-target trials) and have provided more direct graphical support for them. Eq 4 from the original manuscript has been removed, and instead we have expanded our analyses on what was previously Eq 3 (now Eq 5 in the revised manuscript). We have more clearly introduced this equation as a hybrid between using group-averaged predictions and participant-individualized predictions, where the degree of individualization for all parameters is specified with the individuation index 𝑘. For example, a value of 1 for 𝑘 would indicate complete weighting of the individuated model predictors. The equation that follows in the revised manuscript, Eq 6, is a straightforward extension of Eq 5 where each model parameter was instead multiplied by a different individuation index. With this, we now present the partial-R² statistic associated with each model predictor (see revised Figs 5a and 5e) to elucidate the effect of each. We have, additionally, now plotted the relationships between each of the 3 model predictors and the inter-individual differences that remain when the other two predictors are controlled (see revised Figs 5b-d and Fig 5f-h). These analyses are all shown separately for each experiment, as per the reviewer’s suggestion, in the revised version of Fig 5.

      Overall, this section is now motivated and discussed in a more straightforward manner, and now provides better graphical support for the analyses reported in the manuscript. We feel that the revised analysis and presentation (1) more clearly shows the extent to which inter-individual differences in motor planning can be explained by the PO model, and (2) does a better job of breaking down how the individual factors in the model contribute to this. We sincerely thank the reviewer for helping us to make the paper easier to follow and better illustrated here.

      3) In experiment 1, does the presence of a central target not cue the participants to plan a first movement towards the center, while such a central target was never present in other motor averaging experiments?

      Unfortunately, the reviewer is mistaken here, as central target locations were present in several other experiments that advocated for motor averaging which we cite in the paper. The central target was not present on any 2-target trials in our experiments, in line with previous work. It was only present on 1-target center-target trials.

      In the adaptation domain, people complain that asking where people are aiming would induce a larger explicit component. Similarly, one could wonder whether training the participants to a middle target would not induce a bias towards that target under uncertainty.

      Any “bias” of motor output towards the center target would predict an intermediate motor output which would favor neither model because our experiment designs result in predictions for motor output on different sides of center for 2-target trials in both Expt 1 and Expt 2b. Thus we think any such effect, if it were to occur, would simply reduce the amplitude of the result. However, we found an approximately full-sized effect, suggesting that this is not a key issue.

      4) The predictions linked to experiment 2 are highly dependent on the amount of safety margin that is considered. While the authors mention these limitations in their paper, I think that it is not presented with enough details. For instance, I would like to see a figure similar to Fig.4B when the safety margin is varied.

      We apologize for any confusion here. The reviewer seems to be under the impression that we can specifically manipulate safety margins around the obstacle in making model predictions for experiment 2. This is, however, not the case for either of the two safety margins in the performance-optimization (PO) modelling. Let us clarify. First, the safety margin on 1-target trials, which serves as input to the PO model, is experimentally measured on obstacle-present 1-target trials, and thus cannot be manipulated. Second, the predicted safety margin on 2-target trials is the output of the PO model and thus cannot be manipulated. There is only one parameter in the main PO model (the one for making the PO prediction for the group-average data presented in Fig 4b, see Eq 4), and that is the motor cost weighting coefficient (𝛽). 𝛽 is implicitly present in Eq 2 as well, fixed at 1/2 in this baseline version of the PO model. It is of course true that changing the motor cost weighting will affect the model output (the predicted 2-target trial safety margin), but we do not think that the reviewer is referring to that here, since he or she asks about that directly in section 2.4.4 and in section 2.4.6 below, where we provide the additional analysis requested.

      For exp1, it would be good to demonstrate that, even when varying the weight of the two one-target profiles for motor averaging, one never gets a prediction that is close to what is observed.

      Here the reviewer is referring to an apparent inconsistency between our analysis of Expts 1 and 2, because in Expt 2 (but not in Expt 1) we examine the effect of varying the relative weight of the two 1-target trials for motor averaging. However, we only withheld this analysis in Expt 1 because it would have little effect. Unlike Expt 2, the measured motor output on left and right 1-target trials in Expt 1 is remarkably similar (see the left panel in Fig R7a below (which is based on Fig 2b from the manuscript)). This is because left and right 1-target trials in Expt 1 were adapted to the same FF perturbation (FFLATERAL in both cases), whereas left and right 1-target trials in Expt 2 received very different perturbation levels, because one of these targets was obstacle-obstructed and the other was not. Therefore, varying the relative weightings in Expt 1 would have little effect on the MA prediction as shown in Fig R7b at right. We now realize that this point was not explained to readers, and we have now modified the text in the results section where the analysis of Expt 1 is discussed in order to include a summary of the explanation offered above. We thank the reviewer for surfacing this.
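
      The argument above, that reweighting the two 1-target profiles barely changes the motor averaging prediction when those profiles are nearly identical, can be illustrated with a short sketch. The function names and the toy profiles are assumptions made for illustration only; they are not the measured force profiles or the authors' code.

```python
def weighted_motor_average(left_profile, right_profile, w_left=0.5):
    """Weighted average of two force profiles sampled at the same time points.

    w_left is the weight on the left-target profile; the right-target
    profile receives 1 - w_left. Equal weighting (0.5) is plain averaging.
    """
    return [w_left * l + (1.0 - w_left) * r
            for l, r in zip(left_profile, right_profile)]


def max_shift(left_profile, right_profile, weights):
    """Largest change in the prediction, relative to equal weighting,
    over the candidate weights."""
    baseline = weighted_motor_average(left_profile, right_profile, 0.5)
    shifts = []
    for w in weights:
        pred = weighted_motor_average(left_profile, right_profile, w)
        shifts.append(max(abs(p - b) for p, b in zip(pred, baseline)))
    return max(shifts)
```

      When the two input profiles coincide, `max_shift` is zero (up to floating-point rounding) for any weights, whereas clearly different profiles (as in the obstructed vs unobstructed targets of Expt 2) yield a shift that grows with their separation.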

      It is unclear in the text that the performance optimization prediction simply consists of the force-profile for the center target. The authors should motivate this choice.

      We’re a bit unclear about this comment. This specific point is addressed in the first paragraph under the Results section, the second paragraph under the subsection titled “Adaptation to novel physical dynamics can elucidate the mechanisms for motor planning under uncertainty”, the Figure 2 captions, and in the second paragraph under the subsection titled “Adaptation to a multi-FF environment reveals that motor planning during uncertainty occurs via performance-optimization rather than motor averaging”. Direct quotes from the original manuscript are below:

      Line 143: “However, PO predicts that these intermediate movements should be planned so that they travel towards the midpoint of the potential targets in order to maximize the probability of final target acquisition. This would, in contrast to MA, predict that intermediate movements incorporate the learned adaptive response to FFB, appropriate for center-directed movements, allowing us to decisively dissociate PO from MA.”

      Line 200: “In contrast, PO would predict that participants produce the force pattern (FFB) appropriate for optimizing the planned intermediate movement since this movement maximizes the probability of successful target acquisition5,34 (Fig 1d, right).”

      Line 274: “The 2-target trial MA prediction corresponds to the average of the force profiles (adaptive responses) associated with the left and right 1-target EC trials plotted in Fig 2b, whereas the 2-target trial PO prediction corresponds to the force profile associated with the center target plotted in Fig 2b, as this is appropriate for optimizing a planned intermediate movement.”

      For experiment 2, the authors do not present a systematic sensitivity analysis. Fig. 5a and d is a good first step but they should also fit the data on exp2b and see how this could explain the behavior in exp 2a. Second, the authors should present the results of the sensitivity analysis like they did for the main predictions in Fig.4b.

      We thank the reviewer for these suggestions. We have now included a more-complete analysis in Fig R8 below, and presented it in the format of Fig 4b as suggested. Please note that we have included the analysis requested above in a revised version of Fig 4b in the manuscript, and a related analysis requested in section 2.4.6 in the supplementary materials.

      Specifically, the partial version of the analysis that had been presented (where the cost weighting for PO as well as the target weighting for MA were fit on Expt 2a and cross-validated using the Expt 2b data, but not conversely fit on Expt 2b and tested on Expt 2a) was expanded to include cross-validation of the Expt 2b fit using the Expt 2a data. As expected, the results from the converse analysis (Expt 2b → Expt 2a) mirror the results from the original analysis (Expt 2a → Expt 2b) for the cost weighting in the PO model, where the cross-validated predictions increased the self-fit mean squared prediction errors only modestly, by 11% for the Expt 2a data and by 29% for the Expt 2b data. In contrast, for the target weighting in the MA model, the cross-validated predictions did not explain the data well, increasing the self-fit mean squared prediction errors by 115% for the Expt 2a data, and by 750% for the Expt 2b data. Please see lines 411-470 in the main paper for a full analysis.
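
      The cross-validation measure quoted above, the percentage by which the mean squared prediction error grows when a parameter is fit on the other experiment rather than on the data being predicted, can be sketched as follows. The one-parameter linear model used here is a stand-in chosen purely for illustration; it is not the PO or MA model, and all names are hypothetical.

```python
from statistics import mean


def mse(param, xs, ys):
    """Mean squared prediction error of the stand-in model y = param * x."""
    return mean((param * x - y) ** 2 for x, y in zip(xs, ys))


def fit(xs, ys):
    """Least-squares estimate of the single parameter in y = param * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def cross_validation_penalty(train, test):
    """Percent increase in test-set MSE when the parameter is fit on `train`
    instead of on `test` itself (the self-fit).

    train, test: (xs, ys) tuples. Undefined if the self-fit error is zero.
    """
    test_xs, test_ys = test
    self_fit_error = mse(fit(test_xs, test_ys), test_xs, test_ys)
    cross_fit_error = mse(fit(*train), test_xs, test_ys)
    return 100.0 * (cross_fit_error / self_fit_error - 1.0)
```

      A small penalty means the parameter estimated in one experiment generalizes well to the other; a penalty of several hundred percent means it does not.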

      While I understand where the computation of the safety margin in eq.2 comes from, reducing the safety margin would make the predictions linked to the performance optimization look more and more like the motor averaging predictions. How bad does the fit of the data become then?

      We think that this is essentially the same question as that asked above in section 2.4.1. Please see our response in that section above. If that response doesn’t adequately answer this question, please let us know!

      What do the predictions look like if the motor costs are unbalanced (66 vs. 33%, 50 vs. 50% (current prediction), 33 vs. 66%)? What if, in Eq. 2, the slope of the relationship were twice as large, half as large, etc.?

      Fig R8 above shows how the PO prediction would change using the 2:1 (66:33) and 1:2 (33:66) weightings suggested by the reviewer here, in comparison to the 1:1 weighting present in the original manuscript, the Expt 2a best fit weighting present in the original manuscript, and the Expt 2b best fit weighting that the reviewer suggested we include in section 2.4.2. Please note that this figure is now included as a supplementary figure to accompany the revised manuscript.

      The safety margin is the crucial element here. If it gets smaller and smaller, the PO prediction would look more and more like the MA predictions. This needs to be discussed in details. I also have the impression that the safety margin measured in exp 2a (single target trials) could be used for the PO predictions as they are both on the right side of the obstacle.

      We again apologize for the confusion. We are already using safety margin measurements to make PO predictions. Specifically, within Expt 2a, we use safety margin measurements from 1-target trials (in conjunction with variability measurements on 1 & 2 target trials) to estimate safety margins on 2-target trials. And analogously within Expt 2b, we use safety margin measurements from 1-target trials (in conjunction with variability measurements on 1 & 2 target trials) to estimate safety margins on 2-target trials. Fig 4b in the main paper shows the results of this prediction (and it now also includes the cross-validated predictions of the refined models, as requested in Section 2.4.4 above). Relatedly, Fig R1 in this letter shows that, at the group-average level, these predictions for 2-target trial behavior in both Expt 2a and Expt 2b are essentially identical whether they are based solely on the safety margins observed on 1-target trials or on these safety margins corrected for the relative motor variabilities on 1-target and 2-target trials.

      5) On several occasions (e.g. line 131), the authors mention that their results prove that humans form a single motor plan. They don't have any evidence for this specific aspect as they can only see the plan that is expressed. They can prove that the latter is linked to performance optimization and not to the motor averaging one. But the absence of motor averaging does not preclude the existence of other motor plans…. Line 325 is the right interpretation.

      Thanks for catching this. We agree and have now revised the text accordingly (see for example, lines 53, 134, and 693-695 in the revised manuscript).

      6) Line 228: the authors mention that there is no difference in adaptation between training and test periods but this does not seem to be true for the central target. How does that affect the interpretation of the 2-target trials data ? Would that explain the remaining small discrepancy between the refined PO prediction and the data (Fig.2f) ?

      There must be some confusion here. The adaptation levels in the training period and the test period data from the central target are indeed quite similar, with only a <10% nominal difference in adaptation between them that is not close to statistically significant (p=0.14). We also found similar adaptation levels between the training and test epochs for the lateral targets (p=0.65 for the left target and p=0.20 for the right target). We further note that the PO predictions are based on test period data. And so, even if there were a clear decrease in adaptation between training and test periods, it would not affect the fidelity of the predictions or present a problem, except in the extreme hypothetical case where the reduction was so great that the test period adaptation was not clearly different from zero (as that would infringe on the ability of the paradigm to make clearly opposite predictions for the MA and PO model) – but that is certainly not the case in our data.

      Reviewer #3:

      In this study, Alhussein and Smith provide two strong tests of competing hypotheses about motor planning under uncertainty: Averaging of multiple alternative plans (MA) versus optimization of motor performance (PO). In the first experiment, they used a force field adaptation paradigm to test this question, asking if observed intermediate movements between competing reach goals reflected the average of adapted plans to each goal, or a deliberate plan toward the middle direction. In the second experiment, they tested an obstacle avoidance task, asking if obstacle avoidance behaviors were averaged with respect to movements to non-obstructed targets, or modulated to afford optimal intermediate movements based on a computed "safety margin." In both experiments the authors observed data consistent with the PO hypothesis, and contradictory of the MA hypothesis. The authors thus conclude that MA is not a feasible hypothesis concerning motor planning under uncertainty; rather, people appear to generate a single plan that is optimized for the task at hand.

      I am of two minds about this (very nice) study. On the one hand, I think it is probably the most elegant examination of the MA idea to date, and presents perhaps the strongest behavioral evidence (within a single study) against it. The methods are sound, the analysis is rigorous, and it is clearly written/presented. Moreover, it seems to stress-test the PO idea more than previous work. On the other hand, it is hard for me to see a high degree of novelty here, given recent studies on the same topic (e.g. Haith et al., 2015; Wong & Haith, 2017; Dekleva et al., 2018). That is, I think these would be more novel findings if the motor-averaging concept had not been very recently "wounded" multiple times.

      We thank the reviewer for the kind words and for their help with this manuscript.

      The authors dutifully cite these papers, and offer the following reasons that one of those particular studies fell short (I acknowledge that there may be other reasons that are not as explicitly stated): On line 628, it is argued that Wong & Haith (2017) allowed for across-condition (i.e., timing/spacing constraints) strategic adjustments, such as guessing the cued target location at the start of the trial. It is then stated that, "While this would indeed improve performance and could therefore be considered a type of performance-optimization, such strategic decision making does not provide information about the implicit neural processing involved in programming the motor output for the intermediate movements that are normally planned under uncertain conditions." I'm not quite sure the current paper does this either? For example, in Exp 1, if people deliberately strategize to simply plan towards the middle on 2-target trials and feedback-correct after the cue is revealed (there is no clear evidence against them doing this), what do the results necessarily say about "implicit neural processing?" If I deliberately plan to the intermediate direction, is it surprising that my responses would inherit the implicit FF adaptation responses from the associated intermediate learning trials, especially in light of evidence for movement- and/or plan-based representations in motor adaptation (Castro et al., 2011; Hirashima & Nozaki, 2012; Day et al., 2016; Sheahan et al., 2016)?

      The reviewer has a completely fair point here, and we agree that the experiments in the current study are amenable to explicit strategization. Thus, without further work, we cannot claim that the current results are exclusively driven by implicit neural processing.

      As the reviewer alludes to below, the possibility that the current results are driven by explicit processes in addition to or instead of implicit ones does not directly impact any of the analyses we present – or the general finding that performance-optimization, not motor averaging, underlies motor planning during uncertainty. Nonetheless, we have added a section in the discussion section to acknowledge this limitation. Furthermore, we highlight previous work demonstrating that restriction of movement preparation time suppresses explicit strategization (as the reviewer hints at below), and we suggest leveraging this finding in future work to investigate how motor output during goal uncertainty might be influenced under such constraints. This portion of the discussion section is quoted below:

      “An important consideration for the present results is that sensorimotor control engages both implicit and explicit adaptive processes to generate motor output47. Because motor output reflects combined contributions of these processes, determining their individual contributions can be difficult. In particular, the experiments in the present study used environmental perturbations to induce adaptive changes in motor output, but these changes may have been partially driven by explicit strategies, and thus the extent to which the motor output measured on 2-target trials reflects implicit vs explicit feedforward motor planning requires further investigation. One method for examining implicit motor planning during goal uncertainty might take inspiration from recent work showing that in visuomotor rotation tasks, restricting the amount of time available to prepare a movement appears to limit explicit strategization from contributing to the motor response48–51. Future work could dissociate the effects of MA and PO on intermediate movements in uncertain conditions at movement preparation times short enough to isolate implicit motor planning.”

      In that same vein, the Gallivan et al 2017 study is cited as evidence that intermediate movements are by nature implicit. First, it seems that this consideration would be necessarily task/design-dependent. Second, that original assumption rests on the idea that a 30˚ gradual visuomotor rotation would never reach explicit awareness or alter deliberate planning, an assumption which I'm not convinced is solid.

      We generally agree with the reviewer here. We might add that in addition to introducing the perturbation gradually, Gallivan and colleagues enforced a short movement preparation time (325ms). However, we agree that the extent to which explicit strategies contribute to motor output should clearly vary from one motor task to another, and on this basis alone, the Gallivan et al 2017 study should not be cited as evidence that intermediate movements must universally reflect implicit motor planning. We have explained this limitation in the discussion section (see quote below) and have revised the manuscript accordingly.

      “We note that Gallivan et al. 2017 attempted to control for the effects of explicit strategies by (1) applying the perturbation gradually, so that it might escape conscious awareness, and (2) enforcing a 325ms preparation time. Intermediate movements persisted under these conditions, suggesting that intermediate movements during goal uncertainty may indeed be driven by implicit processes. However, it is difficult to be certain whether explicit strategy use was, in fact, effectively suppressed, as the study did not assess whether participants were indeed unaware of the perturbation, and the preparation times used were considerably larger than the 222ms threshold shown to effectively eliminate explicit contributions to motor output."

      The Haith et al., 2015 study does not receive the same attention as the 2017 study, though I imagine the critique would be similar. However, that study uses unpredictable target jumps and short preparation times which, in theory, should limit explicit planning while also getting at uncertainty. I think the authors could describe further reasons that that paper does not convince them about a PO mechanism.

      We had omitted a detailed discussion of the Haith et al 2015 study as we think that the key findings, while interesting, have little to do with motor planning under uncertainty. But we now realize that we owe readers an explanation of our thoughts about it, which we have now included in the Discussion. This paragraph is quoted below, and we believe it provides a compelling reason why the Haith et al. 2015 study could be more convincing about PO for motor planning during uncertainty.

      “Haith and colleagues (2015) examined motor planning under uncertainty using a timed-response reaching task where the target suddenly shifted on a fraction (30%) of trials 150-550ms before movement initiation. The authors observed intermediate movements when the target shift was modest (±45°), but direct movements towards either the original or shifted target position when the shift was large (±135°). The authors argued that because intermediate movements were not observed under conditions in which they would impair task performance, motor planning under uncertainty generally reflects performance-optimization. This interpretation is somewhat problematic, however. In this task, like in the current study, the goal location was uncertain when initially presented; however, the final target was presented far enough before movement onset that this uncertainty was no longer present during the movement itself, as evidenced by the direct-to-target motion observed when the target location was shifted by ±135°. Therefore the intermediate movements observed when the target location shifted by ±45° are unlikely to reflect motor planning under uncertain conditions. Instead, these intermediate movements likely arose from a motor decision to supplement the plan elicited by the initial target presentation with a corrective augmentation when the plan for this augmentation was certain. The results thus provide beautiful evidence for the ability of the motor system to flexibly modulate the correction of existing motor plans, ranging from complete inhibition to conservative augmentation, when new information becomes available, but provide little information about the mechanisms for motor planning under uncertain conditions.”

      If the participants in Exp 2 were asked both "did you switch which side of the obstacle you went around" and "why did you do that [if yes to question 1]", what do the authors suppose they would say? It's possible that they would typically be aware of their decision to alter their plan (i.e., swoop around the other way) to optimize success. This is of course an empirical question. If true, it wouldn't hurt the authors' analysis in any way. However, I think it might de-tooth the complaint that e.g. the Wong & Haith study is too "explicit."

      The participants in Expts 1, 2a, and 2b were all distinct, so there was no side-switching between experiments per se. However, the reviewer’s point is well taken. Although we didn’t survey participants, it’s hard to imagine that any were unaware of which side they traveled around the obstacle in Expt 2. Certainly, there was some level of awareness in our experiments, and while we would like to believe that the main findings arose from low-level, implicit motor planning, we frankly do not know the extent to which our findings may have depended on explicit planning. We have now clarified this key point and discussed its implications in the discussion section of the revised paper. That said, we do still think that the direct-to-target movements in the Wong and Haith study were likely the result of a strategic approach to salvaging some reward in their task. Please see the new section in the discussion titled: “Implicit and explicit contributions to motor planning under uncertainty” which for convenience is copied below:

      Implicit and explicit contributions to motor planning under uncertainty

      An important consideration for the present results is that sensorimotor control engages both implicit and explicit adaptive processes to generate motor output. Because motor output reflects combined contributions of these processes, determining their individual contributions can be difficult. In particular, the experiments in the present study used environmental perturbations to induce adaptive changes in motor output, but these changes may have been partially driven by explicit strategies, and thus the extent to which the motor output measured on 2-target trials reflects implicit vs explicit feedforward motor planning requires further investigation. One method for examining implicit motor planning during goal uncertainty might take inspiration from recent work showing that in visuomotor rotation tasks, restricting the amount of time available to prepare a movement appears to limit explicit strategization from contributing to the motor response. Future work could dissociate the effects of MA and PO on intermediate movements in uncertain conditions at movement preparation times short enough to isolate implicit motor planning.

      We note that Gallivan et al. 2017 attempted to control for the effects of explicit strategies by (1) applying the perturbation gradually, so that it might escape conscious awareness, and (2) enforcing a 325ms preparation time. Intermediate movements persisted under these conditions, suggesting that intermediate movements during goal uncertainty may indeed be driven by implicit processes. However, it is difficult to be certain whether explicit strategy use was, in fact, effectively suppressed, as the study did not assess whether participants were indeed unaware of the perturbation, and the preparation times used were considerably larger than the 222ms threshold shown to effectively eliminate explicit contributions to motor output.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:** In this study authors investigated the role of NAMPT, NAD+ and PARP1/parthanatos in skin inflammation using a zebrafish psoriasis model with a hypomorphic mutation of spint1a and human organotypic 3D skin models of psoriasis. Authors showed that genetic deletion and/or pharmacological inhibition of Nampt/PARP1/AIFM1/NADPH oxidases reduced oxidative stress, inflammation, keratinocyte DNA damage, hyperproliferation and cell death in zebrafish models of chronic skin inflammation. Authors also showed the expression of pathology-associated genes in human organotypic 3D skin models of psoriasis with pharmacological inhibition of Nampt/PARP1/AIFM1/NADPH oxidases. The key finding of this study is that PARP1 hyperactivation caused by ROS-induced DNA damage mediates skin inflammation through parthanatos.

      **Major comments:** This is a very comprehensive study to investigate the role of PARP1 in skin inflammation. The main conclusion was made based on the genetic inhibition and/or pharmacological inhibition of Nampt/PARP1/AIFM1/NADPH oxidases. Although the finding of this study that NAMPT-derived NAD+ fuels PARP1 to promote skin inflammation through parthanatos is interesting and important, there are lots of major concerns and questions, which have to be addressed to better support the main conclusion. In addition, the data and methods were not presented with sufficient detail.

      1. This study relies heavily on pharmacological inhibition. However, the specificity and selectivity of many inhibitors were not tested in this study.

      At least 3 concentrations of each inhibitor were tested and the lowest one able to rescue the phenotype was then used for further testing (please see Table S1). More importantly, the specificity of all compounds used was confirmed by genetic inhibition of their targets.

      2. Fig. 1: it is quite confusing how NAD+ increases H2O2 levels. Is NAD+ cell permeable? It is not clear if NAD+ has really been taken up by cells in the larvae. If NAD+ fuels PARP1 to promote skin inflammation, why did NAM treatment increase H2O2 levels while the NMN precursor failed to increase skin oxidative stress? No reasonable explanation has been provided.

      This is an interesting point. We have shown that exogenous NAD+ added to the water of larvae increased larval NAD+ (please see Fig. 2K). It has been shown that neurons can take up NAD+ through CX43 (Fig. S7), so a similar mechanism may operate in larval skin. As regards the effect of NAM and NMN, a recent study has demonstrated that NAM supplementation increased zebrafish larval NAD+; however, NA, NMN and NR failed to boost larval NAD+ levels (PMID: 32197067). These results are consistent with our data.

      3. Fig. 1E and 1G: it is not clear what is the green channel. Similarly, there is no clear description what is red or green in many other figures.

      To help the interpretation of larval pictures, we have indicated in all figures what is analyzed in each fluorescent channel.

      4. Fig. 1K and 1L: It is hard to understand why FK-866 reduced H2O2 release, but it increased neutrophil infiltration. How to interpret this conclusion?

      5. Fig. 2C-D: Why did low-dose FK-866 reduce neutrophil infiltration whereas high-dose FK-866 increased neutrophil infiltration?

      Answer to 4&5: As explained in lines 145-156, FK-866 induces NF-kB activation and neutrophil infiltration in the muscle when used at 100 µM. This result may be deleted if the reviewers think it is confusing, since a 10 µM dose was used in all subsequent experiments to study the impact of Nampt on skin inflammation. This dose has no effect in the muscle but robustly reduces skin H2O2 production and neutrophil infiltration of the skin.

      6. Fig. 2I-J: it is not clear how NF-kB activity was measured. Is that based on green fluorescence shown in Fig. 2J? If so, the representative images were not consistent with the quantification data shown in I. Similarly, many other representative images were also not consistent with their quantification data throughout the manuscript. For example, Fig. 3C/D, 3E/F, 3G/H, 3L/M, Figure S2C/D, S2G/H, Fig. 4C/D, 4J/K.

      NFkB activity was quantified in the skin, as previously reported (Candel et al., 2014). This is indicated in the Materials and Methods section. The images show whole larvae, and NFkB is expressed at high levels in different tissues, such as neuromasts. To clarify this, we have included an additional figure panel showing the ROI used for quantification of H2O2 and NFkB (Fig. S1G).

      7. Figure S1C, Nampta/Namptb protein expression should be checked and shown after its KO using the CRISPR/Cas9 technique.

      Unfortunately, we used two different antibodies and both failed to cross-react with zebrafish Nampta/b. However, we have included the efficiency of CRISPR-Cas9 editing in Fig. S1F of the revised version. The efficiency is relatively low, probably indicating that Nampt is indispensable for zebrafish development, as occurs in mice (PMID 28333140).

      8. Fig. 3I: protein expression of nox1, nox4 and nox5 should be checked after genetic inhibition using the CRISPR/Cas9 technique.

      Unfortunately, we do not have antibodies able to recognize zebrafish Nox1, Nox4 and Nox5. However, we have provided the efficiency of the gRNA used for each gene (Fig. S3) and it is about 65%.

      9. Fig. 4: If Olaparib treatment increased DNA damage, will it increase PARP1 activation and PAR formation?

      As has been widely shown in mammalian models, parthanatos is triggered by overactivation of PARP1 following DNA damage. Therefore, although Parp1 inhibition by olaparib may further increase DNA damage, it blocks parthanatos. This is consistent with our results showing that olaparib reduces PARylation (Fig. S4H) and cell death (Figs. 4J, 4K).

      10. Fig. 4M: it is not clear what staining has been done. No difference was observed among different groups.

      As indicated in the figure legends, pγH2Ax+ (green) keratinocytes (red) are shown. We have indicated this in the figure and included arrows to show pγH2Ax+ cells. The quantitation of this experiment (Fig. 4L) shows that FK-866 robustly reduces, while olaparib increases, keratinocyte DNA damage.

      11. Authors used N-phenylmaleimide (NP) to block AIF nuclear translocation. How does this inhibitor work? What is its actual effect on AIF nuclear translocation? Experiments are required to show this inhibitor actually blocks AIF nuclear translocation.

      NP has been shown to block AIFM1 nuclear translocation because it inhibits the cysteine proteases required for the AIFM1 cleavage that precedes nuclear translocation (PMID 8879205). Although we have shown that genetic inhibition of Aifm1 rescues skin inflammation, confirming the specificity of the inhibitor, we agree with this point. Therefore, we have performed additional experiments showing nuclear Aifm1 in keratinocyte aggregates of Spint1a-deficient larvae and that NP treatment blocks this nuclear translocation (Fig. S6C). In addition, we have shown increased nuclear translocation of AIFM1 in keratinocytes of lesional skin from psoriasis patients (Figs. 6C, 6D).

      12. Figure S4: it is hard to understand why lane #2 with Olaparib has the highest PAR signal.

      We are sorry for this mistake in labeling the WB. The correct legend is: 1 +/+, 2 -/- treated with DMSO, 3 -/- treated with FK-866 and 4 -/- treated with olaparib.

      13. Do spint1a-/- zebrafish show parthanatos cell death? It is not clear how cell death was measured.

      We have shown that skin keratinocytes from Spint1a-deficient fish show increased cell death, as assayed by TUNEL, that is fully reversed by olaparib (Figs. 4J, 4K). In addition, skin keratinocytes from the mutant fish also have increased PARylation that is reversed by either FK-866 or olaparib (Figs. S4G, S4H). Further, pharmacological and genetic inhibition of Aifm1 and forced expression of Parga also rescue skin inflammation. Finally, we have included new experiments showing Aifm1 nuclear translocation in both Spint1a-deficient larvae and psoriasis patient lesional skin. Taken together, these results indicate that Spint1a-deficient fish show inflammation induced by parthanatos cell death.

      14. NAD+ levels were regulated by 3 different pathways. Expression of many genes involved in these 3 pathways were altered in psoriasis. However, it is not clear if the other two pathways play a role in PARP1-mediated inflammation.

      The NAD+ salvage pathway has been shown to be the major pathway regulating NAD+ levels in most tissues. The inhibition of this pathway with FK-866 rescues all skin phenotypes observed in Spint1a-deficient larvae as well as in organotypic 3D skin models of psoriasis. These results were further validated using another inhibitor (GMX1778) and genetic inhibition. Therefore, our results support that the salvage pathway is the one involved in psoriasis and that inhibition of this pathway would rescue inflammation. However, it would be worthwhile to investigate whether other pathways play a role in psoriasis, specifically upon inhibition of the salvage pathway.

      **Minor comments:**

      1. Page 9: To test this hypothesis, we used N-phenylmaleimide (NP), a chemical inhibitor of Aifm1 translocation from the nucleus to the mitochondria (Susin et al., 1996). The statement is not correct.

      We are sorry for this mistake. It has been amended to: “To test this hypothesis, we used N-phenylmaleimide (NP), a chemical inhibitor of Aifm1 translocation from the mitochondria to the nucleus (Susin et al., 1996).”

      2. Page 12: To the best of our knowledge, this is the first study demonstrating the existence of parthanatos in vivo. This statement is not correct.

      We have removed this statement.

      3. Figure S3 and S6E: they should be presented in an easily understandable way for general readers.

      We have explained in the legends the graph output of TIDE analysis.

      4. Figure legends should be presented in a clearer way.

      We have done our best to clarify the legends. All suggested changes and requests have been addressed.

      Reviewer #1 (Significance (Required)): Parthanatos is a new type of cell death distinct from apoptosis, necrosis, necroptosis and plays a pivotal role in ischemic stroke and neurodegenerative diseases (Wang Y et al., Science 2016; Kam TI et al., Science 2018). The current study may provide new evidence of the importance of PARP1 and parthanatos in skin inflammation and potential targets for the treatment of skin inflammation.

      We thank the reviewer for this opinion on the significance of our study.

      The reviewer has expertise in oxidative stress, PARP1 and parthanatos research.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:** The manuscript entitled "NAMPT-derived NAD+ fuels PARP1 to promote skin inflammation through parthanatos" is well written, divided and organized. This work demonstrated that models of psoriasis are characterized by ROS stress, inflammation and cell death. It was clear that NAMPT, a rate-limiting enzyme of the NAD salvage pathway, and PARP1, a poly-ADP-ribose polymerase, could be targeted to decrease ROS stress and inflammation that are contributing to cell death through the parthanatos pathway. However, it was not clear that NAD+ is responsible for fueling these processes in the psoriasis models analyzed. Nevertheless, the present work demonstrated that the cell death observed in the psoriasis model analyzed was correlated to an unidentified programmed cell death pathway, parthanatos, that to date has not been demonstrated.

      We are pleased with the reviewer’s comments on our study.

      **Major comments:**

      Most of the data shown confirm that inhibition of NAMPT or PARP1 seems to be beneficial for the relief of some characteristics related to oxidative stress and inflammation in the skin. However, the authors should show data about NAD+ levels only instead of the ratio NAD+/NADH to state that NAMPT-derived NAD+ is promoting oxidative stress (lines 366-368) (Fig. 2K).

      The data shown in Fig 2K are NAD+ plus NADH. Considering that cytosolic and nuclear NAD+/NADH ratio typically ranges from 100 to 1000 (PMID: 21982715), these data mainly show intracellular NAD+ concentration in larvae.
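
      As a rough check on the argument above (an illustrative sketch, not a result from the study), write $r$ for the NAD+/NADH ratio and $f$ for the NAD+ share of the combined NAD+ plus NADH signal; both symbols are introduced here for illustration. Assuming the cited range of 100 to 1000 applies to the pool measured in Fig. 2K, the share would be:

      ```latex
      \[
      f = \frac{[\mathrm{NAD^+}]}{[\mathrm{NAD^+}] + [\mathrm{NADH}]} = \frac{r}{1 + r},
      \qquad
      r = 100 \;\Rightarrow\; f = \tfrac{100}{101} \approx 0.990,
      \qquad
      r = 1000 \;\Rightarrow\; f = \tfrac{1000}{1001} \approx 0.999 .
      \]
      ```

      Under that assumption, NADH would contribute about 1% or less of the combined signal. The cited range refers to cytosolic and nuclear pools, so whole-larva extracts may deviate from this estimate.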

      Some data images are not convincing, or they don't really show an increase or decrease as the author showed in graph data. (Fig1D, 1E - 1F,1G).

      H2O2 and NFkB were quantified in the skin, as previously reported (Candel et al., 2014). To clarify this, we now show the ROI used for quantification of H2O2 and NFkB in Fig. S1G.

      What is the relevance to analyze muscle and what is the relevance of the results obtained, since the effect of FK-866 in muscle increases the NFKB activity?

      This is essentially the same concern raised by reviewer 1. FK-866 induces NF-kB activation and neutrophil infiltration in the muscle when used at 100 µM. This result may be deleted if the reviewers think it is confusing, since a 10 µM dose was used in all subsequent experiments to study the impact of Nampt on skin inflammation. This dose has no effect in the muscle but robustly reduces skin H2O2 production and neutrophil infiltration.

      Figure S4H is not convincing with what the author wrote.

      We are sorry for this mistake in labeling the WB. The correct legend is: 1 +/+, 2 -/- treated with DMSO, 3 -/- treated with FK-866 and 4 -/- treated with olaparib. Both FK-866 and olaparib reduce PARylation in the skin of Spint1a-deficient larvae.

      The author should make the keratinocyte aggregation experiment with FK-866 treatment to better substantiate what they are proposing.

      These results are shown in Figs. 2E and 2F.

      **Minor comments:**

      Line 281: "NP, a chemical inhibitor of Aifm1 translocation from the nucleus to the mitochondria..." should be the opposite: NP, a chemical inhibitor of Aifm1 translocation from mitochondria to nucleus.

      We are sorry for this mistake. It has been amended.

      Line 299 "figure 6A" should be Figure 6B.

      We have checked and it is correct.

      How the author explains the relationship between all the results being related to NAMPT and supposedly to NAD+, but an important precursor to make NAD through salvage pathway (NMN) and a well NAD+ booster didn't show any effect?

      This is an interesting point that was also raised by reviewer 1. A recent study has demonstrated that NAM supplementation increased zebrafish larval NAD+; however, NA, NMN and NR failed to boost larval NAD+ level (PMID: 32197067). This explains our results. We have discussed this point in the revised manuscript.

      Line 178: should be NAMPT inhibitor instead of FK-866 inhibitor.

      Thanks a lot. It has been amended.

      Line 191-192: I suggest reformulating this sentence since the result showed was only the ratio NAD/NADH.

      Please, see our response above. We are measuring NAD+ plus NADH. We have amended the text to clarify this fact.

      Reviewer #2 (Significance (Required)): The present work greatly demonstrated the relevance of PARP1 and NAMPT in the field of inflammation and ROS in the skin that contribute to diseases like psoriasis. Although it is not a lethal disease, as the author mentioned, it affects the physical and mental health of the individual. Understanding the mechanisms that underlie this condition would help to trigger new and more efficient treatments. It was clear that the results showed a promising strategy in targeting NAMPT and PARP1. Furthermore, inhibitors for them are already known and may be useful for future treatment of psoriasis.

      We thank the reviewer for these comments on the impact of our study.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This study shows NAMPT derived NAD facilitates PARP activation to promote skin inflammation via parthanatos. The authors used the zebrafish model and organoid models of psoriasis and observed that inhibition of NAMPT reduces inflammation in zebrafish and human skin organoid models. They also observed that NADPH oxidase-derived oxidative stress activates PARP, and PARP inhibition or over-expression of PARG or AIF mimics protection mediated by NAMPT inhibition. This is an interesting study, but there are several weaknesses to support the conclusions of this study. While pharmacological inhibition is a powerful tool, complementary methods (knock out of PARP-1) are critical for this paper's conclusions. The PARP inhibitor used in this study may not specifically inhibit PARP1 but other PARPs too. Therefore, genetic knockout of PARP will make the conclusions/interpretation of this study stronger.

      We thank the reviewer for these comments on our manuscript. All pharmacological inhibitions used in this study were confirmed by genetic experiments, including Parp1. The genetic inhibition of Parp1 is shown in Figs. S4C-S4F.

      Additional comments include: This study's primary focus is PARP activation and PAR-mediated parthanatos, but it is not shown how different inhibitors used in this study and supplementations of NAD alter PARP activation and PAR formation.

      We have shown through the quantitation of PARylation that Spint1a-deficient skin shows increased PAR activity and that pharmacological inhibition of either Nampt or Parp was able to fully reverse it (Figs. S4G & S4H). In addition, we have also shown dramatically increased PAR activity in lesional skin biopsies from psoriasis patients (Fig. 6E).

      NAMPT is not the only NAD biosynthesis pathway; how other NAD pathways respond when NAMPT is inhibited with FK-866.

      The NAD+ salvage pathway has been shown to be the major pathway regulating NAD+ levels in most tissues. The inhibition of this pathway with FK-866 rescues all skin phenotypes observed in Spint1a-deficient larvae as well as in organotypic 3D skin models of psoriasis. Therefore, our results support that the salvage pathway is the one involved in psoriasis and that inhibition of this pathway would rescue inflammation. We agree that it would be worthwhile to investigate whether other pathways play a role in psoriasis, specifically upon inhibition of the salvage pathway, but this is beyond the scope of this manuscript.

      PARG is used in this study, but the protein levels of PARG are not shown, and it is not clear whether the PARG overexpression is sufficient to reduce PAR levels in the models used. AIF pharmacological and genetic manipulation of AIF is used, but it is not shown that AIF translocates to the nucleus in this model.

      We agree with these points, so we have analyzed Aifm1 translocation in Spint1a-deficient larvae and psoriasis patient lesional skin (please see our response to reviewer 1 above) and PARylation upon forced expression of Parga (Fig. 5M).

      Does NAMPT inhibition reduce NADPH oxidase activity?

      Our results indicate that Nampt inhibition reduces NADPH oxidase activity, since a drastic reduction of H2O2 production was observed in the skin of Spint1a-deficient larvae treated with FK-866.

      PAR plots provided in fig S4 need quantification, and the blots (Fig S4 G&H) should be run on the same gel to make sure the exposure levels are the same. It is not clear which group is represented in lane 4 of Fig S4 G.

      We have provided the quantitation. The problem is that we mislabeled the legend of Fig. S4H. The right legend is: 1 +/+, 2 -/- treated with DMSO, 3 -/- treated with FK-866 and 4 -/- treated with olaparib. Therefore, either Nampt or Parp inhibition robustly reduces PARylation of Spint1a-deficient skin to the levels of their wild type counterparts.

      Reviewer #3 (Significance (Required)): This study is interesting, potentially showing the role of PARP-1 activation and parthanatos in skin inflammation. It could be very significant if the above-identified weaknesses are addressed.

      We are pleased with this reviewer’s assessment of the significance of our study.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for the positive assessment of our work and for the constructive comments that helped us to improve the quality of our manuscript. We have carefully considered each point and have addressed most by modifying the manuscript text to increase clarity of our work. Based on a suggestion by Reviewer 2 we have also included the results of a new experiment.

      In addition to addressing all comments of the reviewers, we have expanded the part of the study analysing the functionality of Caulobacter’s DnaA Nt in the heterologous host E. coli. Furthermore, we have replaced our original set of fluorescence data by a new data set that has been acquired using optimized measurement parameters (bottom read and 100 for the detector gain - see Material and Methods for details), which have improved the signal-to-noise ratio and the overall quality of the fluorescence profiles. Importantly, these new data do not change, but rather strengthen, our conclusions.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Felletti et al provide compelling new evidence that a CDS element in the dnaA mRNA is required for nutrient dependent translational control. This provides a mechanism by which dnaA translation is shut off during carbon starvation, and is supported by a rather rigorous analysis of the mRNA performed both in vitro and in vivo. Overall it was a pleasure to read and the data are generally very compelling. My specific comments are below:

      **Major Comments:**

      While the authors rule out differences in charging of different ala-tRNAs as controlling the nutrient dependent repression in translation, the authors assume that this must be due to the nascent sequence. However, could it also be possible that all ala-tRNA isoacceptors have lower charging after C-starvation?

      We thank the reviewer for raising this important point. As Reviewer 1 pointed out, we cannot conclusively exclude that carbon starvation could lead to reduced charging levels of all isoacceptor Ala-tRNAs. However, based on the available literature, we consider it unlikely. In a first work by Elf et al 2003 (confirmed later by Dittmar et al 2005 and Subramaniam et al 2014) the authors argued that under amino acid-limiting conditions the charging levels of the different isoacceptor tRNAs depend directly on their codon usage during translation. Importantly, in our work we could show that Nt mediates the inhibition of translation independent of the synonymous codon choice, suggesting that aa-tRNA levels are not limiting in our experimental conditions. To address this comment of Reviewer 1, we now discuss this matter in greater detail in the revised version of the manuscript (lines 374-379).

      **Minor comments:**

      It was observed many years ago that tmRNA is required for the proper timing of DNA replication initiation in Caulobacter (Cheng and Keiler J Bact 2009). Since the AAI motif appears to alter translation elongation, it might be interesting to discuss whether the AAI motif may be linked to ribosome arrest and rescue.

      We appreciate this suggestion. Cheng and Keiler 2009 proposed an indirect involvement of the tmRNA in the transcriptional regulation of DnaA over Caulobacter’s cell cycle. In the revised version of the manuscript, we mention the tmRNA and ArfB protein as possible factors involved in ribosome rescue following Nt-induced ribosome stalling and we refer to Keiler et al 2000 and Feaga et al 2014.

      Line 49 - add "initiation"

      The word “initiation” was added to the text.

      Line 61 - is "cleared" meant to be proteolyzed or simply meaning to have a lower protein level?

      We apologize if we were not clear. We rephrased the text as follows: “[…] DnaA levels decrease at the onset of carbon starvation […]”.

      Line 92-93 - is this 5' UTR based on a previously defined TSS determined in their previous study?

      The dnaA TSS was first determined by primer extension (Zweiger and Shapiro 1994) and later by global 5’RACE (Schrader et al 2014 and Zhou et al 2015). In the new version of the manuscript, we include references to these previous studies (line 94).

      Line 115-118 - this is interesting, might this conserved 5' UTR be added to rfam?

      We thank the reviewer for this suggestion. We will submit our alignment to rfam after publication of the manuscript in a journal.

      Line 126-127, 131,189 - Is the 3nt sequence the authors found here considered a Shine-Dalgarno site? I would imagine that this would be too small to consider this. Perhaps calling it SD-like sequence might be more appropriate.

      We agree with this comment. In the new version of the manuscript, we refer to the identified 3-nucleotide sequence as a “SD-like sequence”.

      Lines 136-140, 208-210 - Would the authors consider this upstream site with a potential CUG start codon a standby site? It appears to fit many of the criteria which could be used to define one.

      According to our probing data, the mRNA region in proximity of the CUG start codon forms a very stable stem-loop structure. Based on our previous experience (especially the extensive work by the Wagner lab), typical ribosome standby sites only occur in largely unstructured regions. Furthermore, in Supplementary Fig. 4 we show that the deletion of stem P4 does not affect eGFP expression levels. For these reasons, we consider it unlikely that the putative CUG start codon is part of a ribosome standby site.

      Lines 253-255 - this is a beautiful experiment, but very hard to understand from the text. Perhaps add a sentence or two to explain it in more detail.

      We thank the reviewer for this comment. In the revised version of the manuscript, we provide a more detailed description of the dfsNt reporter mutant. We hope this will address the reviewer’s concerns.

      Line 307 - add "synonymous"

      The word “synonymous” was added in the revised version of the manuscript.

      When dnaA is depleted, it was observed that the chromosome can be erroneously segregated by the ParA/B/S system (Mera et al PNAS). Does this occur in C-starvation when DnaA levels drop?

      In a separate study we have also observed that in a fraction of DnaA-depleted cells the origin of replication is erroneously translocated from the stalked to the swarmer cell pole. We have not studied this phenomenon under carbon starvation, as it lies outside the scope of this paper. However, if the ParA/B/S system remains functional under carbon starvation, this might also happen in G1-arrested starved cells.

      Reviewer #1 (Significance (Required)):

      Appears to be quite significant to researchers studying regulation of bacterial cell cycle and translation. Since DnaA is conserved across bacteria, and this mechanism works in E. coli, it appears that the findings will likely be important in many bacterial systems.

      Referee Cross-commenting

      All the reviewer comments I read seem reasonable. Specifically, I found the point about E. coli 30S ribosomes very important for the authors to address. This could be done in writing, but would be better listed as a caveat to those experiments.

      As suggested by the reviewers, we have partially rephrased some parts of the text describing the toeprint results. Moreover, we have inserted in the main text and in Fig. 1 legend explicit references to the use of purified E. coli 30S subunits and tRNA-fMet. We believe these changes will address the reviewers’ concerns.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:** The Jonas lab provides good evidence that they have found a new mechanism to regulate the amount of the DnaA protein by a starvation signal. The DnaA protein is the key chromosome replication initiator, probably for most bacteria, and as such DnaA is the target of many regulatory inputs. The authors created an accurate reporter system that allows them to dissect the 5' mRNA translated and untranslated sequences of dnaA, and they have convincingly demonstrated that the N-terminal DnaA peptide sequence, and not the RNA, mediates the response to starvation by glucose exhaustion. This is potentially a model example for global translational responses in bacteria.

      **Major comments:**

      The main conclusion, i.e. that the DnaA leader peptide "Nt" mediates this response is convincing. However, there were 2 major problems that should be easily addressed. These do not subtract from the main conclusion.

      Problem 1

      E. coli 30S subunits were used in the "Toeprint" assay of Fig. 1. Obviously Caulobacter 30S Ribosome subunits should have been used, or a justification should be given. One remedy would be to make this supplementary information.

      We thank Reviewer 2 for this comment. We agree that it would be better to use Caulobacter 30S ribosome subunits in our toeprint experiments. However, because toeprint assays with E. coli 30S ribosomes were already established in our lab (i.e. the Wagner lab, where the assays were performed) and because works by other groups have shown that E. coli 30S subunits can be used to study the translation of mRNAs from other bacteria, we decided to use this experimental set up. Based on our results, we also had no reason to doubt the suitability of the E. coli 30S subunits. The toeprint showed that translation starts at the in silico predicted translation start site, which was further confirmed by our in vivo mutagenesis experiments. For these reasons, we are confident that the toeprint assays indicate the true translational start site. However, we acknowledge that we could have been more explicit about the use of the purified E. coli 30S subunits and tRNA-fMet in the toeprinting assay. To increase clarity and transparency, in this revised version of the manuscript, some parts of the main text were rephrased and references to the use of E. coli 30S and tRNA-fMet were introduced (including Fig. 1 legend). We hope that these changes will address the reviewer’s concerns.

      Problem 2

      The results in Fig. 6B could be due to the Nt simply making the hybrid protein more unstable in E. coli. This is the main impression given by the drop in signal. In this case, the conclusion would be wrong, and Nt is not transferring a starvation translation block from C. crescentus to E. coli. Nt is just making the protein unstable. These results should be treated as preliminary pending protein stability measurements. However, this defect does not subtract from the other main points and without the Fig. 6 E. coli experiments they still make a complete and interesting story. One remedy would be to make this also supplementary information.

      It is indeed striking that a drop of normalised fluorescence is observed for the 5’UTRdnaA-Nt construct in E. coli but not in Caulobacter. To test whether this behavior can be explained by reduced protein stability, we have performed a translation shut-off assay using the 5’UTRdnaA-Nt E. coli reporter construct. The results of this experiment (shown in Supplementary Fig. 9A and described in lines 327-329) show that the normalised fluorescence remains stable over 10 hours after chloramphenicol addition to the culture, ruling out that the presence of Nt significantly affects eGFP protein stability in E. coli. Importantly, this experiment also showed that in contrast to the chloramphenicol-treated culture, in which the OD600 decreased after reaching stationary phase, the OD600 of the non-treated cultures slightly increased between 2 and 10 hours (Supplementary Fig. 9A). Because this increase was not observed in carbon-starved Caulobacter cultures, we consider the different growth dynamics between E. coli and Caulobacter to be the most likely explanation for differences in eGFP accumulation at later time points during the experiment.

      To further strengthen our E. coli data, we have analysed additional Nt mutants that we identified as the most critical in our Caulobacter experiments presented in Fig. 5, namely dfsNt, mutD1, mutD2, ΔAAI and AAI>DDK. Determination of Δt and Δf values for the E. coli strains carrying these different Nt constructs showed similar results as for the corresponding constructs in Caulobacter. Collectively, these new data further support the notion that Nt operates in E. coli through a conserved inhibitory mechanism of translation. These data are now included in a reorganized new version of Fig. 6 (panels A, B) as well as in Supplementary Fig. 9.

      **Minor comments:**

      There are also 6 minor issues that are easily addressed, most by small changes to the text, and these should improve this otherwise fine manuscript.

      Issue 1

      Line 88 Fig. 1A shows DnaA degradation upon entering stationary phase from a low glucose media and not a simple starvation response to one component like glucose. Did the authors consider trying simple washout experiments, i.e. resuspend the cells in glucose-free media? This would have the advantage of suddenly exposing the cells to starvation and thereby studying the sudden response rather than the slower lingering response which would be due to many factors and not just glucose removal.

      In a previous work from our lab (Leslie et al 2015), we have conclusively shown that the downregulation of DnaA synthesis depends primarily on the nutrient content of the growth medium.

      Besides being in continuity with our previous work, we think that the starvation protocol that we used in the present study, and that was also used by the Sean Crosson lab (Boutte et al. 2012), might better reproduce what happens in the natural environment when nutrient levels gradually decrease until becoming limiting for bacterial growth.

      Issue 2

      Reference 16 should be cited as the first publication to show that glucose and other starvations induce DnaA degradation in Caulobacter.

      We have added Reference 16 to the first sentence of the results section, in which we state that DnaA levels decrease when cells are shifted from a glucose-supplemented minimal medium to a glucose-limiting medium.

      Issue 3

      Fig. 1D shows that the TOEprint is not changed by adding the ribosome, very surprising considering its size and SD docking & alignment. 2 Minor bands then appear when the tRNA-Met is further added. These are presumably the "toeprints". A control with just the added tRNA-Met would make this result much more significant.

      In the existing literature, there is a common consensus to consider real toeprints (i.e., indicative of the presence of an assembled translation pre-initiation complex) as only those bands that appear faintly in the presence of the 30S ribosome subunit but that become clearly enhanced upon addition of the initiator tRNA-fMet. Some examples can be found in Hoekzema et al 2019, Romilly 2014, Romilly 2020. In cases when the translation start site is buried in a structural element, the intensity of the toeprint signal is further increased when the mRNA is rendered unfolded, as seen in our data.

      tRNA-30S-independent bands always show up in toeprint experiments, but their intensities differ with the sequence of the mRNA and sometimes the choice of RT used for primer extension. Addition of initiator tRNA-fMet alone is commonly not done in toeprint experiments (see references mentioned above). Finally, we want to point out again (see also our answer on “Problem 1”) that the toeprint data are very much consistent with our in silico predictions and our in vivo mutagenesis data. Therefore, we are confident that the observed toeprint upstream of the AUG corresponds to the true ribosome binding site.

      Issue 4

      Why does the cell OD drop, e.g. in Fig. 2, is it cell death from starvation?

      We don’t think that the slight reduction of OD600 observed in our experiments is due to cell death. To our knowledge, carbon-starved cells remain viable up to 24 hours after the starvation onset. Instead, we have observed a cell volume reduction, which may at least partially explain the observed OD600 decrease.

      Issue 5

      Line 327 Discussion "This study reveals a new mechanism, by which some bacteria can regulate the synthesis of the replication initiator DnaA in response to nutrient availability by modulating the rate of translation." Rate of translation or rate of translation abortions (as implied in Fig. 6)?

      The rate of translation is the result of multiple contributions such as initiation, elongation, abortion and termination. Our data indicate that Nt is a regulator of DnaA translation elongation responding specifically to the nutritional state of the cell. Translation abortion could be one of the possible outcomes (but not the only one) of ribosome stalling. For these reasons, in the new version of the manuscript, we added the word “elongation” at the end of the sentence mentioned by Reviewer 2 (line 354).

      Issue 6

      It seems that for most experiments with the eGFP the translation and protein decay components of the signal could have been easily uncoupled by running a parallel +chloramphenicol control. For example, this would simplify the interpretation of Fig. 6, where Nt eGFP stabilities are an issue and it is important to establish comparable protein stability with and without the Nt peptide.

      To address the reviewer’s comment, we have now included a chloramphenicol control experiment (stability assay) performed with E. coli carrying the 5’UTRdnaA-Nt reporter construct (Supplementary Fig. 9A). Please see the response above for more details. For the experiments with the Caulobacter 5’UTRdnaA-Nt reporter, we show in Supplementary Fig. 7 that the Nt peptide has no destabilising effect on eGFP.

      Reviewer #2 (Significance (Required)):

      Caulobacter crescentus is a model bacterium that has provided many insights into bacterial physiology that are now exploited to understand many organisms. These present results may provide one such example. It is known that the first amino acids of translated peptides can influence exit from the ribosome, either facilitating or impeding it, so this is a potential translation-level regulatory point that might be used by many organisms. This manuscript gives a concrete and important example of such usage, suggesting that it may be widespread. Therefore, this work should find a wide audience and it should stimulate research in many other systems.

      My lab also studies Caulobacter crescentus and we studied the same dnaA gene and protein including starvation responses. We at present do not have projects on dnaA but we do study other regulators and regulatory mechanisms of chromosome replication in Caulobacter crescentus.

    1. Author Response:

      Reviewer #1:

      This meta-analysis addresses a double-edged sword in evolutionary biology. Group living may be beneficial for many reasons, but has costs in terms of increased rates of parasitism. Furthermore, if groups are highly related, parasites that are genetically able to infect one member of the group may be able to infect all of them, putting the entire group at risk. In the meta-analysis presented here, many original studies working on questions related to parasitism, relatedness and group living are brought together in one unifying framework. The authors find that indeed, group living can facilitate the spread of infectious diseases. However, they also find that the negative effects of disease can be overcompensated by the benefits of being social. The authors stress that experimental studies are necessary to disentangle these effects. The study is of high standard and well-conducted. The take-home message is clear and of general interest.

      The study highlights that experimental work is important to understand the relationship between parasitism, relatedness and living in groups. However, I missed an important aspect here. Experiments tend to stretch factors (sometimes to extremes), which may go square to the biology of the species. In some cases, this results in non-social organisms being pressed into a group environment. For example, the monoculture effect as we know it in agriculture is highly artificial. Clonal lines of crops are planted in high density, promising high yield, if pathogens stay out. These plants do not have a history of evolving mechanisms to deal with the effect of high relatedness. In contrast, animals living in social groups may never experience settings with non-relatives. Social insects evolved to deal with parasites by expressing specific adaptations, such as grooming, hygiene and social structure in the colony. Many social insects may never experience conditions of low relatedness. Thus, I expect it makes a difference if you experimentally force a non-social organism to be social, or a social organism to be asocial. I would be happy if this factor could be included in the reasoning, and maybe even analyzed quantitatively. For example, I would expect that non-social species made artificially to grow in groups of relatives suffer much more from parasites than typical social animals with the same degree of relatedness.

      This is an important point. One of the main motivations for conducting this study was to test if species that typically live with kin have evolved adaptations to minimise any increase in susceptibility to pathogens brought about by living in groups with relatives. We therefore collected data on: a) whether species are typically social or non-social, and b) average levels of relatedness between individuals in groups under natural conditions (see Methods section ‘Data on species characteristics’).

      a) Testing differences between social and non-social species. All species included in our dataset had some part of their life-cycle where they were social (note we specifically excluded any studies on non-natural systems such as crops and domesticated species). This meant that only comparisons between species that are obligately social versus species that are social during specific life stages could be made. This is problematic as assumptions need to be made about the strength of selection during different life cycle phases. For example, mortality caused by pathogens may be particularly high during the social juvenile phases of otherwise non-social species, resulting in selection for adaptations to reduce pathogen spread being similar to species that are obligately social. An additional problem was that experimental studies (a key factor highlighted by our analyses) of species that are non-social apart from specific life-cycle phases were rare (n=1, Rana latastei), precluding any meaningful comparisons.

      We have now added the following sentences to the methods to clarify this point:

      “We also collected data on whether species always lived in social groups (‘obligately social’) or whether species were only social during specific life stages (‘periodically social’). However, it was not possible to analyse this data as experimental manipulations of pathogens, a key factor influencing the relationship between relatedness and mortality and pathogen abundances, were only performed for one periodically social species (Rana latastei)” (Lines 425-430).

      b) Testing differences between species that typically live with kin and non-kin. The third aim of the paper was to test if species that typically live with kin have evolved to deal with pathogens as the referee suggests. We found that species that live with kin, such as social insects, have similar rates of mortality and pathogen abundances to species that live with non-kin (Figure 3). However, species that typically live with kin had lower rates of mortality in groups with higher relatedness when pathogens were absent compared to species that typically live with non-kin. This suggests that pathogens represent an omnipresent threat to all species, but that adaptations have evolved to reap the benefits of living with relatives in social species.

      In summary, as suggested by the referee we analysed whether “species made artificially to grow in groups of relatives, suffer much more from parasites than typical social animals with the same degree of relatedness” as much as was possible given the limitations of the published data. We have edited parts of the manuscript to emphasise that this was a key aim of the paper (Lines 66-74; 92-94; 136-153).

      The term (and concept) "monoculture" is typically used to describe clonal populations, predominantly in agricultural settings. I understand that the authors like to expand this term (as others have done before) to include social animals. However, for most people this would be a change in terminology and may cause misunderstandings. I would prefer if you could stick with the mainstream terminology and avoid pressing this concept into a new costume.

      We included the term “monoculture effect” to facilitate links to existing literature, both in the fields of agriculture and evolutionary biology (e.g. Ekroth et al 2019). While we think that making the reader aware of relevant work in other fields is valuable, we understand its prominence could give the impression that we included agricultural studies. Therefore, we have removed it from the abstract, but have chosen to keep one reference to the monoculture effect in the introduction.

      Reviewer #2:

      This study uses an unusually broad comparative data set to disentangle the positive (relatedness) and negative (pathogen pressure) effects of living in groups. The authors largely succeed in this task even though the data do not allow answers to all outstanding issues. Not unexpectedly, experimental manipulation studies appear to be most informative. The results are broadly consistent with expectations based on kin-selection theory and clarify the effects of a number of important covariables. The study is thoroughly executed and innovative in its approach. I expect this study to be interesting for a broad readership and this method of searching literature data to have considerable impact. Some suggestions strengthening this paper are below:

      • I think it would be helpful for readers to have the Discussion start with a few lines on what your study achieved in language that is complementary to the abstract, perhaps followed by a brief explanation of which angles/ambiguities/challenges you will be taking up in the paragraphs to follow.

      We have now edited the beginning of the discussion in accordance with this suggestion. It reads:

      “Our analyses show that pathogens can increase rates of mortality in groups of relatives. The detrimental effects of pathogens were, however, counteracted by high relatedness reducing mortality when pathogens were rare, particularly in species that live in kin groups. Such contrasting effects of relatedness meant that experimental manipulations were crucial for detecting the costs and benefits of living with relatives when the presence of pathogens varied. Additionally, high relatedness resulted in more even abundances of pathogens across groups, but more variable rates of mortality, highlighting the importance of population genetic structure in explaining the epidemiology of diseases. We discuss these findings in relation to the environments favouring the evolution of different social systems, the mechanisms that have evolved to prevent disease spread in social groups, and the types of study system where more experimental data are required” (Lines 171-181).

      • The rationale of this study is (often implicitly) that the tendency to live with relatives or not is a continuous variable. This surprised me because the senior author has written influential papers showing that family groups are different from non-family groups. In some contexts of this study it seems crucial to make that distinction. For example, a number of data points come from studies of social insects (bumblebees, honeybees, ants). Here, living with non-relatives is not an option but a given. It is well documented that these caste-differentiated colonies originated from ancestors that had exclusively full-sib colonies, so maximal relatedness was ancestral and became only diluted secondarily in some lineages. Would it be possible to check statistically whether the social insect data points always showed the same pattern as the other data points? That would test whether it matters that low relatedness is either derived or ancestral (as I think we implicitly assume to be the case in all other organisms).

      The primary studies included in our analyses were conducted on a diverse set of species where relatedness was often reported and measured on a continuous scale (range 0 to 1). Our rationale and statistical treatment of the data (the effect size of Pearson’s correlation coefficient captures continuous variation in relatedness) reflect the measures reported in the primary studies. This does not mean, however, that we believe groups evolve along a continuum of within-group relatedness.

      As the referee points out, there are two distinct routes to group formation that set the limits to relatedness within groups. In species where offspring do not disperse from their natal patch (‘family’ groups) the opportunity for interacting with relatives is high, whereas in species where groups form after individuals disperse from natal patches (‘non-family’ groups) relatedness is typically low. Some variation in within-group relatedness subsequently arises within these two categories because of a number of modifying factors (breeder turnover, number of males and females founding groups, ‘budding’ dispersal and so on). However, the potential for kin selection to favour adaptations, including those that limit pathogen spread, remains fundamentally different between family and non-family groups. We tried to capture such differences by classifying species as typically living with kin and non-kin using life-history information (dispersal patterns, mating systems) and direct estimates of relatedness.

      We used the terms kin and non-kin rather than family and non-family because across such a diverse set of study species, with variable types of information (e.g. some species only had molecular genetic estimates of relatedness, while others had only life-history information), it was not possible to ascertain exactly how groups form for each species. Nevertheless, our analyses are aimed at addressing whether species that typically live with kin, such as the social insects, have more effective mechanisms for reducing the impact of pathogens amongst relatives than species that live with non-kin.

      The referee makes an additional valuable point that for social insects ancestral levels of relatedness in groups are known to be high, with lower levels of relatedness being derived. Comparing species with low versus high contemporary estimates of relatedness may therefore shed light on the importance of current versus past evolutionary responses to pathogens.

      Unfortunately, the sample sizes are just too limited to conduct any meaningful analyses. Only one species of social insect in our dataset was classified as living with non-kin (r < 0.25). We also examined finer scale predictors of relatedness applicable to social insects (queen mating frequency: monogamous (r = 0.5) versus polyandrous (r > 0.25 & < 0.5)). Sample sizes for crucial comparisons were again too small for formal analysis (number of monogamous species with experimental data: pathogens present = 3, pathogens absent = 3; number of polyandrous species with experimental data: pathogens present = 2, pathogens absent = 1).

      We have extended the discussion, highlighting that more work on species with ancestral and derived high and low levels of relatedness will aid our understanding of the evolutionary history of adaptations to minimise pathogen spread in groups (Lines 248-250). We have also checked and edited the manuscript to remove any implication that groups originate from a continuum of relatedness.

      • I wondered whether you could (interpretationally, i.e. in the discussion) do more with comparative data on pathogen pressure in the wild. The 1987 Hamilton chapter that you cite has lots of interesting natural history observations, which are now often supported by better data. I think he speculates about how altruistic soldiers evolved in aphids and thrips and connects their sociality with living in their own food (galls), which should mean low parasite pressure. The same is true for the lower termites. Would your results allow you to conjecture that all independent lineages that evolved differentiated castes (only possible in families with full siblings; or clones as in aphids) likely had to do that in disease free habitats?

      This is an interesting point and an area where further research would be very valuable. It fits in nicely with our current discussion of how the evolution of groups with high relatedness may be more likely to occur in environments where pathogens are rare. This was rather vertebrate-focused before and so we are grateful for the referee’s suggestion, which has broadened this point. The section now reads:

      “Parallel arguments have been made for social insects. Species with sterile worker castes, that only evolved in groups with high levels of relatedness, are thought to have arisen in environments protected from pathogens (Hamilton 1987). For example, sterile soldier castes have evolved at least six independent times in clonal groups of aphids, and the majority of these cases form galls that provide protection against pathogens (Hamilton, 1987; Stern and Foster, 1996). Escape from pathogens may therefore be a general feature governing the evolutionary origin, as well as the current ecological niches, of species living in highly related groups” (Lines 190-197).

      • I think some effort should be made to make Figures 2, 3 and 4 easier to interpret. The ultra-brief acronyms along the y-axis take a while to digest and to realize the nestedness of the analyses. Could you give one piece of information on the left axis (spelled out like 'experimental data' and 'observational data') and the other piece on the right axis (spelled out as 'pathogens absent' and 'pathogens present')? It would also be helpful if the reader could fully understand the figures without first having to go through the entire method section, so I recommend you extend the legend to explain: 1. What Zr stands for. 2. What the directionality is (so the cryptic line just below Zr can become a proper sentence in the legend), and 3. The rationale of the multifactorial analyses with four or eight combinations (as you describe in the methods; I believe Figure 4 is an example of eight, but this remains rather hazy).

      Many thanks for these suggestions. We have now revised the axis labels and figure legends to improve interpretability.

    1. Reviewer #3 (Public Review): 

      Brochet et al. find that four species of the Lactobacillus Firm-5 lineage, one of the core bacterial lineages of the honey bee microbiome, are able to coexist because they utilize different pollen-derived flavonoids and sugars. They demonstrated this both in vivo, in gnotobiotic bees, and in vitro with laboratory co-cultures. Simple yet robust experiments involving diet or growth media with just simple sugars resulted in loss of diversity, whereas diets and media supplemented with pollen allowed the persistence of all four Firm-5 species over multiple serial passages. The authors then proceeded to examine the genes that were differentially expressed in response to different nutrient growth conditions, as well as the presence of metabolites to infer utilization of pollen-derived nutrients. The results paint a convincing picture of niche partitioning via differentiation in both encoded metabolic capabilities and in the differential expression of commonly encoded genes among co-resident bacterial species. 

      Overall, the paper is strong and the arguments and conclusions put forth are well supported by the data. I only have a few suggestions: 

      1) The study focuses on one strain each of the 4 Firm-5 species; however, there is diversity within each species. This is only briefly mentioned in the paper at the very end, and I think the authors should address this a bit more directly. In particular, they have previously generated a large amount of genomic data from some of these other strains, so it is likely possible to infer or speculate, based on this data, whether they expect different strains within each species to utilize similar nutrients. Also, I'm wondering if the authors can comment on how their findings could extend to the related bumble bee gut microbiome. Such a discussion would help enhance the applicability and importance of this study. 

      2) It is interesting that different species ended up dominating in the in vivo vs. in vitro simple sugar-based communities. What do the authors think may be behind this difference? 

      3) Since the observed coexistence of these gut microbes is largely due to nutritional niche partitioning, it would be helpful if the authors could comment on the natural variation of key pollen-derived metabolites, and on whether and how we could expect ecological variation in the bee microbiome due to plant pollen availability based on biogeography and seasonality. 

      4) The supplementary information is nicely documented and accessible, but I think it would be even more useful if genome-wide data for the RNA-seq results, not just for select genes, are made available. Furthermore, I suggest including descriptive titles and labels within the supplementary Excel files, as there are many separate sheets and it is not always clear what each one shows.

    1. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.

      His definition of a Memex is simply a mechanized (or what we would now call digitized) commonplace book, which has a long history in the literature of knowledge management.


      I'll note here that he's somehow still stuck on the mechanical engineering idea of mechanized. Despite the fact that he was the advisor to Claude Shannon, father of the digital revolution, he is still thinking in terms of mechanical pipes, levers, and fluids. He literally had Shannon building a computer out of pipes and fluid while he was a student at MIT.

    2. One cannot hope thus to equal the speed and flexibility with which the mind follows an associative trail, but it should be possible to beat the mind decisively in regard to the permanence and clarity of the items resurrected from storage.

      The idea of an "[[associative trail]]" here brings to mind both the ars memorativa and the method of loci as well as, even more specifically, the idea of songlines.

      Bush's version is the same thing simply renamed.

      <small><cite class='h-cite ht'> <span class='p-author h-card'>Jeremy Dean</span> in Via: ‘What I Really Want Is Someone Rolling Around in the Text’ - The New York Times (<time class='dt-published'>06/09/2021 14:50:00</time>)</cite></small>

    3. just as though he had the physical page before him.

      Strange that we also want to do more than the material is capable of, but we still want the sense of material interaction. Why?

    4. No human vocal chords entered into the procedure at any point;

      I thought he was going to get to benefits for health/medicine/disabilities but alas...it is all reviewed as a benefit for masculine and able-bodied intellectualism

    5. A record if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted.

      I'd disagree with this. Old and forgotten media have had great use; if they were not extended or needed that doesn't cancel out their former use (thinking of the work of Lisa Gitelman here)

    6. square-rigged ships.

      What a weird metaphor! A square-rigged ship—I think—was a type of vessel that had been improved over several hundreds of years and was commonly used in the nineteenth century. Bush seems to be using it to represent something that was formerly considered a hallmark of civilization (a tool of conquest, nationalism, exploration), but was outdated in a 20thC technological environment

    7. The investigator is staggered by the findings and conclusions of thousands of other workers—conclusions which he cannot find time to grasp, much less to remember, as they appear.

      This astonishment at the "findings and conclusions of thousands of other workers" seems connected to our "information overload" not only in the amount of information available, but in how we relate to it. There is so much out there; is this useful in de-centering individual intellectual authority, or harmful in making all discourse relative?

    1. Author Response:

      Reviewer #1:

      In this paper, the authors did a fine job of combining phylogenetics and molecular methods to demonstrate parallel evolution across vRNA segments in two seasonal influenza A virus subtypes. They first estimated phylogenetic relationships between vRNA segments using Robinson-Foulds distance and identified the possibility that parallel evolution of RNA-RNA interactions drives genomic assembly. This is indeed an interesting mechanism in addition to the traditional role of proteins in the same process. Subsequently, they used molecular biology to validate such RNA-RNA driven interaction by demonstrating co-localization of vRNA segments in infected cells. They also showed that the parallel evolution between vRNA segments might vary across subtypes and virus lineages isolated from distinct host origins. Overall, I find this to be excellent work with major implications for genome evolution of infectious viruses, including the emergence of new strains with altered genome combinations.

      Comments:

      I am wondering if leaving out sequences (those not resolving well) in the phylogenetic analysis interferes with the true picture of the proposed associations. What if they reflect evolutionary intermediates with important implications for pathogen evolution, which are lost in the analyses?

      We fully appreciate this concern and have explored this extensively. One principal assumption underlying the approach we outline in this manuscript is that the trees analyzed are robust and well-resolved. We use tree similarity as a correlate for relationships between genomic segments, so the trees must be robust enough to support our claims, as we have clarified in lines 128-131. We initially set out to examine a broader range of viral isolates in each set of trees, but larger trees containing more isolates consistently failed to be supported by bootstrapping. Bootstrapping is by far the most widely used methodology for demonstrating support for tree nodes. We provided the closest possible example to the trees presented in this manuscript for comparison. We took all 84 H3N2 strains from 2005-2014 analyzed in replicate trees 1-7 and collapsed these sequences into one tree for each vRNA segment. Figure X-A, specifically provided for the reviewers, illustrates the resultant collapsed PB2 tree, with bootstrap values of 70 or higher shown in red and individual strains coded by cluster and replicate. As expected, the majority of internal nodes on such a tree are largely unsupported by bootstrapping, indicating that relaxing our constraint of 97% sequence identity increases the uncertainty in our trees.

      Because we agree with Reviewers #1 and #3 on the critical importance of validating our approach, we determined the distances between these new collapsed trees using a complementary approach, Clustering Information Distances (CID), that is independent of tree size (Supplemental Figure 4B and Figure X-B & X-C). Larger trees containing all sequences yielded pairwise vRNA relationships that are largely similar to those we report in the manuscript (R² = 0.6408; P = 3.1E-07; Figure X-B vs. X-C), including higher tree similarity between PB2 and NA over NS. This observation strengthens the rationale to focus on these segments for molecular validation and correlate parallel evolution to intracellular localization in our manuscript (Figure 7). However, tree distances are generally higher in Figure X-C than in Figure X-B, which we might expect if poorly supported nodes in larger trees artificially inflate phylogenetic signal. Given the overall similarity between Figures X-B and X-C, both methods yield largely comparable results. We ultimately relied upon the more robust replicate trees with stronger bootstrap support.

      Lines 50-51: Can you please elaborate? I think this might be useful for the reader to better understand the context. Also, a brief description on functional association between different known fragments might instigate curiosity among the readers from the very beginning. At present, it largely caters to people already familiar with the biology of influenza virus.

      We have added additional information to reflect the complexity of intersegmental interactions and the current standing of the field (lines 49-52).

      Lines 95-96: Were these strains all swine-origin? More details on these lineages will be useful for the readers.

      We have clarified that all strains analyzed were isolated from humans, but were of different lineages (lines 115-120).

      Lines 128-132: I think it would be good to introduce these hypotheses well in advance, maybe in the Introduction, with more functional details of the viral segments.

      We incorporated our hypotheses regarding tree similarity into the existing discussion of epistasis in the Introduction (lines 74-75 and 89-106).

      Lines 134-136: Please rephrase this sentence to make it more direct and explain the why. E.g. "... parallel evolution between PB1 and HA is likely to be weaker than that of PB1 and PA".

      The text has been modified (lines 165-168).

      Lines 222-223: Please include a set of hypotheses to explain your results. Please also add a perspective in the discussion on how this might contribute to the pandemic potential of H1N1.

      We have added in our interpretation of the results (lines 259-264) and expanded upon this in the Discussion (lines 418-422).

      Lines 287-288: I am wondering how likely is this to be true for H1N1.

      We have expanded on this in the Discussion (lines 409-410).

      Reviewer #2:

      The influenza A genome is made up of eight viral RNAs. Despite being segmented, many of these RNAs are known to evolve in parallel, presumably due to similar selection pressures, and influence each other's evolution. The viral protein-protein interactions have been found to be the mechanism driving the genomic evolution. Employing a range of phylogenetic and molecular methods, Jones et al. investigated the evolution of the seasonal Influenza A virus genomic segments. They found the evolutionary relationships between different RNAs varied between two subtypes, namely H1N1 and H3N2. The evolutionary relationships in the case of H1N1 were also temporally more diverse than in H3N2. They also reported molecular evidence that indicated the presence of RNA-RNA interactions driving the genomic coevolution, in addition to the protein interactions. These results not only provide additional support for the presence of parallel evolution and genetic interactions in the Influenza A genome but also advance the current knowledge of the field by providing novel evidence in support of RNA-RNA interactions as a driver of genomic evolution. This work is an excellent example of hypothesis-driven scientific investigation.

      The communication of the science could be improved, particularly for viral evolutionary biologists who study emergent evolutionary patterns but do not specialise in the underlying molecular mechanisms. The improvement can be easily achieved by explaining jargon (e.g., deconvolution) and methodological logics that are not immediately clear to a non-specialist.

      We have clarified or eliminated jargon wherever possible throughout the text.

      The introduction section could be better structured. The crux of this study is the parallel molecular evolution in influenza genome segments and interactions (epistasis). The authors spent the majority of the introduction section leading to those two topics and then treated them summarily. This structure, in my opinion, is diluting the story. Instead, introducing the two topics in detail at the beginning (right after introducing the system) then discussing their links to reassortments, viral emergence etc. could be a more informative, easily understandable and focused structure. The authors also failed to clearly state all the hypotheses and predictions (e.g., regarding intracellular colocalisation) near the end of the introduction.

      We restructured the Introduction with more background on genomic assembly in influenza viruses, as requested by two reviewers (lines 43-52), more discussion of epistasis (lines 58-63) and provided a more thorough discussion of all hypotheses (lines 74-77, 88-92, 94-95, 97-106).

      The authors used the Robinson-Foulds (RF) metric to quantify topological distance between phylogenetic trees, a key variable of the study. But they did not justify using the metric despite its well-known drawbacks, including lack of biological rationale and lack of robustness, particularly when more robust measures, such as generalised RF, are available.

      We agree that RF has drawbacks. To address this, we performed a companion analysis using the Clustering Information Distance (CID) recently described by Smith, 2020. The mean CID can be found in Figure S4, the standard error of the mean in Figure S5, and networks depicting overall relationships between segments by CID in Figure S7E-S7H. To better assess how well RF and CID correlate with each other across influenza virus subtypes and lineages, we reanalyzed all data from both sets of distance measures by linear regression (Figure 3B, 4B-C, 5B, S6 and S9). Our results from both methods are highly comparable, which we believe strengthens our conclusions. Both analyses are included in the resubmission (lines 86-89; 162; 164; 187-188; 199-200; 207-208; 231-234; 242-244; 466-470).
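To make the tree-distance idea discussed above concrete, here is a minimal sketch of a Robinson-Foulds-style comparison. It is not the authors' pipeline: it assumes rooted trees written as nested Python tuples of leaf names (a hypothetical encoding), and it counts clades rather than the bipartitions used for unrooted trees. It also assumes both trees carry the same set of leaves. The function names `clades` and `rf_distance` are illustrative only.

```python
from itertools import chain


def clades(tree):
    """Collect the non-trivial clades of a rooted tree.

    A tree is a nested tuple of leaf-name strings, e.g. (("A", "B"), ("C", "D")).
    Each clade is returned as a frozenset of the leaf names below one internal node.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):  # a leaf contributes only its own name
            return frozenset([node])
        members = frozenset(chain.from_iterable(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these leaves
    return found


def rf_distance(tree_a, tree_b):
    """Count the clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


balanced = (("A", "B"), ("C", "D"))
swapped = (("A", "C"), ("B", "D"))
ladder = ((("A", "B"), "C"), "D")

print(rf_distance(balanced, balanced))
print(rf_distance(balanced, swapped))
print(rf_distance(balanced, ladder))
```

Because the score only asks whether a grouping is present or absent, a single misplaced leaf can remove many shared clades at once. This is one of the drawbacks that motivates information-based alternatives such as the CID used in the response.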

      Figure 1 of the paper is extremely helpful to understand the large number of methods and links between them. But it could be more useful if the authors could clearly state the goal of each step and also included the molecular methods in it. That would have connected all the hypotheses in the introduction to all the results neatly. I found a good example of such a schematic in a paper that the authors have cited (Fig. 1 of Escalera-Zamudio et al. 2020, Nature communications). Also this methodological scheme needs to be cited in the methods section.

      We provided the molecular methods in a schematic in Figure 1D and the figure is cited in the Methods (lines 310; 440; 442; 456; 501).

      Finally, I found the methods section difficult to navigate, though not because it lacked detail. The authors have been excellent in providing a considerable amount of methodological detail. The difficulty arose from the lack of a chronological structure. Ideally, the methods should be grouped under research aims (for example, data mining and subsampling, analysis of phylogenetic concordance between genomic segments, identifying RNA-RNA interactions, etc.), which would clearly link methods to specific results on the one hand and to the hypotheses on the other. This structure would make the article more accessible, for a general audience in particular. The results section appeared to achieve this goal and thus often repeats or explains methodological detail, which ideally should have been restricted to the methods section.

      We organized the Methods section by research aims as suggested. However, some discussion of the methods was retained in the Results section to ensure that the manuscript is accessible to audiences without formal training in phylogenetics.

      Reviewer #3:

      The authors sought to show how the segments of influenza viruses co-evolve in different lineages. They use phylogenetic analysis of a subset of the complete genomes of H3N2 or the two H1N1 lineages (pre and post 2009), and use a method - Robinson-Foulds distance analysis - to determine the relationships between the evolutionary patterns of each segment, and find some that are non-random.

      1) The phylogenetic analysis used leaves out sequences that do not resolve well, with the goal of achieving higher bootstrap values. It is difficult to understand how that gives the most accurate picture of the associations: those sequences represent real evolutionary intermediates, and their inclusion should not alter the relationships between the more distantly related sequences. It seems that this creates an incomplete picture that artificially emphasizes differences among the clades for each segment analyzed?

      Reviewer #1 raised the same concern. Please refer to our response at the beginning of this letter where we address this issue in depth.

      2) It is not clear what the significance is of finding sequences that share branching patterns in the phylogeny, and how that informs our understanding of the likelihood of genetic segments having some functional connection. What mechanism is being suggested? Is this a proxy for the gene segments having been present in the same viruses, thereby revealing the favored gene segment combinations? Is there some association suggested between the RNA sequences of the different segments? The frequently evoked HA:NA associations may not be a directly relevant model, as those are thought to relate to the balance of sialic acid binding and cleavage associated with mutations focused around the receptor binding site and active site, length of NA stalk, and the HA stalk. Does that show up in the overall phylogeny of the HA and NA segments? Is there co-evolution of the polymerase gene segments, or has that been revealed in previous studies, as is suggested?

      We clarified our working hypotheses in the Introduction (lines 89-106) and what is known about the polymerase subunits (lines 92-93). Our data do suggest that polymerase subunits share similar evolutionary trajectories that are more driven by protein than RNA (lines 291-293; Figure 2A and 6). The point about epistasis between HA and NA arising from indirect interactions is entirely fair, but these studies are nonetheless the basis for our own work. We have clarified the distinction between these prior studies and our own in the text (lines 60-63 and 74-75). Moreover, our protein trees built from HA and NA recapitulate what has been shown previously, which we highlight in the text (lines 293-296; Figure 6 and Figure S10). We also clarified our interpretation of tree similarity throughout the text (lines 165-168; 190-191; 261-264; 323-326; 419-423).

      The mechanisms underlying the genomic segment associations described here are not clear. By definition they would be related to the evolution of the entire RNA segment sequence, since that is being analyzed. (1) Is this because of a shared function (seems unlikely but perhaps pointing to a new activity), or is it (2) because of some RNA sequence-associated function (inter-segment hybridization, common association of RNA with some cellular or viral protein)? (3) Is it related to specific functions in RNA packaging? Please tell us whether the current RNA packaging models inform about a possible process. Is there a known packaging assembly process based on RNA sequences, where the association leads to co-transport and packaging? In that case the co-evolution should be more strongly seen in the region involved in that function and not elsewhere. The apparent increased association of the subset of genes examined for the single virus appears mainly in the cytoplasm close to the nucleus, suggesting function (2) and/or (3)?

      It is difficult to figure out how the data found correlates with the known data on reassortment efficiency or mechanisms of systems for RNA segment selection for packaging or transport - if that is not obvious, maybe you can suggest processes that might be involved.

      We provided more context on genomic packaging in the Introduction, including the current model in which direct RNA interactions are thought to drive genomic assembly (lines 43-53). Although genomic segments are bound by viral nucleoprotein (NP), accurate genomic assembly is theorized to be a result of intersegment hybridization rather than driven by viral or cellular protein. We further clarified our hypotheses regarding the colocalization data in the Results section to make the proposed mechanism clearer (lines 313-326).

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      The authors wish to thank all three Reviewers for their appreciative comments regarding our ECPT and for very useful suggestions. Responses to all points raised are presented below; we hope that the responses and new experiments proposed in the following pages will fully address remaining concerns.

      Reviewers’ comments on the bioRxiv paper by Chesnais et al, 2021

      “High content Image Analysis to study phenotypic heterogeneity in endothelial cell monolayers”

      https://www.biorxiv.org/content/10.1101/2020.11.17.362277v3


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors highlight the importance of endothelial heterogeneity using endothelial cells from different tissues. They examined aortic and pulmonary endothelium as well as HUVECs. They cultured the cells in identical conditions and also stimulated them with a physiological concentration of vascular endothelial growth factor as well as high concentrations as would be found in cancers. They developed a profiling tool that allowed analysis of individual endothelial cells within a monolayer and quantification of inter-endothelial junctions, Notch activation, proliferation and other features.

      **Major comments**

      1. It would be useful to apply this technology one step beyond two-dimensional culture, to use vessels opened up longitudinally so that one can see the monolayer of endothelial cells and assess whether it is relevant in primary material in situ. I think this would be a major utility of the whole approach.

      R: We thank this reviewer for the suggestion. In vivo analysis is not among the objectives of the paper. However, we propose to perform “En face” staining of murine blood vessels following the protocol in the reference below. We will perform stainings for murine CDH5 (VE-Cadherin), NOTCH1 intracellular domain, HES1 and DNA, which parallel those used in vitro on human EC. We will then apply our revised ECPT workflow and present data in a new Figure.

      En Face Preparation of Mouse Blood Vessels. Ko KA, Fujiwara K, Krishnan S, Abe JI. J Vis Exp. 2017 May 19;(123):55460. doi: 10.3791/55460. PMID: 28570508

      2. There are some very nice images here, but I was disappointed not to see a field showing staining and markers for several of the target proteins, and thus the heterogeneity and randomness or organisation of the endothelial cells.

      R: We thank the reviewer for the appreciative comment. We propose to include representative microphotographs to illustrate the heterogeneity of different EC monolayers in the revised version of the manuscript. To further illustrate these aspects, we will also include spatial correlation maps of cells and features measured with ECPT, as explained below.





      3. The Notch signalling is an important aspect of this work; in particular, evidence of lateral inhibition would have been of value. For example, one might expect cells adjacent to each other to have alternating high and low NICD.

      R: We thank the reviewer for the suggestion. To address this, we are currently developing a new module to perform spatial autocorrelation analysis based on cell maps built using ECPT. In particular, we have developed a new module to export cell maps as spatial objects in R, which can then be analysed using the adespatial R package to provide correlation metrics such as Moran’s autocorrelation index (see reference below). The index works with continuous data, removing the need to establish arbitrary thresholds, and thus provides formal metrics to demonstrate heterogeneity in EC monolayers. We have derived this index as an example of such analysis for synthetic data and for one ECPT cell map, as shown below.

      Figure 1: Moran’s spatial autocorrelation analysis using R and adespatial package. Moran’s index has values between –1 and 1. If adjacent cells had a consistent tendency to acquire alternate high and low NICD values, the corresponding bivariate Moran’s index would have an I+ value ~ 0 and an I- value approaching -1. In the example cell map both I+ and I- have relatively small absolute values and large p values which suggest a random cell distribution. The analysis was performed on synthetic data and ECPT derived data (HUVEC at baseline).


      Community ecology in the age of multivariate multiscale spatial analysis

      S Dray et al, Ecological Monographs, 2012. doi:10.1890/11-1183.1
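The behaviour described in the figure legend above can be illustrated with a minimal sketch of the classical (univariate) Moran's I. This is not the adespatial workflow used by the authors and it does not reproduce the I+ / I- decomposition. It assumes binary adjacency weights and a hypothetical ring of cells in which each cell touches its two neighbours. The function names `morans_i` and `ring_neighbours` are illustrative.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights.

    values      -- one measurement per cell (e.g. a nuclear NICD intensity)
    neighbours  -- neighbours[i] lists the indices of the cells adjacent to cell i
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Total weight W: every directed neighbour link counts as 1.
    total_weight = sum(len(adj) for adj in neighbours)
    # Sum of products of deviations over all neighbouring pairs.
    cross = sum(dev[i] * dev[j] for i, adj in enumerate(neighbours) for j in adj)
    squared = sum(d * d for d in dev)
    return (n / total_weight) * cross / squared


def ring_neighbours(n):
    """Hypothetical layout: cell i touches cells i-1 and i+1 on a closed ring."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


ring = ring_neighbours(8)
alternating = [1.0, 0.0] * 4        # high/low pattern expected under strict lateral inhibition
clustered = [1.0] * 4 + [0.0] * 4   # high cells grouped together

print(morans_i(alternating, ring))
print(morans_i(clustered, ring))
```

On this toy ring the strictly alternating pattern gives an index of -1 and the clustered pattern gives a positive value, while values near 0 would indicate no spatial structure. Real cell maps need neighbour lists derived from segmentation and a permutation test for significance, which this sketch omits.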

      • NICD staining alone does not score the extent of the signalling, because many factors can influence the transport of the cleaved NICD. A downstream marker of Notch signalling, e.g. the HES or HEY family or DLL4, is really needed to give more information about this critical aspect.

      R: We thank the reviewer for the suggestion. We are currently performing HES1 staining (with no Pha staining) along with a new NICD mAb (see below). Preliminary qualitative data (Fig 2) show that HES1 staining also reveals single cell heterogeneity of NOTCH activation in the same monolayer. We will include ECPT analysis of HES1 and correlation with NICD and other features as suggested. We will reformat the current Fig 5 to include HES1 analysis and improved metrics of NOTCH pathway activation including spatial analysis (point 3 above).

      Figure 2: HES1 immunostaining on HUVEC (image enhanced for visualisation). Cell nuclei labelled as 1, 2 and 3 have raw mean grey values of HES1 signal equal to 2271, 11210 and 48261 (C2/C1 and C3/C2 > 4-fold).




      I really do not think that in Figure 5 it is justified to have a red line drawn through the cloud of points. The correlation coefficient is so low that this is meaningless. The failure to distinguish a P value from biological relevance is worrying. A much better comparison would have been between NICD staining and a downstream gene regulated by Notch.

      R: We appreciate the reviewer’s concerns and are presenting our analyses of NOTCH activation using new immunostainings (HES1) and robust metrics for NOTCH activation as discussed above. We will therefore remove the mentioned correlation plots from the revised version of the manuscript.

      It is important to know that the antibodies used for staining have been validated by the investigators. They would need to show a single band on Western blots or be able to block staining on immunohistochemistry. We all know that manufacturers can be unreliable and use high concentrations of proteins for Western blots. These validations should be added as a supplementary figure.

      R: While the paper was under revision, the AB8925 antibody (NICD, Abcam) was withdrawn from the market. To address this major concern, we have decided to acquire a different antibody targeting the intracellular portion of the NOTCH receptor and have validated its specificity by western blot. Fig 3 below shows western blots demonstrating a clean band at ~98 kDa, as expected for the cleaved NOTCH1 intracellular domain (NICD).

      We are currently repeating the whole experiment presented in the current version of the manuscript, and the ECPT analysis, using the new antibody and including HES1, one of the canonical NOTCH target genes, as also suggested by this and other Reviewers. We will provide WB analysis of all antibodies used in the paper in a supplementary figure in the revised manuscript.

      Figure 3: WB analysis (NOTCH1 intracellular domain, AB52627, Abcam) of HUVEC (lanes 2, 3), HAoEC (lanes 4, 5) and HPMEC (lanes 6, 7).

      Reviewer #1 (Significance (Required)):

      This represents a valuable and thorough methodology likely to be highly useful to many groups and show new insights into endothelial biology.

      Wide audience, cancer, cardiology, vascular disease-covid.

      My expertise: >100 papers on angiogenesis in cancer, basic mechanism, therapy models, bioinformatics, IHC, patients, clinical trials. H score 190 (Google Scholar).

      R: We thank Reviewer One for their very appreciative comments and we hope that the proposed revisions will fully address remaining concerns.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Chesnais et al reports the development of a workflow for analysis of cultured endothelial cells, which they call the Endothelial Cell Profiling tool (ECPT). Using ECPT they analyse several parameters in three different endothelial cell types (HuAEC, HUVEC and HPMEC), such as cell morphology, activation of cytoskeleton, VE-cadherin junctions, cell proliferation and Notch activation, under steady conditions and upon treatment with VEGF. The analysis reveals some predicted changes, such as an increase in cell cycle and junctional activation in cells treated with VEGF-A, and such changes are highly heterogeneous. Overall, this is a potentially useful albeit not revolutionary tool for batch analysis of cultured endothelial cell phenotypes.

      I have the following comments:

      1. To make their case the authors should provide a comparison with other currently used approaches for EC phenotypic analysis in vitro: what is the advantage of using ECPT? The authors repeatedly use the term "single-cell level of analysis", but this is in fact the case for any IF-based analysis of cultured cells.

      R: We thank the reviewer for the suggestions. Indeed, several tools for imaging-based single cell phenotyping are available. However, ECPT represents an improvement in several respects. First, it allows improved segmentation of difficult-to-segment and heterogeneous cells; second, ECPT allows multi-parametric analysis on large image datasets in a semi-automated and structured way, facilitating downstream data analysis; third, ECPT is open source.

      Furthermore, ECPT is a very flexible workflow including tools which facilitate and automate several tasks such as systematic image re-labelling and grouping. We will now draft a table with a complete list of features and improvements in comparison to other available tools and include it in Appendix 1 of the revised manuscript. We will also include analyses which are not implemented in any currently available software, such as spatial autocorrelation.

      I strongly recommend staining HPMECs for PROX1; these cells are frequently 100% lymphatic endothelial cells. If that is the case here, the authors are comparing different lineages and not blood endothelial cells from different locations.

      R: We thank the reviewer for the suggestion. We will address this with a new characterisation as a supplementary figure in the revised manuscript. We are currently performing a qRT-PCR screening of several EC markers, including arterial, venous and lymphatic markers (e.g., CXCR4, Tie2, CDH5, PROX1, LYVE1), as well as baseline NOTCH1 and Dll4 and downstream genes such as HES1 and HEY2.

      Please provide evidence for specificity of NICD antibody.

      R: We thank this reviewer for the suggestion. Please see response to Reviewer one point 5.

      Figure 1: HPMEC picture appears out of focus

      R: We thank this reviewer for noticing; we will now include a clearer picture in the revised version of the manuscript.

      Figure 3 A - it is not entirely clear what is the difference between activated and stressed phenotype, they look quite similar.


      R: We will clarify the definitions of cell activation in the revised version of the manuscript and present this analysis as supplementary material, rather than in the main figures, to demonstrate the flexibility of our ECPT. We have removed PhA staining from the new experiments we are performing to allow HES1 staining and address NOTCH signalling in more detail. The assessment of PhA and stress fibres in previous experiments will be moved to supplementary material. The classification is based on PhA staining using a CPA classifier which was trained to distinguish between the two by the presence of stress fibres. The general rule during training of the CPA model was to place cells with stress fibres crossing the nucleus in the stressed category, while cells with peripheral bundles of actin were placed in the activated category.


      Figure 5 - what is the difference in NICD localization between "high" and "On" conditions?

      R: Since it has been noted by this and other reviewers that this classification might be difficult to interpret, and since the established thresholds are in fact somewhat arbitrary, we will completely revise the way we present the analysis of NOTCH activation data, including downstream analysis and more formal metrics of spatial correlation and extent of activation, eliminating the need to impose thresholds (also see response to Reviewer one point 3).

      Since the authors make a correlation between Notch activity and junctional stabilization, it would be important to confirm this by other means, such as analysis of Notch target genes.

      R: We thank this reviewer for the comment, which resonates with other Reviewers’ comments. We will include HES1 analysis in the revised manuscript; please see the response to point 6 and to Reviewer one point 3 above.


      **Technical and minor**

      1. Methods mentions HDMECs (human dermal microvascular endothelial cells) but the authors discuss HPMEC throughout the text.
      2. Please add scale bars on all microscopy pictures.
      3. Please provide the information on what isoform of VEGF-A was used for stimulation and the rationale for selecting the concentration.

      R: We thank this reviewer for flagging these imprecisions and we will fix them in the revised version of the manuscript.

      Reviewer #2 (Significance (Required)):

      The authors provide a workflow for the phenotypic analysis of cultured cells. Such tool is potentially useful, although the examples the authors show do not reveal striking examples of why such analysis is better in comparison to existing approaches. My guess is that the analysis may be faster and less tedious, once the training sets are generated, but this is not specified. My speciality is endothelial cells biology.

      R: We thank this reviewer for their very useful and appreciative comments. As mentioned above, we will expand appendix 1 to fully explain the potential and utility of our ECPT and revise the main text to clearly highlight its main advantages. We hope that our plan for revision will fully address remaining concerns.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **SUMMARY:**

      The manuscript by Chesnais et al. presents a novel endothelial cell (EC) profiling tool (ECPT) which provides spatial and phenotypic information from individual ECs, and was tested with a variety of specialized EC subtypes (arterial, venous, microvascular). They present a high throughput immunostaining and imaging-based platform using culture of human ECs on 96-well plates and capture of fixed, stained samples on a Perkin Elmer Operetta CLS system. The authors report the use of this ECPT tool to investigate EC phenotypes from human umbilical vein ECs (HUVEC), human aortic ECs (HAoEC) and human pulmonary microvascular (HPMEC) in relation to 50 ng/mL VEGF stimulation for 48 hours, and the general parameters of proliferation, Notch activation and stress fiber rearrangement (F-actin), and present this as a prospective platform to examine differences in EC phenotypes and responses at a more individualized level.

      **MAJOR COMMENTS:**

      1. Fundamentally, the advantage of single cell technologies is the ability to segregate populations to make novel observations. One area that would be of interest to explore in this manuscript using this ECPT platform would be reporting the results from single cell analysis that is then subsequently pooled within a sub-population, rather than sub-stratifying populations to reflect the multiple phenotypes that may be present within a single "confluent" well. With analysis of EC heterogeneity, it would be of interest to differentiate heterogeneity within EC subtypes at the culture/treatment conditions presented, and heterogeneity between EC subtypes.

      R: We thank this reviewer for the suggestions. We believe that the new approach to evaluate heterogeneity through spatial autocorrelation can provide a much better and clearer picture of this aspect (see responses to Reviewer One point 3 and Reviewer Two points 6 and 7). Furthermore, we are currently restructuring the ECPT data structure to a more intuitive layout (list of lists rather than a single huge data frame) without affecting downstream data presentation. We will also update our Shiny App to enable the user to perform analyses on data subsets of interest without any R coding, and we will present examples and a walkthrough of this approach in the appendix.

      2.

      The term "stable IEJ" is used and refers to 48h after seeding 40,000 cells on a 96-well plate, but it is unclear how the authors defined or demonstrated a "stable" junction. In previous reports, longer-term culturing of EC monolayers well beyond the point of confluence has been shown to result in junctional complex rearrangement (Andriopoulou P et al. Arterioscler Thromb Vasc Biol. 1999; reviewed in Bazzoni G & Dejana E. Physiol Rev. 2004). To this point, the fact that the different EC subtypes investigated had different percentages of "quiescent cells" suggests that the monolayers were not completely quiescent. The statement that the IEJ classification is "an immediate index of EC activation in contrast to quiescence" should be further supported by references or data. The definition of quiescent EC as simply non-proliferating, non-migrating is somewhat reductionist, and oversimplifies EC states. The authors state that HAoEC and HUVEC "...appeared more active...", but it is unclear what "active" means, and whether this may simply reflect that these cells had not yet reached confluence or quiescence in the 48h total culture time. As well, it is unclear how "migratory phenotypes" could occur in confluent monolayers. It would be helpful to see the data for these observations. If leaving ECs longer in culture, are the authors able to achieve a higher percentage of quiescent cells?


      R: We thank this reviewer for the very insightful comments and for suggesting the references. Indeed, we considered these aspects carefully. Regarding cell culture density and confluency, we previously tested seeding densities of 30000-60000 cells/well of 96-well plates (0.32 cm², ~95000-190000 cells/cm²) and we selected 40000 as the maximum seeding density allowing adhesion of >99% of cells. For HUVEC, a seeding density of 40000 cells/well (125000 cells/cm²) produced a high-density culture immediately after seeding (close to what was reported for long-confluent cultures in Andriopoulou P et al, ATVB 1999, 140000 cells/cm²). We allowed a further 48h of culture, aiming to achieve junctional “stabilisation” and “maximal” cell density. For consistency, we also seeded 40000 HAoEC and HPMEC per well in all our experiments; however, both cell types are significantly larger than HUVEC (Fig 4). For all cell cultures we used EGM2 medium, which differs in a few respects from that reported in Andriopoulou P et al, namely the absence of antibiotics and antimycotics and the use of a defined cocktail of recombinant growth factors instead of Endothelial Cells Growth Supplement. In the past we compared HUVEC cultured in EGM2 and supplemented M199 medium, and in our experience EGM2 promotes higher proliferation rates in sub-confluent cultures but similar morphology upon confluency. It is notable that several other factors (including flow, matrix, perivascular cells) are absent in our culture conditions and therefore the homeostatic balance found in vivo might not be fully achievable under our experimental conditions. However, we argue that the described culture conditions should be sufficient to reach a bona fide relatively quiescent EC phenotype in culture.
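
      The density figures quoted above follow from dividing the number of cells seeded per well by the well growth area. The short sketch below reproduces that arithmetic; it is an illustration only, not part of ECPT, and the function name `cells_per_cm2` is hypothetical. The only inputs taken from the response are the 0.32 cm² well area and the seeding numbers.

```python
# Growth area of one well of a 96-well plate, as stated in the response (cm^2).
WELL_AREA_CM2 = 0.32


def cells_per_cm2(cells_per_well: float, well_area_cm2: float = WELL_AREA_CM2) -> float:
    """Convert a per-well seeding number into a surface density (cells/cm^2)."""
    if well_area_cm2 <= 0:
        raise ValueError("well area must be positive")
    return cells_per_well / well_area_cm2


if __name__ == "__main__":
    # Seeding numbers mentioned in the response: tested range and chosen value.
    for seeded in (30_000, 40_000, 60_000):
        density = cells_per_cm2(seeded)
        print(f"{seeded} cells/well -> {density:,.0f} cells/cm^2")
```

      For 40000 cells/well this gives 125000 cells/cm², and the 30000-60000 range gives 93750-187500 cells/cm², consistent with the rounded ~95000-190000 quoted in the response.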

      These considerations aside, we agree with this reviewer that providing examples of longer-term cultures would help substantiate our findings and further validate the ECPT approach. We will perform a supplementary experiment to evaluate this aspect by comparing 48h cultures with longer culture times (72h and 96h). Furthermore, we will expand the methods section with the details discussed above and in relation to the suggested references.


      Regarding the definition of “stable IEJ” and “active EC”, we used this terminology referring exclusively to our measures of IEJ stability (STB index) and to the PhA-based cell classification, where we used the terms “quiescent”, “active” or “stressed”. Therefore, all statements mentioning more or less “stable IEJ” or “active” EC are relative to the specific context of our experiment (not in absolute terms).

      Overall, we appreciate that the terminology we employed is a source of confusion and might suggest inappropriate over-interpretation of our results. We will correct the text in the manuscript to avoid this confusion and to clarify that our observations are valid within the context of our in vitro conditions. In particular, we will present the data regarding junctions as proportions of different junction types per cell, and we will rename cell “activation” categories based on PhA staining using more neutral terms (e.g., No Fibres, Peripheral Bundles, Stress Fibres). Finally, we will also attempt to generalise our observations to a more physiologic context by performing immunostaining on “en face” preparations of murine blood vessels (cf. response to R1 point 1).

      Fig 4: Cell area density distribution for HUVEC, HAoEC and HPMEC in baseline conditions.

      Could the authors comment on the baseline NICD immunoreactivity in the nuclei in HAoEC and HPMEC compared to HUVEC? Is this a reflection of active NOTCH signaling? Or rather, is it possible contact-inhibition (and downregulation of NOTCH) may not have occurred? Demonstration of EC quiescence would help to ensure similar cell cycle states. The definition of "Notch-positive" and "Notch-negative" cells is a bit misleading, as NICD levels and localization are a better indication of canonical Notch activation, and not the presence or absence of Notch protein(s). Further, NICD activation is also dependent on the levels of Notch ligands, which was not addressed. Are the authors able to confirm "OFF", "Low", "High", and "ON" classifications based on NICD intensity and localization with downstream Notch gene activation at a single-cell level? Or correlation between NICD status and the phase of cell cycle or proliferation status?

      R: We thank this reviewer for the comment. Overall, NICD, either nuclear or cytoplasmic, can give a measure of how much a cell is relaying canonical NOTCH signalling on a short timescale (minutes, which is also the timescale affected by lateral inhibition; Sjoqvist M and Andersson ER, Dev Biol, 2019). By evaluating single cells in the context of their population in multiple fields of view and samples, we can get an indication of how frequently a particular cell type tends to actively transduce canonical NOTCH (under confluent conditions). As this and other reviewers have pointed out, NOTCH signal transduction mediated by NICD can be affected by several factors, limiting the potential to infer actual activation of the pathway (i.e., downstream gene transcription). As suggested by this and other reviewers, we are including measures of downstream gene activation; in particular, we have included HES1 staining in our workflow, and we will include these data in a new analysis (also see response to R1 point 3). We will also provide new metrics of spatial autocorrelation to evaluate the tendency to lateral inhibition (R1 point 3) and correlation between parameters using continuous measures, and therefore we will remove the previous classification based on thresholds. Finally, we are performing a qRT-PCR screening to assess baseline levels of DLL4, NOTCH1 and JAG1, which we will present as supplementary material.

      Do as I say, Not(ch) as I do: Lateral control of cell fate

      Sjoqvist M and Andersson ER, Dev Biol, 2019

      PMID: 28969930

      The existing workflow/platform is adapted for images obtained from the Operetta CLS system (Perkin Elmer) and Harmony software (proprietary), which may not be available for broader users in the EC field. It would be helpful to include ImageJ macros for the bulk automatic import of TIFF, renaming and upscaling of resolution/bit quality to match the formats that are compatible with the software.

      R: We thank this reviewer for the comment. We have now included an ImageJ macro (available in the GitHub repository) which in principle can import and process images from any source. We didn’t include a specific option in our current user interface because the relabelling operates by parsing original filenames into fields, which are then renamed according to user input, and each HT platform adopts a different naming pattern to encode filenames. Any user with basic literacy in ImageJ macro scripting can achieve relabelling and processing of their own files, provided that their filenames follow a regular pattern which can be parsed with regular expressions. Also, it is relatively easy (again by modifying the macro) to include user-defined pre-processing steps, including image scaling. An example of a parsing method for Operetta CLS filenames is provided in appendix 1.
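
      The relabelling idea described above (parse the original filename into fields, then rebuild a name from those fields and user input) can be sketched as follows. This is not the authors' ImageJ macro: it is a Python illustration, the filename pattern is an assumption modelled on Harmony-style exports, and the channel-to-marker mapping and the `relabel` function are hypothetical.

```python
import re

# ASSUMPTION: export names of the form "r02c03f01p01-ch2sk1fk1fl1.tiff"
# (row, column, field, plane, channel). Other platforms need another pattern.
PATTERN = re.compile(
    r"r(?P<row>\d+)c(?P<col>\d+)f(?P<field>\d+)p(?P<plane>\d+)-ch(?P<channel>\d+)"
)

# Hypothetical user input: which marker was imaged in which channel.
CHANNEL_NAMES = {1: "DNA", 2: "CDH5", 3: "NICD", 4: "PhA"}


def relabel(filename, channel_names=CHANNEL_NAMES):
    """Return a new name like 'B03_f01_CDH5.tif', or None if the name does not parse."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    fields = {key: int(value) for key, value in match.groupdict().items()}
    # Plate rows are numbered from 1; convert to the usual letter (1 -> A, 2 -> B, ...).
    well = f"{chr(ord('A') + fields['row'] - 1)}{fields['col']:02d}"
    # Fall back to a generic channel label when the user gave no marker name.
    marker = channel_names.get(fields["channel"], f"ch{fields['channel']}")
    return f"{well}_f{fields['field']:02d}_{marker}.tif"


if __name__ == "__main__":
    print(relabel("r02c03f01p01-ch2sk1fk1fl1.tiff"))
```

      Adapting the sketch to another acquisition platform would only require changing `PATTERN` and the mapping, which mirrors the point made in the response about platform-specific filename conventions.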

      Could the authors comment on the manpower (hours from start to finish for experiments, staining, imaging, analysis, etc.) and cost of the ECPT pipeline relative to emerging single cell technologies such as single cell-RNA sequencing.

      Further, one major advantage of imaging technologies is the ability to assess live cell dynamics, which are particularly relevant in response to stimuli and agonists. Have the authors utilized the ECPT platform for these approaches, in particular, to assess the differential EC subtype dynamics in proliferative conditions?

      R: In terms of manpower the workflow is not very demanding. Our current dataset is based on images extracted from 4 independent experiments (18 wells each). The process is sequential; therefore a single user trained in cell biology, automated microscopy and in the use of the different ECPT components (ImageJ, CP, CPA and R) could perform the experiment alone. The timing of each experiment will depend on circumstantial factors; however, once the ECPT is trained for a specific user’s requirements (which can require some trial and error depending on the user’s experience), the whole process from cell fixation and staining, through image acquisition (~2 h acquisition for each experiment on an Operetta CLS system), to dataset build-up can take less than one week. For example, processing the current image database (~6000 images for four fluorescence channels), whose data are presented throughout, took the following raw processing times on a MacBook Pro 2017 equipped with an Intel i7 processor and 16 GB of RAM:

      - Image pre-processing and relabelling ~1h

      - Generation of probability maps for VEC and NICD ~3h

      - CP pipeline run (Cell segmentation, objects measurements and classification) ~16h

      - Data import (RStudio) ~20m


      We will measure these parameters more precisely in the new experimental run and present timings for each step in a new table in appendix 1.


      Once the main dataset is created, RStudio can perform most statistical analyses and data plotting almost instantly.


      We fully appreciate the value of employing ECPT in live imaging setups and we believe it is one of the most promising future applications. We didn’t address live microscopy experiments in the context of ECPT development and validation presented in the current manuscript; therefore we cannot present example data or proof of concept. However, we can confidently comment that time-lapse experiments would not add further layers of complexity in terms of image analysis workflow. Therefore, given an appropriate set of live markers (e.g., transgenic fluorescently tagged CDH5 for EC segmentation and junction analysis), we believe that the current implementation of ECPT is already fully equipped to facilitate processing and analysis of imaging data derived from time-lapse experiments.

      The authors should discuss the ability to amend or revise the ECPT platform to incorporate analysis of additional markers that may be obtained through imaging, and discuss greater implications and utility to specifically tailor the workflow for other researchers in vascular biology, or to other monolayer culture systems. Further, they should better highlight the novel observations obtained with the ECPT compared to traditional methodology.

      R: We thank this reviewer for the comments. We will provide evidence of ECPT flexibility within this manuscript by including, during the time of this review process, a new analysis for downstream NOTCH signalling (HES1). We will move the analysis of cell “activation” (i.e., stress fibre analysis) to supplementary information and include a more thorough discussion of how automated single cell classification could improve content, speed, reliability and robustness of quantification tasks which are currently exposed to long and tedious processing times and conscious/unconscious observer biases.

      **MINOR COMMENTS:**

      We thank this reviewer for the very thorough review of the manuscript. It is truly invaluable to us in improving it. Below are responses to specific technical points; we will fix all stylistic, formatting and typographical issues in the revised version of the manuscript.

      1. There are minor typographical, capitalization and grammatical errors throughout.

      R1: Thanks, we will fix these in the updated version of the manuscript.

      Why was fibronectin used to coat plates, and what was rationale for using this ECM substrate versus gelatin (most commonly used in EC cultures) or type I collagen?

      R2: We used fibronectin for immunostaining experiments, similar to what was reported in our previous work (Veschini et al, 2007, 2011, Wiseman et al, 2019) and also in Andriopoulou P et al, 1999. In general, in our experience FN gives better cell adhesion in comparison to gelatin when culturing EC on glass or other substrates different from cell culture plastic. FN is the cell culture substrate recommended by Promocell; therefore, we also used FN for cell expansion to avoid any phenotypic change which might have been caused by a switch in cell culture substrate.

      3. Based on the various box plots present throughout the figures, it appears that some parameters have a large range of values. Is it possible or helpful to set minimum and maximum exclusionary criteria? Further, in the way that these data are presented, it is difficult to appreciate the effects of a treatment such as 48h of VEGF, as the magnitude of STB Index difference, for example, appears small, and it is difficult to understand whether these significant differences are biologically relevant, as assessed.

      R3: We agree that in the absence of exclusion criteria it is difficult to infer biologic meaning from subtle differences (e.g., the tiny difference in STB index between HAoEC in the presence or absence of VEGF). In the current version of the manuscript, we attempted to be agnostic as to whether some observed small but significant mean differences could carry biologic meaning, and discussed larger variations as biologically meaningful, for example the differences in STB index among cell types. We argue that tiny differences in the distribution of some selected parameters across experimental conditions could reflect underlying mechanisms masked by biologic noise; therefore, catching a glimpse of these variations via ECPT could inspire novel experiments to specifically address their full biologic significance.

      In the interest of a better understanding of the current manuscript, we will re-analyse our data to provide more immediate metrics and highlight outstanding features.


      Use of arrows and further description in Figure 1 would help the reader understand what specific features are different in the various EC subtypes. As well, the representative micrographs for HPMEC appear blurry compared to other panels (Fig. 1).

      In Figure 2, the panels in A, B and C do not correspond horizontally, and it may be clearer to demonstrate "Segmentation & features extraction" overlays from the same representative micrographs shown in panel A. Labeling of the individual panels and software used for panel B would help the readership understand what is being quantified and how. The second panel in "C" appears blurry.

      In Figure 3, labelling the color code for quiescent, activated and stressed categories on graphs and in legend would be helpful to easily identify populations.

      R4-6: Thanks, we will fix these in the updated version of the manuscript.

      For Figure 4, line separators or more obvious grouping would help to distinguish discontinuous, linear and stabilized junction types in panel A. What proportion of the different EC subtypes contains discontinuous, linear and stabilized junctions at confluence/quiescence? Is there a correlation between discontinuous junctions and proliferating cells?


      R7: We will perform new analyses to address the correlation between proliferation and junctions, and between proliferation and HES1. We will restructure data presentation on junctions to display the proportions of different junction types per cell or per cell type rather than a unified value (STB index).


      It would be useful to distinguish the effects of published mediators on junctional integrity in intact EC monolayers (i.e. histamine; VEGF) from those shown in this automated quantitation. It appears that 50 ng/mL of VEGF treatment for 48h only slightly increases STB index based on panel C.

      R7c: We agree that the increase of STB index in HAoEC and HPMEC upon VEGF treatment might not be highly biologically meaningful, notwithstanding the considerations in point 3 above. However, the difference in HUVEC (± VEGF) is visually appreciable in images (i.e., VEGF-treated HUVEC seem to have more linear junctions); therefore we believe that the ~16 units difference in STB index is biologically meaningful. As discussed in point 7 above, we will restructure data presentation to better clarify these aspects.


      Figure 5 panel B should provide legend in graphs/figures or figure legends to highlight the color-coding matching the OFF, Low, High and ON groups. Further, it is unclear the difference between "High" and "ON" groups. The authors state that "thresholds were selected empirically", however, it is unclear whether this was derived through utilization of known Notch activators or inhibitors, and how this relates to the threshold of Notch activity necessary to enhance proliferation or maintain quiescence. In Supplementary Figure 4 (which I believe is mislabelled as Supplementary Figure 5), shows only a weak positive correlation between nuclear NICD intensity and mean STB index. It would be of interest to see the plot from Supplementary Figure 5 for each of the EC subtypes, in the presence and absence of VEGF. As well, for Figure 5, on C and D panels, it would improve clarity to revise "Low" and "High" descriptors with "Low NICD activity" and "High NICD activity".

      R8: As discussed above we will revise our analyses to remove NOTCH categories and instead show spatial autocorrelation analyses which work on continuous data.
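
      The response does not state which spatial autocorrelation statistic will be used. As one common choice, the sketch below computes global Moran's I on a continuous per-cell measurement (for example nuclear NICD intensity) using binary neighbour weights. It is an illustration under those assumptions, not the ECPT implementation; the function name `morans_i` and the toy neighbour map are hypothetical.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights.

    values: list of per-cell measurements (continuous).
    neighbours: dict mapping a cell index to the indices of its adjacent cells.
    Returns a value near +1 for clustered patterns and near -1 for
    alternating (checkerboard-like) patterns.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance_sum = sum(d * d for d in deviations)
    if variance_sum == 0:
        raise ValueError("all values are identical; Moran's I is undefined")

    cross_sum = 0.0   # sum over neighbouring pairs of dev_i * dev_j
    weight_sum = 0    # total number of (directed) neighbour links
    for i, adjacent in neighbours.items():
        for j in adjacent:
            cross_sum += deviations[i] * deviations[j]
            weight_sum += 1
    if weight_sum == 0:
        raise ValueError("no neighbour links supplied")

    return (n / weight_sum) * (cross_sum / variance_sum)


if __name__ == "__main__":
    # Toy example: four cells in a row, each touching its immediate neighbours.
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(morans_i([1.0, 0.0, 1.0, 0.0], chain))  # alternating signal
    print(morans_i([1.0, 1.0, 0.0, 0.0], chain))  # clustered signal
```

      In this toy case the alternating pattern yields a negative value and the clustered pattern a positive one, which is the kind of contrast a lateral-inhibition analysis would look for without imposing intensity thresholds.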

      In Supplementary Table 1, "Widt/length" should be "Width/length"


      R9: Thanks, we will fix this in the updated version of the manuscript.

      For Supplementary Figure 3, it would be of use to show DNA distribution intensities from proliferating, non-confluent EC subtypes to demonstrate the validity of this methodology to identify cells in G0/G1, S and G2/M phases, as highlighted in panel A. Could the authors comment on the discrepancy between the percentage of cells identified as quiescent by ECPT and the percentage of cells in G0/G1? The comment that "VEGF induced a small detectable increase in proliferation rate in all EC" is curious, as a dose of 50 ng/mL of VEGF should be a relatively strong stimulator of proliferation/migration in ECs.

      R10: We will perform ECPT analysis on sub-confluent or sparse cells to further validate our analysis. Qualitative data on preliminary images seem to confirm that the proliferation rate in sparse cells is very high (>70%). To perform the evaluation we followed and improved a previously published method (Roukos et al, Nat Prot, 2015).

      Regarding the relation between cells in G0/G1 and the assessment of the “quiescent” phenotype (the nomenclature of which will be revised as discussed above), it is important to highlight that we reported data on stress fibre analysis (i.e., classification into “quiescent”, “activated” and “stressed” cells) only for cells in G0/G1 (i.e., we excluded proliferating cells from this analysis, as we assumed that all proliferating cells would be “not quiescent” and would bias our estimation).

      For Supplementary Figure 5, "Nuclear NOTCH intensity" on the Y-axis should read "Nuclear NICD intensity", as it does not appear that Notch was stained. It would also be of benefit to overlay the ranges for "OFF", "Low", "High" and "ON" to appreciate ranges of activation. Is there any correlation between NICD nuclear intensity and proliferative index?

      R11: We will present the correlation between NICD or HES1 and proliferation in the revised version of the manuscript.

      Definitions should be provided for many terms. i.e. vascular endothelial-cadherin (VE-CAD; CDH5); HUVEC (human umbilical vein endothelial cell); HAoEC (human aortic endothelial cell); HDMEC (human dermal microvascular endothelial cell); NICD (NOTCH intracellular domain); VEGF (vascular endothelial growth factor); etc. at first appearance.


      R12: We will add this information in the revised version of the manuscript.

      For EC subtypes purchased from commercial vendor, it would be of interest to understand how many unique donors these cells/data were derived from, and whether there are any differences in basic donor information such as age, sex, etc. Further, Promocell catalogs proliferative rate for each of their lot numbers, and it would be of interest how this compares to the values determined using the ECPT software analysis package.

      R13: We will add this information in the revised version of the manuscript.

      In the "Cell culture" section of the methods, HDMEC from Promocell are listed, however, the manuscript and figures show data from HPMEC. Both EC subtypes are available from Promocell, however, HDMEC are from dermal origin.

      Vascular endothelial-cadherin should be abbreviated "VE-CAD" or "CDH5" and not "VEC", as this is not a standard or gene notation, and will likely be confused with the more common abbreviations for venous or vascular EC. It seems as though "CDH5" is used most commonly throughout manuscript, so this should be used throughout.

      The authors refer to "activated NOTCH" when describing antibodies in the methods, however, it would be clearer to the reader to simply refer to the antibody target (NICD), and mention that this reflects canonical NOTCH downstream activation.

      The sentence in the "Immunostaining" methods "CDH5 is a lineage marker..." should be moved to results/discussion as these details are out of place in methods.

      How were the 3 areas captured per well designated? Were these locations automated, and the same for all wells?

      "Appendix - Figure" notation should be revised to "Appendix Figure" for consistency and to avoid confusion.

      R14-19: Thanks, we will fix these in the updated version of the manuscript.

      How were artifacts and mis-segmented cell objects excluded?

      R20: We will add this information in the revised appendix. As general rules, cells containing NaN values in any of the parameters, cell fragments or merged cells (evaluated using area measurements) and cells with no detectable junctions were all excluded (in total, excluded cells were ~2.5% of the initial dataset).
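
      The three exclusion rules listed above can be expressed as a simple per-cell filter. The sketch below is an illustration only: the response does not give the actual area cut-offs or column names, so `MIN_AREA`, `MAX_AREA`, the `area` and `n_junctions` keys and the `keep_cell` function are all hypothetical placeholders.

```python
import math

# Hypothetical area limits (in pixels); the real cut-offs are not stated in the response.
MIN_AREA = 200.0      # below this, an object is treated as a cell fragment
MAX_AREA = 20_000.0   # above this, an object is treated as merged cells


def keep_cell(cell, min_area=MIN_AREA, max_area=MAX_AREA):
    """Return True if a cell record passes all three exclusion rules."""
    # Rule 1: no NaN in any measured parameter.
    if any(isinstance(v, float) and math.isnan(v) for v in cell.values()):
        return False
    # Rule 2: area within limits (removes fragments and merged objects).
    if not (min_area <= cell["area"] <= max_area):
        return False
    # Rule 3: at least one detectable junction.
    if cell["n_junctions"] == 0:
        return False
    return True


if __name__ == "__main__":
    cells = [
        {"area": 1500.0, "n_junctions": 4, "nicd": 0.8},           # kept
        {"area": 50.0, "n_junctions": 2, "nicd": 0.3},             # fragment
        {"area": 1800.0, "n_junctions": 0, "nicd": 0.5},           # no junctions
        {"area": 1600.0, "n_junctions": 3, "nicd": float("nan")},  # NaN value
    ]
    kept = [c for c in cells if keep_cell(c)]
    print(f"kept {len(kept)} of {len(cells)} cells")
```

      Reporting the fraction removed by each rule separately, rather than only the total, would make it easier for readers to judge how much of the exclusion is due to segmentation errors.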


      In "Statistical analysis" "Tuckey's" should be "Tukey's". "HSD" should be defined "honestly significant difference" or simply removed, as Tukey's is most common name.

      In "Statistical analysis", "significative" should be "significant" or "statistically significant".

      Scale bars should be added to micrographs.


      R21-23: Thanks, we will fix these in the updated version of the manuscript.

      Could the authors comment on the necessity of µclear plates, which substantially increases the cost per plate/experiment.

      R24: µclear plates were used to allow image acquisition with a 40x water immersion objective in the Operetta CLS (impossible with standard 96-well plates). Cells grown on coverslips and mounted on microscopy slides could be used as well, at the cost of a significant increase in acquisition time (Wiseman et al, 2019).


      Were other seeding densities and times investigated?

      R25: We will evaluate sparse cells in the revised version of the manuscript as discussed above.


      More description of potentially novel observations between these three primary EC subtypes would be informative for the readership.

      The references do not appear in chronological order. Further, consistency of reference formatting should be reviewed, and appropriate journal name abbreviations should be used.

      R26-27: Thanks, we will fix these in the updated version of the manuscript.

      Reviewer #3 (Significance (Required)):

      • This manuscript presents a conceptual and technical advance, introducing a high throughput imaging platform to assess endothelial phenotypes
      • Within the field of angiogenesis, several tools exist, either proprietary, or leveraging ImageJ software to assist in assessment of cells. The ECPT provides a more complex analysis platform to integrate analysis of multiple endpoints
      • This work would be of interest to vascular biology laboratories to adopt a more comprehensive view of heterogeneous endothelial phenotypes in vitro
      • As a vascular biology researcher, I have had extensive experience with in vitro culture of various endothelial cell subtypes from human and mouse. My field of expertise gives me the perspective of the nuances of the direct handling and phenotyping of ECs, and I have worked specifically with HUVEC, HAoEC and HPMEC, and assessed the impact of key factors relevant in angiogenesis such as VEGF, Notch and other mediators.

      R: We thank the reviewer for the very appreciative comments, and we hope that with the revised version of the manuscript we will be able to fully address remaining concerns.

    1. Reflecting on how new digital tools have re-invigorated annotation and contributed to the creation of their recent book, they suggest annotation presents a vital means by which academics can re-engage with each other and the wider world.

      I've been seeing some of this in the digital gardening space online. People are actively hosting their annotations, thoughts, and ideas, almost as personal wikis.

      Some are using RSS and other feeds as well as Webmention notifications so that these notebooks can communicate with each other in a realization of Vannevar Bush's dream.

      Networked academic samizdat anyone?

    1. Author Response:

      Reviewer #1:

      The manuscript by Jasmien Orije and colleagues has used advanced Diffusion Tensor and Fixel-Based brain imaging methods to examine brain plasticity in male and female European starlings. Songbirds provide a unique animal model to interrogate how the brain controls a complex, learned behaviour: song. The authors used DT imaging to identify known and uncover new structural changes in grey and white matter in male and female brains. The choice of the European starling as a model songbird was smart as this bird has a larger brain to facilitate anatomical localization, clear sex differences in song behavior and well-characterized photoperiod-induced changes in reproductive state. The authors are commended for using both male and female starlings. The photoperiodic treatment used was optimal to capture the key changes in physiological state. The high sampling frequency provides the capability to monitor key changes in physiology, behaviour and brain anatomy. Two exciting findings were the increased role of the cerebellum and hippocampal recruitment in female birds engaged in singing behaviour. The development of non-invasive, multi-sampling brain imaging in songbirds provides a major advancement for studies that seek to understand the mechanisms that control the motivation and production of singing behavior. The methods described herein set the foundation to develop targeted hypotheses to study how vocal learning, such as language, is processed in discrete brain regions. Overall, the data presented in the study are extensive and include a comprehensive analysis of regulated changes in brain microstructural plasticity in male and female songbirds.

      Reviewer #2:

      Orije et al. employed diffusion weighted imaging to longitudinally monitor the plasticity of the song control system during multiple photoperiods in male and female starlings. The authors found that both sexes experience similar seasonal neuroplasticity in multisensory systems and cerebellum during the photosensitive phase. The authors' findings are convincing and rely on a set of well-designed longitudinal investigations encompassing previously validated imaging methods. The authors' identification of a putative sensitive window during which sensory and motor systems can be seasonally re-shaped in both sexes is an interesting finding that advances our understanding of the neural basis of seasonal structural neuroplasticity in songbirds.

      Overall, this is a strong paper whose major strengths are:

      1) The longitudinal and non-invasive measure of plasticity employed

      2) The use of two complementary MR assays of white matter microplasticity

      3) The careful experimental design

      4) The sound and balanced interpretation of the imaging findings

      I do not have any major criticism but just a few minor suggestions:

      1) Pp 6-7. While the comparative description of canonical DTI with respect to fixel-based analysis is well written and of interest to readers with formal training in MR imaging, I found this entire section (and especially the paragraphs in page 7) too technical and out of context in a manuscript that is otherwise fundamentally about neuroplasticity in song birds. The accessibility of this manuscript to non-MR experts could be improved by moving this paragraph into the methods section, or by including it as supplemental material.

      The main purpose of this section was to introduce and explain the diffusion parameters which are used throughout the rest of the paper. Furthermore, we wanted to familiarize the reader with the concept of the population based template and the different structures that can be visualized by them. We agree that the technical details might have distracted from this main message. Therefore, we have trimmed the technical details out of this section and left a short explanation of the biological relevance of the different diffusion parameters and the anatomical structures visible on the population template. The technical details that were taken out are now a part of the material and methods section.

      The section now reads as follows:

      In the current study, we analyzed the DWI scans in two distinct ways: 1) using the common approach of diffusion tensor derived metrics such as fractional anisotropy (FA) and; 2) using a novel method of fiber orientation distribution (FOD) derived fixel-based analysis. Both techniques infer the microstructural information based on the diffusion of water molecules, but they are conceptually different (table 1). Common DTI analysis extracts for each voxel several diffusion parameters, which are sensitive to various microstructural changes in both grey and white matter specified in table 1. Fixel-based analysis on the other hand explores both microscopic changes in apparent fiber density (FD) or macroscopic changes in fiber-bundle cross-section (log FC) (table 1). Positive fiber-bundle cross-section values indicate expansion, whereas negative values reflect shrinkage of a fiber bundle relative to the template (Raffelt, Tournier et al. 2017).

      A population-based template created for the fixel-based analysis can be used as a study based atlas in which many of the avian anatomical structures can be identified (figure 2). We recognize many of the white matter structures such as the different lamina, occipito-mesencephalic tract (OM) and optic tract (TrO) among others. Interestingly, many of the nuclei within the song control system (i.e. HVC, robust nucleus of the arcopallium (RA), lateral magnocellular nucleus of the anterior nidopallium (LMAN), and Area X), auditory system (i.e. intercollicular nucleus complex, nucleus ovoidalis) and visual system (i.e. entopallium, nucleus rotundus) are identified by the empty spaces between tracts. The applied fixel-based approach is inherently sensitive to changes in white matter and cannot report on the microstructure within grey matter like brain nuclei; but rather sheds light on the fiber tracts surrounding and interconnecting them. As such, it provides an excellent tool to investigate neuroplasticity of different brain networks, and in the case of a nodular song control system focusing on changes in the fibers surrounding the song control nuclei, referred to as HVC surr, RA surr and Area X surr.

      2) Similarly, many sections, especially results, are in my opinion too detailed and analytical. While the employed description has the benefit of being systematic and rigorous, the ensuing narrative tends to be very technical and not easily interpretable by non experts. I think the manuscript may be substantially shortened (by at least 20% e.g. by removing overly technical or analytical descriptions of all results and regions affected) without losing its appeal and impact, but instead gaining in strength and focus especially if the new result narrative were aimed to more directly address the interesting set of questions the authors define in the introductory sections.

      We rewrote the results section, taking out the statistical reporting where it was also reported in a figure, to reduce the bulk of this section and make it more readable. We made some of the descriptions of the affected regions more approachable by replacing them with parts of the discussion. This way we incorporated some of the explanations of why certain findings are unexpected or relevant, as suggested by reviewer #3. Parts of the text that were originally in the discussion are indicated in purple.

      3) The possible effect of brain size has been elegantly controlled by using a medial split approach. Have the authors considered using tensor-based morphometry (i.e. using the 3D RARE scans they acquired) to account for where in the brain the small differences in brain size occur? That could be more informative and sensitive than a whole-brain volume quantification.

      We considered adding tensor-based morphometry, but we feel that log FC calculated with MRtrix3 can provide a similar account of the localization of these brain differences. Both methods are based on the Jacobians of the warps created between the individual images and the population template. They only differ in the starting images they use (3D RARE images in tensor-based morphometry or diffusion weighted images in the log FC metric of MRtrix3) and the fact that MRtrix3 limits itself to the volume changes along a certain tract.

      The log FC difference in figure 4 gives a similar account of the differences in brain size between both sexes. Additionally, figure 6 indicates the log FC differences between small and large brain birds.
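
      As a rough illustration of the shared idea behind both measures, the sketch below takes the log of a local Jacobian determinant, which is the quantity commonly mapped in tensor-based morphometry. It is not the MRtrix3 implementation of log FC. The example matrices are hypothetical, and the sketch assumes the Jacobian describes the map from template space to the individual image, so that a determinant above 1 means the local tissue is larger than in the template.

```python
from math import log

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def log_jacobian_determinant(jacobian):
    """Log of the local volume change: > 0 expansion, < 0 shrinkage, 0 no change."""
    det = det3(jacobian)
    if det <= 0:
        raise ValueError("warp is not locally invertible (determinant <= 0)")
    return log(det)

# Hypothetical local Jacobians at three voxels (assumed values, not study data).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
expanded = [[1.1, 0.0, 0.0], [0.0, 1.1, 0.0], [0.0, 0.0, 1.0]]
shrunk = [[0.9, 0.0, 0.0], [0.0, 0.9, 0.0], [0.0, 0.0, 1.0]]

for name, jac in [("identity", identity), ("expanded", expanded), ("shrunk", shrunk)]:
    print(name, round(log_jacobian_determinant(jac), 4))
```

      Taking the logarithm makes expansion and shrinkage symmetric around zero, which matches the sign convention described for log FC earlier in this response.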

      4) I think Figures Fig. 3 and Fig. 4 may benefit from a ROI-based quantification of parameters of interests across groups (similar to what has been done for Fig. 7 and its related Fig. 8). This could help readers assess the biological relevance of the parameter mapped. For instance, in Fig. 3, most FA differences are taking place in low FA (i.e. gray matter dense?) regions.

      We supplied the figures with extracted ROI-based parameters of figure 3 and figure 4. In line with this reasoning we also added the same kind of supplementary figures for figure 5 and 6.

      Figure 3 - figure supplement 1: Overview of the fractional anisotropy (FA) changes over time extracted from the relevant ROI-based clusters with significant sex differences. The grey area indicates the entire photosensitive period of short days (8L:16D). Significant sex differences are reported with their p-value under the respective ROI-based cluster. Different letters denote significant differences by comparison with each other in post-hoc t-tests with p < 0.05 (Tukey’s HSD correction for multiple comparisons) comparing the different time points to each other. If two time points share the same letter, the fractional anisotropy values are not significantly different from each other.

      Figure 4 – figure supplement 2: Overview of the fiber density (FD) changes over time extracted from the relevant ROI-based clusters with significant sex differences. The grey area indicates the entire photosensitive period of short days (8L:16D). Significant sex differences are reported with their p-value under the respective ROI-based cluster. Different letters denote significant differences by comparison with each other in post-hoc t-tests with p < 0.05 (Tukey’s HSD correction for multiple comparisons) comparing the different time points to each other. If two time points share the same letter, the FD values are not significantly different from each other. Abbreviations: surr, surroundings.

      Figure 4 - figure supplement 3: Overview of the fiber-bundle cross-section (log FC) changes over time extracted from the relevant ROI-based clusters with significant sex differences. The grey area indicates the entire photosensitive period of short days (8L:16D). Significant sex differences are reported with their p-value under the respective ROI-based cluster. Different letters denote significant differences by comparison with each other in post-hoc t-tests with p < 0.05 (Tukey’s HSD correction for multiple comparisons) comparing the different time points to each other. If two time points share the same letter, the log FC values are not significantly different from each other. Abbreviations: surr, surroundings.

      Figure 5 - figure supplement 1: Overview of the fractional anisotropy (FA) changes over time extracted from the relevant ROI-based clusters with significant differences in brain size. The grey area indicates the entire photosensitive period of short days (8L:16D). Significant brain size differences are reported with their p-value under the respective ROI-based cluster. Different letters denote significant differences by comparison with each other in post-hoc t-tests with p < 0.05 (Tukey’s HSD correction for multiple comparisons) comparing the different time points to each other. If two time points share the same letter, the fractional anisotropy values are not significantly different from each other. Abbreviations: C, caudal; surr, surroundings.

      Figure 6 - figure supplement 2: Overview of the fiber density (FD) changes over time extracted from the relevant ROI-based clusters with significant differences in brain size. The grey area indicates the entire photosensitive period of short days (8L:16D). Significant brain size differences are reported with their p-value under the respective ROI-based cluster. Different letters denote significant differences by comparison with each other in post-hoc t-tests with p < 0.05 (Tukey’s HSD correction for multiple comparisons) comparing the different time points to each other. If two time points share the same letter, the FD values are not significantly different from each other. Abbreviations: C, caudal; surr, surroundings.

      Figure 6 - figure supplement 3: Overview of the fiber-bundle cross-section (log FC) changes over time extracted from the relevant ROI-based clusters with significant differences in brain size. The grey area indicates the entire photosensitive period of short days (8L:16D). Significant brain size differences are reported with their p-value under the respective ROI-based cluster. Different letters denote significant differences by comparison with each other in post-hoc t-tests with p < 0.05 (Tukey’s HSD correction for multiple comparisons) comparing the different time points to each other. If two time points share the same letter, the log FC values are not significantly different from each other. Abbreviations: C, caudal; surr, surroundings.

      5) In Abstract: "We longitudinally monitored the song and neuroplasticity in male.." Perhaps something should be specified after the "the song"? Did the authors mean "the neuroplasticity of song system"?

      No, this is not what we meant, we monitor song behavior and neuroplasticity independently. In our study, we do not limit ourselves to the neuroplasticity of the song system, but instead use a whole brain approach. The monitoring of the song behavior in itself might be useful for other songbird researchers.

      We clarified this in the abstract as follows:

      We longitudinally monitored the song behavior and neuroplasticity in male and female starlings during multiple photoperiods using Diffusion Tensor and Fixel-Based techniques.

      Reviewer #3:

      In their paper, Orije et al. used MRI imaging to study sexual dimorphisms in the brains of European starlings during multiple photoperiods and how this seasonal neuroplasticity depends on brain size, song rates and hormonal levels. The authors' main findings include differences in hemispheric asymmetries between the sexes, multisensory neuroplasticity in the song control system and beyond it in both sexes, and some dependence of singing behavior in females with large brains. The authors use different methods to quantify the changes in the MRI data to support various possible mechanisms that could be the basis of the differences they see. They also record the birds' song rates and hormonal levels to correlate the neural findings with biologically relevant variables.

      The analysis is very impressive, taking into account the massive data set that was recorded and processed. Whole-brain data driven analysis prevented the authors from being biased to well-known sexually dimorphic brain areas. Sampling of a large number of subjects across many time points allowed for averaging in cases where individual measurements could not show statistical significance. The conclusions of the paper are mostly well supported by data (except for some confounds that the authors mention in the text). However, the extensive statistically significant results that are described in the paper make it hard to follow at times.

      1) In the introduction the authors mention the pre-optic area as a mediator for increased singing and therefore seasonal neuroplasticity. Did the authors find any differences in that area or other well-known nuclei that are involved in courtship (PAG for example)?

      Interestingly, we did not detect any seasonal changes in the pre-optic area or PAG, whereas prior studies reported volume changes in the POM within 1-2 days after testosterone administration in canaries (Shevchouk, Ball et al. 2019). In male European starlings, POM volumes changed seasonally, although this seems to depend on whether or not the males possessed a nest box (Riters, Eens et al. 2000). In our setup, the starlings were not provided with nest boxes. The lack of seasonal change in POM could therefore have a biological reason, besides the limitations of our methodology: since these regions are small, grey-matter-like structures, they are less likely to be picked up with our diffusion MRI methods.

      2) Following the first comment, what is the minimum volume of an area of interest that could be detected using the voxel analysis?

      The up-sampled voxel size is (0.175 × 0.175 × 0.175) mm³. In the voxel-based statistical analysis a significance threshold is set at a cluster size of minimum 10 voxels: 0.05 mm³.
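
      The arithmetic behind this threshold can be checked with a short sketch. It assumes isotropic up-sampled voxels with 0.175 mm edges and a 10-voxel minimum cluster, as quoted in the response; the function and constant names are illustrative only.

```python
# Assumption: isotropic up-sampled voxels with 0.175 mm edges, as quoted in the response.
VOXEL_EDGE_MM = 0.175
MIN_CLUSTER_VOXELS = 10

def cluster_volume_mm3(n_voxels, edge_mm=VOXEL_EDGE_MM):
    """Volume of a cluster of n isotropic voxels, in cubic millimetres."""
    return n_voxels * edge_mm ** 3

voxel_volume = cluster_volume_mm3(1)
min_cluster_volume = cluster_volume_mm3(MIN_CLUSTER_VOXELS)

print(f"single voxel:    {voxel_volume:.5f} mm^3")
print(f"minimum cluster: {min_cluster_volume:.3f} mm^3")
```

      Ten voxels of this size come to roughly 0.054 mm³, which rounds to the 0.05 mm³ stated above. Structures smaller than this cannot form a significant cluster on their own.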

      3) It would be useful to have a figure describing the song system in European starlings and how the auditory areas, the cerebellum and the hippocampus are connected to it, before describing the results. It would make it easier for the broader community to make a better sense of the results.

      An additional figure was added to the introduction to give a schematic overview of the song control system, the auditory system and the proposed cerebellar and hippocampal projections. This scheme includes both a 2D, and a 3D representation as well as a movie of the 3D representation of the different nuclei and the tractography.

      Figure 1: Simplified overview of the experimental setup (A), schematic overview of the song control and auditory system of the songbird brain and the cerebellar and hippocampal connections to the rest of the brain (B) and unilateral DWI-based 3D representation of the different nuclei and the interconnecting tracts as deduced from the tractogram (C). Male and female starlings were measured repeatedly as they went through different photoperiods. At each time point, their songs were recorded, blood samples were collected and T2-weighted 3D anatomical and diffusion weighted images (DWI) were acquired. The 3D anatomical images were used to extract whole brain volume (A). The song control system is subdivided in the anterior forebrain pathway (blue arrows) and the song motor pathway (red arrows). The auditory pathway is indicated by green arrows. The orange arrows indicate the connection of the lateral cerebellar nucleus (CbL) to the dorsal thalamic region further connecting to the song control system as suggested by (Person, Gale et al. 2008, Pidoux, Le Blanc et al. 2018) (B,C). Nuclei in (C) are indicated in grey, the tractogram is color-coded according to the standard red-green-blue code (red = left-right orientation (L-R), blue = dorso-ventral (D-V) and green = rostro-caudal (R-C)). For abbreviations see abbreviation list.

      Figure 1 – figure supplement 1: Movie of the unilateral 3D representation of the different nuclei and the interconnecting tracts rotating along the vertical axis.

      4) In the results section the authors clearly describe which brain areas are sexually dimorphic or change during the photoperiod and what is the underlying reason for the difference. However, only in the discussion section it is clearer why some of those differences are expected or surprising. It would be useful to incorporate some of those explanations in the results section other than just having a long list of brain areas and metrics. For example, I found the involvement of visual and auditory areas in the female brain in the mating season very interesting.

      Next to the reductions in technical explanation suggested by reviewer #2, we replaced some of the description of significant regions with parts of the discussion and vice versa (indicated in purple). This way we incorporated some of the explanations of why certain findings are unexpected or relevant. Furthermore, we added some extra information on why these changes are relevant for the visual system and the cerebellum.

      In line 420: Neuroplasticity of the visual system could be relevant to prepare the birds for the breeding season, where visual cues like ultraviolet plumage colors are important for mate selection (Bennett, Cuthill et al. 1997).

      In line 424: This shows that multisensory neuroplasticity is not limited to the cerebrum, but also involves the cerebellum, something that has not yet been observed in songbirds.

    2. Reviewer #2 (Public Review): 

      Orije et al. employed diffusion weighted imaging to longitudinally monitor the plasticity of the song control system during multiple photoperiods in male and female starlings. The authors found that both sexes experience similar seasonal neuroplasticity in multisensory systems and cerebellum during the photosensitive phase. The authors' findings are convincing and rely on a set of well-designed longitudinal investigations encompassing previously validated imaging methods. The authors' identification of a putative sensitive window during which sensory and motor systems can be seasonally re-shaped in both sexes is an interesting finding that advances our understanding of the neural basis of seasonal structural neuroplasticity in songbirds. 

      Overall, this is a strong paper whose major strengths are: 

      1) The longitudinal and non-invasive measure of plasticity employed 

      2) The use of two complementary MR assays of white matter microplasticity 

      3) The careful experimental design 

      4) The sound and balanced interpretation of the imaging findings 

      I do not have any major criticism but just a few minor suggestions: 

      # Pp 6-7. While the comparative description of canonical DTI with respect to fixel-based analysis is well written and of interest to readers with formal training in MR imaging, I found this entire section (and especially the paragraphs in page 7) too technical and out of context in a manuscript that is otherwise fundamentally about neuroplasticity in song birds. The accessibility of this manuscript to non-MR experts could be improved by moving this paragraph into the methods section, or by including it as supplemental material. 

      # Similarly, many sections, especially results, are in my opinion too detailed and analytical. While the employed description has the benefit of being systematic and rigorous, the ensuing narrative tends to be very technical and not easily interpretable by non experts. I think the manuscript may be substantially shortened (by at least 20% e.g. by removing overly technical or analytical descriptions of all results and regions affected) without losing its appeal and impact, but instead gaining in strength and focus especially if the new result narrative were aimed to more directly address the interesting set of questions the authors define in the introductory sections. 

      # The possible effect of brain size has been elegantly controlled by using a medial split approach. Have the authors considered using tensor-based morphometry (i.e. using the 3D RARE scans they acquired) to account for where in the brain the small differences in brain size occur? That could be more informative and sensitive than a whole-brain volume quantification. 

      # I think Figures Fig. 3 and Fig. 4 may benefit from a ROI-based quantification of parameters of interests across groups (similar to what has been done for Fig. 7 and its related Fig. 8). This could help readers assess the biological relevance of the parameter mapped. For instance, in Fig. 3, most FA differences are taking place in low FA (i.e. gray matter dense?) regions. 

      # In Abstract: "We longitudinally monitored the song and neuroplasticity in male.." Perhaps something should be specified after the "the song"? Did the authors mean "the neuroplasticity of song system"?

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Point-by-point response, comments in (blue), our response in (black)

      Note: we included 6 Figures in our response, yet the ReviewCommons system does not appear to support including images as part of the response. These Figures are in the original "Initial Response" file available to ReviewCommons. We requested that Review Commons post our "Initial Response" file that contains these figures so that this information is available.

      Reviewer #1

      *In the paper by Gowthaman et al., the authors aim at better understanding the molecular mechanisms controlling divergent non-coding transcription (DNC). They describe a high-throughput yeast genetic screen using two strains in which two loci consisting of a coding and a divergent non-coding transcription unit (GCG1-SUT098 or ORC2-SUT014) were replaced by a bidirectional fluorescent reporter construct encoding mCherry in the coding direction and YFP in the non-coding direction. The two reporter strains were crossed with the yeast deletion library and mutants leading to increased or decreased YFP signal were selected as potential DNC repressors or activators. The two screens identified a number of common potential repressors and activators. Components of the Hda1C histone deacetylase complex were identified as DNC repressors in both screens. This phenomenon was confirmed genome-wide by performing NET-Seq in WT as well as hda1Δ and hda3Δ strains. This experiment allowed the identification of 1517 DNC transcripts repressed by Hda1. Further analyses indicate that Hda1C represses DNC genome-wide independently of expression levels and that loss of Hda1 does not substantially affect coding transcription.

      Live-cell imaging of transcription was then used to show that loss of Hda1 increases DNC transcription frequency rather than duration providing novel information on the link between DNC transcription initiation kinetics and chromatin regulation. Finally, using Chip-seq, the authors show that the level of acetylation over the divergent non-coding units is increased in the absence of Hda1 and some experiments suggest that H3K56 acetylation also contributes to DNC regulation, further strengthening the importance of elevated histone acetylation in efficient DNC.

      Importantly, several components of the SWI/SNF chromatin remodeling complex were identified as activators confirming earlier observations (Marquardt et al., 2014). SAGA subunits were also among potential DNC activators, however these effects could not be confirmed through validation experiments. The authors conclude that DNC may be independent of specific activators and mainly due to transcriptional noise resulting from the adjacent NDR.

      Overall this paper is very well structured, clearly written and the experiments are well controlled. The genetic screen identifies novel factors involved in the regulation of DNC. The study clearly demonstrates that the level of acetylation is a key regulator of divergent non-coding transcription and that histone deacetylation by Hda1 reduces the frequency of DNC initiation events. While this conclusion is strongly supported by the Net-Seq and Chip-seq metagene analyses, the fluorescence mCherry and YFP values or RT-qPCR analyses of specific genes do not always behave as expected when looking at absolute values rather than mCherry/YFP or GCG1/SUT098 ratios, which is sometimes disturbing when reading the paper. Therefore, the following points should be clarified.*

      We are grateful for the kind appreciation of our manuscript and clarify the remaining questions in the revised manuscript.

      **Major points**

      #1.1: Figures 2 and S2A: Figures 2C and D show the mCherry/YFP fluorescence and GCG1/SUT098 RT-qPCR gene expression ratios respectively, which are consistent with a repressive effect of Hda1C on DNC transcription and a potential DNC activating effect of SAGA components. However, the absolute mCherry and YFP or GCG1 and SUT098 expression values presented in Figures S2A and S2B show the opposite: loss of Hda1C subunits rather leads to a decrease in mCherry with not much effect on YFP; moreover loss of Hda3 results in decreased SUT098, which is inconsistent with the whole model. The same comment is valid for the SAGA mutants. It would be good to provide some explanation for these a priori contradictory observations, especially for the Hda1c mutants, which are the major focus of the study. The Net-Seq analyses are certainly more reliable since less subject to protein or RNA stability effects, which may underlie some of the inconsistencies between protein and RNA absolute levels.

      Thank you for this comment. We offer enhanced clarity in the revised manuscript.

      In general, transcription in each direction shows a weak yet highly statistically significant positive correlation (Spearman rho = 0.26, p-value = 4.94e-24). We enclose a plot based on NET-seq data that supports this correlation between the two directions of an NDR as part of our response below (RFig1). In our experience, the ratio captures relative effects better than the individual fluorescence or RT-qPCR measurements. Of course, we are ultimately interested in transcription, and fluorescence measurements or RT-qPCR of steady-state RNA are only an approximation. Resource and time constraints limit how many mutations we can examine by techniques such as NET-seq, which are arguably the most informative. Because transcription in the two directions is positively correlated, relative differences can manifest themselves through detectable effects on the other fluorophore. As this reviewer mentions, we can be most confident of results that we could further validate by NET-seq or live-cell imaging.

      (INSERT Rfig1)

      RFig1: Scatterplot of NET-seq data for DNC/host gene pairs. Each point corresponds to a bidirectional gene promoter overlapping with a nucleosome-depleted region (NDR). The values represent NET-seq FPKM values in protein-coding (x-axis) vs non-coding (y-axis) directions. These data support a statistically significant correlation (Spearman test: rho = 0.2554876, p-value = 4.939658e-24).
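
      For readers unfamiliar with the statistic, Spearman's rho is the Pearson correlation of the ranks of the two variables. The sketch below shows that calculation in plain Python. The FPKM values are hypothetical and chosen only for illustration (they are not the NET-seq data behind RFig1), and the p-value calculation is omitted.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical FPKM values for six bidirectional promoters (not study data).
coding_fpkm = [12.0, 3.5, 40.2, 8.1, 25.0, 1.2]
noncoding_fpkm = [2.0, 1.1, 3.0, 4.5, 2.2, 0.4]

rho = spearman_rho(coding_fpkm, noncoding_fpkm)
print(f"Spearman rho = {rho:.3f}")
```

      Because only ranks enter the calculation, the statistic is insensitive to the heavy-tailed distribution typical of FPKM values, which is one reason a rank correlation suits this kind of comparison.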

      #1.2: Figure 3: this figure examines the effect of Hda1 and Hda3 on the 1517 DNC transcripts. Does loss of this HDAC also increase the expression of all the other 2219 non-coding transcripts identified by Net-Seq, which would make Hda1C a more general repressor of non-coding transcription?

      We have performed the analysis for all other non-coding transcripts in Hda1C mutant NET-seq data and added it as part of this response RFig2. Quantification of CUTs, SUTs and other lncRNAs that are not resulting from DNC in Hda1C mutants results in a slight increase in the nascent transcription that is not statistically significant. These data do not offer strong support for the idea that Hda1C represents a more general repressor. We added this plot as novel supplementary figure S3D and adjusted the text of the revised manuscript (line 214).

      (INSERT Rfig2)

      RFig2: Metagene plot of NET-seq data for non-coding RNA that are not classified as DNC. Metagene plot shows genomic windows [TSS - 100 bp, TSS + 500 bp] relative to the annotated starts of ncRNA transcripts.
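A metagene profile of this kind can be sketched as a per-position average over fixed windows around each TSS. The snippet below is an illustration under simplifying assumptions, not the pipeline used for RFig2: it treats coverage as a plain list for one plus-strand chromosome, uses a half-open window, and the function name is hypothetical.

```python
def metagene_profile(coverage, tss_positions, upstream=100, downstream=500):
    """Average per-base signal over windows [TSS - upstream, TSS + downstream).

    coverage: per-base signal for one chromosome (plus strand assumed).
    tss_positions: 0-based TSS coordinates on that chromosome.
    """
    width = upstream + downstream
    totals = [0.0] * width
    used = 0
    for tss in tss_positions:
        start, end = tss - upstream, tss + downstream
        if start < 0 or end > len(coverage):
            continue  # skip windows that run off the chromosome
        for offset in range(width):
            totals[offset] += coverage[start + offset]
        used += 1
    if used == 0:
        raise ValueError("no window fits inside the coverage track")
    return [total / used for total in totals]


# Toy track: signal equals the coordinate, two TSSs, small windows.
profile = metagene_profile(list(range(20)), [5, 10], upstream=2, downstream=3)
```

Minus-strand transcripts would need their windows reversed before averaging; that step is omitted here.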

      #1.3: Moreover, does loss of Hda1 or Hda3 reveal DNC transcripts that were not detected in wild-type? This may increase even more the number of genes with divergent transcription.

      We are grateful for the opportunity to clarify this point. We noticed that the yeast genome shows evidence for much more non-coding transcription than annotated. In this paper, we used TranscriptomeReconstructoR for a data-driven annotation of yeast non-coding transcripts, with an emphasis on the boundaries (see also DOI: 10.1186/s12859-021-04208-2). The set of non-coding transcripts was, for example, informed by the previously published NET-seq data on wild-type samples (Churchman et al., 2009; Marquardt et al., 2014; Harlen et al., 2016; Fischl et al., 2017). We have clarified the relevant Methods sections to make this point more accessible (line 733). The combination of these NET-seq datasets gives very good sequencing coverage. The Hda1C mutant NET-seq data do not have better coverage than this combined reference set, so it would be very hard to find new transcripts without prior evidence in our exhaustive set of combined NET-seq data. However, our Supplementary table S3 contains the fold-change values for all DNC transcripts in mutant compared to wild type. Loci with a high fold-change could arguably be regarded as hda-specific.

      #1.4: Figures S3A, B, C: are the 3 groups of DNCs derepressed to the same extent by loss of Hda1 or Hda3? This is difficult to judge given the differences in y-axis scales. Figures S3D, E: the authors show the Net-Seq snapshots for the GCG1 and ORC2 loci. It would be good to add the quantifications as presented in Figure 3 for YPL172C and YDRr216C.

      Thank you for the suggestion. We replaced S3A-C with plots that show the same range of the y-axis in the supplementary figure. Hda1C represses DNC in all three cohorts stratified by DNC expression strength. We also added a quantification boxplot for NET-seq signal in the GCG1 and ORC2 loci in revised S3F-I.

      #1.5: Figures S4A, B, C and D are not well explained. What does the y axis frequency correspond to? Is it the % of cells showing a signal? Is the intensity of SUT098 higher because the transcription initiation frequency is higher and therefore the transcription site signal is more intense?

      We improved the annotation for the supplementary figure S4. We clarified in the legend that the y-axis frequency represents the percentage of frames recorded for transcription initiation spots (TS). The bars represent transcription intensity in all the frames recorded, with active transcription ‘ON’ and without TS ‘OFF’. The intensity increases with higher initiation rates and thus the intensity of SUT098 transcription initiation is high.

      #1.6: Figures S4 A-I should be more specifically cited in the text.

      We have cited the figures in the text in the revised version.

      #1.7: Figure 5A: it is really unexpected and unclear why the mCherry/YFP in the WTH3/hda1D and WTH3/hda1D/H3K56mut is increasing compared to WTH3, since DNC is supposed to increase. Similar comment for Figure S5C. This should be clarified in the text.

      Thank you for pointing this out. We failed to address this in the text. The isogenic control “H3 wild type” carries only one copy of the two genes coding for H3, which has a general effect on transcription. We added data showing this as part of our response (RFig3), and explained this part more clearly in the revised text (line 263). Essentially, the genetic background of the yeast synthetic histone mutant collection sensitizes for a decreased ratio of mCherry/YFP (RFig3). This result is also included in table S2, where deletions of the histone genes HHT2 (H3) and HHF2 (H4) are listed as shared repressors in both screens. Hda1C mutations show the increased ratio in the sensitized “H3 wild type” background, but not in backgrounds we tested that contain a wild-type dosage of histone genes.

      These data remain valid to support the genetic interaction of hda1D along with the substitution mutants of H3K56.

      (INSERT Rfig3)

      RFig3. Fluorescence signal values of H3WT and BY4741 strains with GCG1pr FPR. The H3WT affects general transcription of coding transcript and decreases the ratio of mCherry/YFP fluorescence.

      #1.8: More generally, as already mentioned above, the fluorescence data are expressed either as mCherry/YFP ratio or as absolute values. It would be good to systematically show the ratios and the absolute values of mCherry and YFP signal; the same for coding and DNC RT-qPCR as well as Net-Seq values when available.

      We ensured that the absolute data values for flow cytometry and qPCR have been represented in the supplementary figures S2 and S5. The FPKM values for NET-seq data for individual transcript units are provided in the supplementary table S3.

      #1.9: Figures S5A and B are not referred to in the text. It should be mentioned and explained how normalization to H3 affects the levels of acetylated H3 over the NDR.

      We now refer to the figures in the main text and explained the rationale for normalization.

      #1.10: p. 12 "Our data thus suggest to extend the transcriptional noise hypothesis with activities limiting DNC transcription to account for genome-wide variation in non-coding transcription".

      If DNC is the result of "transcriptional noise", it is surprising that in the case of CGC1-SUT098, the transcription frequency is higher in the non-coding versus the coding direction. Is the SUT098 behaving like the coding unit in this case? The authors should comment on that.

      This is a very interesting point. One interpretation of the “transcriptional noise” hypothesis is indeed that non-coding transcription occurs at a low level. We selected loci with high DNC expression, so these loci are somewhat contradictory to this idea a priori. Nevertheless, identifying a biological function of non-coding RNAs is challenging, and it remains to be tested whether SUT098 represents particularly “loud noise” or whether the high transcription indicates that it carries a yet unknown cellular function. In theory, this screen is suitable to identify factors that may be required to induce DNC, perhaps even specifically. To identify such factors, a locus with high DNC is needed to facilitate detection, since our previous screen using the PPT1/SUT129 system had lower SUT expression and failed to identify such mutants systematically. This is important, since a mutation lowering DNC needs to start from a sufficiently high fluorescence signal to distinguish it from background fluorescence. Since the results presented did not clearly uncover such factors, we favor the hypothesis of DNC arising due to the promoter architecture at NDRs (see also the positive correlation plot in RFig1). The many repressive pathways also act on highly expressed DNCs, which is certainly an interesting finding provided by this manuscript.

      **Minor comments**

      #1.11: p. 4 should one talk about Hda1C-linked histone acetylation facilitates... (should be deacetylation...??)

      Done.

      #1.12: The authors should explain why they chose two coding/non-coding pairs that are cac2D insensitive and whether other criteria, such as level of DNC transcription, were also considered, since GCG1-SUT098 represents one of the most highly expressed divergent non-coding transcripts.

      The GCG1 and ORC2 loci were chosen based on i) high DNC levels, ii) a low fold-change of NET-seq data in the cac2 and iii) a DNC region free from other transcriptional units. However, this was based on the state-of-the-art annotation in 2015 when we started this project. Also, when we categorized genes as affected by cac2, we used a fold-change expression cut-off that suggested that about a third of DNCs are repressed by CAF-I. It appears that we still underestimated the effect of CAF-I, since our data show that the target regions of our new screens are also affected by CAF-I. DNC expression at these loci is high, which would result in a low fold-change in mutants that further increased DNC here.

      #1.13: It is hard to understand why both the H3K56A and H3K56Q mutations lead to increased DNC, a result already presented in the Marquardt et al. 2014 paper. It would be helpful to provide a more extensive explanation or hypothesis.

      The H3K56 substitution mutant Q is expected to mimic the acetylation state and A is devoid of post-translational modifications. We observe an increase of signal ratio in the mutants because H3K56ac is responsible for both incorporation and eviction of -1 nucleosomes (Marquardt et al., 2014). Mutations affecting H3K56 can thus result in lower -1 nucleosome density and more DNC through reducing incorporation or enhancing eviction. We have clarified this in the revised text (line 271).

      #1.14: What defines the level of DNC repression? How does the level of repression correlate with the level of coding transcription?

      We have added RFig.1 to address the question about correlation. There is a statistically significant positive correlation between transcription in each direction by NET-seq data in wild type samples genome-wide. However, the correlation is weak (rho = 0.26), which is consistent with locus-specific adjustments of transcriptional strength in each direction. For DNC, several chromatin-based pathways contribute to repression. The resulting level of DNC transcription thus reflects the combined action of several pathways. Here, we characterize Hda1C as a novel player with a genome-wide effect on this phenomenon. Elucidating the mechanistic interplay at specific target DNC loci will be an exciting future research question.

      Reviewer #1 (Significance (Required)):

      This is a very interesting and innovative study using cutting edge genetic approaches, genome-wide sequencing as well as single cell imaging to extend our understanding of non-coding transcription regulation and its potential impact on gene expression. It is a nice continuation and complement of an earlier study from the same author (Marquardt et al., 2014) and will certainly be of interest to a large chromatin biology audience.

      We are grateful for the appreciation of our research on this topic.

      Reviewer #2

      Promoters are frequently transcribed in both directions. The divergent, 'upstream' transcript is frequently unstable. Transcription initiation is regulated through the acetylation of promoter-proximal nucleosomes, where HDAC-dependent deacetylation of histones typically represses transcription initiation.

      The current manuscript addresses the question whether initiation of coding and divergent, non-coding (DNC) transcription is regulated by the same factors. Previously Marquardt and others showed that H3K56ac-mediated histone exchange has a differential effect on coding and DNC transcription.

      Using a clever reporter system, the authors screened for positive and negative regulators that preferentially affect DNC transcription. They discover the Hda1 deacetylase complex as a DNC-biased repressor and diverse HATs as DNC-biased activators. The role of activators could not be validated, presumably due to high variability of the system.

      Focusing on Hda1c, the authors present data suggesting a larger effect of Hda1c on 'upstream' nucleosomes associated with DNC transcription than in coding transcription. Genome-wide NET-seq mapping was consistent with this differential regulation. Live-cell imaging of one specific case argues that Hda1-mediated repression reduced the time between initiation events. The authors employ state-of-the-art methods and in general the data are of very good quality. The effect size is very small, which raises the broader question whether the results, while statistically significant, are biologically relevant. I have a few comments that the authors may use to revise their manuscript.

      Thank you for the appreciation of our very good data quality. We hope our revision plan will help to clarify some confusion about the scope and effect size.

      #2.1) The differentially regulated coding and DNC transcription are defined by a directionality score. The screen was performed with two reporter loci that are strongly biased for DNC transcription (the idea to detect activators did not work out). Considering that coding and DNC transcription may not be totally independent because of the proximity of target nucleosomes, and sense and antisense transcription may compete for regulators, the question arises how levels of coding transcription affect DNC transcription in wildtype and mutants. The authors stratified their results according to levels of DNC transcription, but discussion and data analysis of the effect of coding transcription on the directionality score may be relevant.

      We added the plot in RFig.1 above to address the question of correlation between transcription in each direction. NET-seq data supports a weak but highly statistically significant positive correlation between transcription in each direction genome-wide (rho = 0.26, p-value = 4.94e-24). We agree that it is relevant to discuss the effect of coding transcription on the directionality scores and revised the discussion accordingly (line 315). We have used both the coding and DNC signal values to create the comprehensive quadrant scatter plot in Fig. 1D-E. Analysis of mutants along the diagonal illustrates that many mutations affect coding transcription as well as DNC. The directionality score measures deviations from the axis of positive correlation, which requires us to use the information of both fluorophores.
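One way to picture a score that uses both fluorophores is as the signed distance of a mutant from the diagonal in log2 fold-change space, where the diagonal marks equal changes in mCherry and YFP. The sketch below is an assumption made for illustration only; it is not the directionality score as defined in the manuscript, and the function names and sign convention are hypothetical.

```python
import math


def log2_fold_change(mutant, reference):
    """log2 ratio of a mutant median signal over the reference median signal."""
    return math.log2(mutant / reference)


def directionality_score(mcherry_mut, yfp_mut, mcherry_ref, yfp_ref):
    """Signed perpendicular distance from the diagonal in log2 fold-change space.

    Zero means both fluorophores changed by the same factor; positive values
    mean mCherry changed more than YFP, negative values the opposite.
    (Hypothetical convention chosen for this sketch.)
    """
    d_mcherry = log2_fold_change(mcherry_mut, mcherry_ref)
    d_yfp = log2_fold_change(yfp_mut, yfp_ref)
    return (d_mcherry - d_yfp) / math.sqrt(2)


# A mutant that doubles both signals sits on the diagonal.
on_diagonal = directionality_score(200.0, 200.0, 100.0, 100.0)
# A mutant that doubles only YFP deviates from it.
yfp_biased = directionality_score(100.0, 200.0, 100.0, 100.0)
```

A single-fluorophore fold-change would score both example mutants identically for YFP, which is why the second reporter is needed to separate general from direction-specific effects.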

      #2.2) The study is strong where the findings can be generalized. The single-molecule live-cell imaging analysis, while done properly, has only limited impact, because the corresponding coding transcript could not be detected. This is more of an anecdotal finding.

      There seems to be a misunderstanding: the live-cell imaging measurements of transcription for SUT098 are stand-alone data. SUT098 by itself is a transcription unit, so we measure DNC of this unit independently from GCG1, which has much lower expression. The measurements are specific to SUT098 transcription, and the quantification provides new information about the mechanisms involved in the regulation of DNC. We clarified the text in this regard (line 233).

      #2.3) The effect size is small (20%, on average) and the variability is high. The fact that the HATs that emerged as very robust activators of DNC transcription could not be validated and that the Hda2 subunit of the HDAC complex was not found statistically significant show the limitations of the study. To their credit, the authors discuss these limitations appropriately.

      We have worked on the Methods in the revised manuscript to clarify this confusion (line 712). For the screen, the median signal values represent data from up to 50,000 individual cells. These experiments are remarkably accurate and highly reproducible, especially for molecular biology where n=3 is common. We have uploaded these data to the FlowCore public repository. We encourage any colleague to exploit the opportunity to analyze these data independently to experience the high data quality. With high number of observations, 20% average is a large effect and reflects a rather big shift of the population. As is standard for genetic screens, resource constraints are prohibitive to pursue all hits. In addition, it is expected that only some hits will be affecting transcription of DNC since the fluorescence reporter can be affected by many other cellular events. We focused on the effects on DNC in this manuscript.

      There seems to be some misunderstanding: Hda2 is a statistically significant hit in the ORC2/SUT14pr screen; this information is in Fig. 1E. The Hda1C subunits are labeled in purple.

      #2.4) Figure S3C suggests that the Hda effect is largest at genes that are poorly expressed, and smaller at more average expression levels. Are we looking at a phenomenon that mainly applies to repressed genes?

      Thank you very much for this suggestion. We replaced S3A-C with revised panels where the data is shown with the same y-axis scale, please see also #1.4. We believe the revised presentation also helps to clarify that the mutations increase DNC for all cohorts stratified by DNC expression.

      **Minor issues**

      #2.5) The NET-seq study involves two replicates. How well did they correlate?

      The WT and mutant NET-seq replicates have good correlation (Spearman’s correlation coefficient was above 0.6 for WT and above 0.8 for the mutants).

      (INSERT Rfig4)

      RFig4. Correlation scatter plot of individual NET-seq replicates of WT, hda1D and hda3D. Spearman correlation coefficients of WT, hda1D and hda3D are 0.677, 0.8 and 0.825, respectively.

      #2.6) For the live-cell imaging replicates were not mentioned. Were replicate studies performed?

      We have updated the text to make this important point more accessible (line 230). For live-cell imaging studies, transcription is recorded as movies of cells over time. We took multiple movies, and pooled the data from all the cells to improve statistical power. Data from each movie represent individual repeats. We monitored 130 cells on average for the WT and mutant strains over time.

      #2.7) Fig 4E is not mentioned in the text (mislabeled as 4D)

      Done.

      #2.8) Fig S5 is not mentioned in the main text.

      Done.

      Reviewer #2 (Significance (Required)):

      In summary, this is a high-quality study that presents the results of a genome-wide screen that will be of interest to colleagues in the narrower field. Due to the small effects the results may appeal less to a general readership.

      We are grateful for appreciating our manuscript as a high-quality study. We hope our revisions help to clarify confusion concerning effect size.

      Reviewer #3

      In this manuscript, Gowthaman et al. describe the results and follow-up of their screen aimed at identifying regulators of divergent noncoding (DNC) transcription in S. cerevisiae. From this screen, they identify Hda1C as a repressor of DNC transcription, and perform follow-up experiments to support and detail this finding. In addition to RTqPCR to confirm the reporter and endogenous changes, the authors perform NET-seq to look at global DNC alteration upon Hda1C subunit deletion and identify a number of non-coding transcripts with altered expression levels. In addition, the authors perform live cell imaging to demonstrate that there is a modest restriction of initiation frequency when one of the subunits of Hda1C is deleted. Finally, the authors explore changes to pan-H3 acetylation and the genetic overlap between Hda1C and H3K56ac, demonstrating independent genetic pathways but overall increases in H3 acetylation over DNCs when Hda1C is deleted. Overall, the screen and results are of interest, but the authors overstate some of the conclusions (perhaps most importantly within the title!). I have the following suggestions to improve the manuscript:

      Thank you for recognizing the interest in our results. We have revised the manuscript to state the conclusions more cautiously.

      **Major comments**

      #3.1. The title of the manuscript is based on the single molecule live cell imaging experiments presented in Figure 4. While there is a statistically significant decrease in initiation frequency from deletion of one Hda1C subunit, there is no statistical decrease in deletion of the other two. Furthermore, these experiments were performed at one locus. As a result, I find the title to be an overstatement of the findings of the paper and suggest the authors refocus on the more robust findings of the manuscript.

      Live-cell imaging requires extensive engineering of the target loci. Perhaps this was lost in the Methods, but it is a 5-step process to integrate the stem-loops. We tried to engineer other loci, but this is far from trivial and the technique does not work for all loci tested. The hairpins are also unstable and need to be carefully checked prior to experimentation, which makes it challenging to scale this approach up to higher throughput. It appears that we undersold this point, but the fact that we now provide a locus and strains for the community that make such studies possible for DNC represents a tremendous achievement. Since hda1D also decreases time between initiations, we generalized the finding to Hda1C.

      However, we recognize that the reviewer makes a helpful suggestion to choose a more careful title, since there is no statistically significant reduction of initiation frequency in some mutants. We have revised the title to “Hda1C limits divergent non-coding transcription and restricts transcription initiation frequency” in the revised manuscript to address this point.

      #3.2. Relatedly, in Figure 4, the authors present the findings from the single molecule live cell imaging experiments. Within this experiment, the authors include a cac2 deletion (CAF-1 subunit) strain, and observe a modest effect, similar to hda1 deletion. This is surprising as the authors mentioned this location (GCG1/SUT098) was selected as CAF-1 was NOT shown to regulate the DNC previously (Marquardt et al 2014; as mentioned at the beginning of the Results section). The similar decrease in initiation frequency between cac2 deletion and hda1 deletion further concerns me regarding the use of these data as the headlining finding.

      We believe there is a misunderstanding. We clarify that selection of the GCG1 locus was based on a cut-off value for the cac2D effect, as is also shown in Fig S1C. The fold-change is small, but since DNC transcription of the chosen loci is high in wild type, an increase in a mutant would not necessarily give a high fold-change. Hence, we need to be cautious about concluding that CAF-I does not regulate DNC at this locus. The fold-change analysis suggested it, but regulation by CAF-I remained possible. CAF-I appears to affect even more loci than initially identified with the chosen cut-off. We see the same trend in Hda1C mutants as in cac2, which offers support to the exciting idea that modulation of the initiation frequency may be a shared mechanism by chromatin-based regulators acting on DNC.

      #3.3. It is unclear to me why the change in mRNA expression is included within the screen. Why not solely look at the expression change of the DNC? Importantly, the authors note in the discussion that perhaps the reason the SAGA complex was identified was due to regulating mRNA expression and not DNC expression and therefore was identified in the screen. Could the authors not just present the fold change in DNC expression using their YFP reporter, and not the YFP vs mCherry?

      The regulation of initiation frequency in each direction is superimposed on a general positive correlation (rho = 0.26, p-value = 4.94e-24) between the coding and non-coding directions; please see also RFig.1. For the purpose of this study about selective effects on the direction of transcription, it is vital to incorporate both sides of the reporter. Otherwise, we would select for factors that activate or repress the transcription from the target promoter NDR. This point is accessible in Fig.1D-E, where mutations that affect YFP usually also have an effect on mCherry. The aim of this study was to identify mutants that affect the relative expression, and therefore a focus on one fluorophore would not improve the analysis. We clarified this important point more accessibly in the revised manuscript (line 315).

      Please also note that all the raw data are available, so colleagues are in the position to perform their independent analyses. We believe that it is very valuable for the community to have access to these data since they may be useful for other purposes and could be analyzed in many different ways. In fact, we have tried several methods and approaches over the years and present what we believe is most appropriate in this manuscript. For example, Hda1C comes out as a convincing hit with a range of different approaches to analyze the data, which is also a reason we feel confident about the characterization of Hda1C.

      #3.4. This is absolutely beyond the scope of the paper, but limiting the screen to only nonessential proteins likely misses important regulators. In the future, perhaps the authors could pursue a SATAY screen to look for essential proteins as well? Again, the findings of this paper are appropriate, and the screen is a great undertaking, but I want to suggest this to the authors for potential future projects.

      Thank you for this excellent suggestion. We agree that capturing the role of essential factors would be very informative, and the saturated transposition approach would be promising. However, as the reviewer points out, performing these analyses is beyond the scope of the current manuscript.

      #3.5. The authors perform NETseq experiments in deletion strains and identify ~1500 DNC transcripts with altered expression. Later the authors look into the mechanism and demonstrate an increased H3ac in hda1 deletion strains. The authors could enhance the representation of these datasets by correlating the change in H3ac with the change in DNC transcription - do they correlate?

      Thank you for bringing up this excellent point. We present the correlation data of change in H3ac and DNC transcription in the hda1D mutant (RFig5). The ChIP-seq and NET-seq values of hda1D were divided by the respective WT values in order to quantify the relative increase of H3 acetylation or nascent transcription in hda1D. The data showed a weak (Spearman rho = 0.23) but significant (p-value = 3.0e-20) positive correlation between the ratio values. The hda1D-dependent increase in H3 acetylation correlates with the hda1D-dependent increase of RNAPII occupancy in DNC transcripts. We enhanced our representation of these data by including this plot as S5D in the revised manuscript as suggested.

      (INSERT Rfig5)

      RFig5: Scatterplot of hda1D/WT NET-seq (y-axis) and ChIP-seq (x-axis) ratios. Each point corresponds to a bidirectional gene promoter overlapping with an NDR. The x-axis shows ChIP-seq ratios, and the y-axis shows the NET-seq ratios. These data support a statistically significant correlation (Spearman test: rho = 0.234, p-value = 3.0e-20).

      #3.6. In Figure 5, the authors argue that Hda1C works non-redundantly with K56ac, using point mutants to mutate K56 to A or Q. Did the screen identify anything else in the K56ac pathway? Rtt109 or Asf1, for example? Because Hda1C deacetylates H3, including but not limited to K56, it is a bit surprising the K56 point mutations result in a larger increase in SUT098-YFP levels. The authors discuss within the text that Hda1C has multiple targets; but coming back to my previous point that CAF-I was not supposed to impact this location, I am having a hard time understanding these results.

      This is an excellent point. We improved the manuscript by highlighting other factors with links to H3K56ac in our scatter plots, for example Rtt109 in Fig 2A. Nevertheless, the reviewer may wish to satisfy his/her curiosity by exploring table S2 in more detail. Table S2 lists the top candidates from both screens.

      We hope our answer to point #3.2 helped to clarify the aspect of this comment related to CAF-I.

      **Minor comments**

      #3.7. The authors follow up the screen using RTqPCR for GCG1/SUT098 in newly made deletion strains. I was surprised the authors choose this locus rather than the ORC2/SUT014 locus, as the screen showed a strong increase for this reporter. While I appreciate generating the deletion strains within the reporter is beyond necessary, assessing the endogenous locus within the deletion strains by RTqPCR seems reasonable.

      We chose the GCG1 locus since the fold change in directionality in the genetic screen was high for the activator mutants. We will perform this experiment and add the missing validation experiment for the ORC2 locus in the revised manuscript.

      #3.8. The authors tend to show their genomic data as metaplots; it would be nice to see heatmaps where more can be gleaned from the display of all the loci. This applies to the NET-seq data (Figure 3) and the ChIP-seq data (Figure 5).

      We appreciate the suggestion and generated the requested heatmaps using the NET-seq tracks of WT and hda mutants (RFig6). The heatmap represents the same genomic intervals as the corresponding metagene plot (Figure 3A). We find that the differences between WT and hda samples are more clearly accessible at first glance on the metagene plot than on the heatmap. We believe that this could be because the heatmaps do not represent what transcripts have in common and rather underline the differences between them. In contrast, the metagene plots reveal the common trends by taking the average of the signal. We thus prefer showing metagene plots in the manuscript, as they allow for overlay of multiple tracks on the same plot, thus enhancing visual comparison for the readers.

      (INSERT Rfig6)

      RFig6. Heatmap representing NET-seq data in WT, hda1D and hda3D. Genomic intervals covering [TSS - 100 bp, TSS + 500 bp] of DNC transcripts (n=1517) are shown. The color indicates the log2-transformed NET-seq values.

      #3.9. In Figure 5B, the authors present H3ac ChIP-seq data, presented as a ratio of H3ac/total H3. While this is a perfectly acceptable way to present the data, I was surprised to see a decrease in total H3 levels when examining the supplemental data. Has this decrease in H3 occupancy upon hda1 deletion been shown previously? This finding should be discussed within the manuscript.

      We appreciate that the reviewer noticed this. We do not think this has been explicitly stated before, as the focus thus far had been on the effects towards the mRNA. However, the effect is not statistically significant between the WT and hda1D as observed in S5B. We thus prefer to remain cautious about this conclusion.

      #3.10. In Supplemental Figure S3, the authors break down the NET-seq data by DNC FPKM, which is very nice. Very minor point that the font here is quite small.

      Thanks, we improved the font size. Note that we also revised the y-axis scale in response to comment #1.4.

      Reviewer #3 (Significance (Required)):

      **Significance:**

      The regulation of divergent non-coding RNAs is an understudied field. In this paper, the authors perform a screen for all non-essential yeast proteins in regulating the expression of these ncRNAs. The screen results and follow up defining the role of Hda1C in broadly repressing the expression of these ncRNAs is of interest to the field.

      We are grateful to the reviewer for highlighting the interest of our work to the field.

      **Context:**

      This work follows from Marquardt's previous 2014 study that identified Caf1 as regulating DNCs in S. cerevisiae.

      **Audience:**

      Broadly, the chromatin and transcription field. Anyone interested in how chromatin regulates transcription, regulation of ncRNAs, and functions of histone modifying enzymes.

      **Expertise:**

      I am a member of the chromatin and transcription field, largely performing genomic experiments. We do not perform microscopy, although we sufficiently understand the experiments and results presented here.

    1. Author Response:

      Reviewer #1:

      This manuscript by Gabor Tamas' group defines features of ionotropic and metabotropic output from a specific cortical GABAergic cell type, so-called neurogliaform cells (NGFCs), by using electrophysiology, anatomy, calcium imaging and modelling. Experimental data suggest that NGFCs converge onto postsynaptic neurons with sublinear summation of ionotropic GABAA potentials and linear summation of metabotropic GABAB potentials. The modelling results suggest a preferential spatial distribution of GABA-B receptor-GIRK clusters on the dendritic spines of postsynaptic neurons. The data provide the first experimental quantitative analysis of the distinct integration mechanisms of GABA-A and GABA-B receptor activation by the presynaptic NGFCs, and especially provide insights into the logic of volume transmission and the subcellular distribution of postsynaptic GABA-B receptors. Therefore, the manuscript provides novel and important information on the role of the GABAergic system within cortical microcircuits.

      We have made all changes humanly possible under the current circumstances and we are open to further suggestions deemed necessary.

      Reviewer #2:

      The authors present a compelling study that aims to resolve the extent to which synaptic responses mediated by metabotropic GABA receptors (i.e. GABA-B receptors) summate. The authors address this question by evaluating the synaptic responses evoked by GABA released from cortical (L1) neurogliaform cells (NGFCs), an inhibitory neuron subtype associated with volume neurotransmission, onto Layer 2/3 pyramidal neurons. While response summation mediated by ionotropic receptors is well-described, metabotropic receptor response summation is not, thereby making the authors' exploration of the phenomenon novel and impactful. By carrying out a series of elegant and challenging experiments that are coupled with computational analyses, the authors conclude that summation of synaptic GABA-B responses is linear, unlike the sublinear summation observed with ionotropic, GABA-A receptor-mediated responses.

      The study is generally straightforward, even if the presentation is often dense. Three primary issues worth considering include:

      1) The rather strong conclusion that GABA-B responses linearly summate, despite evidence to the contrary presented in Figure 5C.

      2) Additional analyses of data presented in Figure 3 to support the contention that NGFCs co-activate.

      3) How the MCell model informs the mechanisms contributing to linear response summation.

      These and other issues are described further below. Despite these comments, this reviewer is generally enthusiastic about the study. Through a set of very challenging experiments and sophisticated modeling approaches, the authors provide important observations on both (1) NGFC-PC interactions, and (2) GABA-B receptor mediated synaptic response dynamics.

      The differences between the sublinear, ionotropic responses and the linear, metabotropic responses are small. Understandably, these experiments are difficult – indeed, a real tour de force – from which the authors are attempting to derive meaningful observations. Therefore, asking for more triple recordings seems unreasonable. That said, the authors may want to consider showing all control and gabazine recordings corresponding to these experiments in a supplemental figure. Also, why are sublinear GABA-B responses observed when driven by three or more action potentials (Figure 5C)? It is not clear why the authors do not address this observation considering that it seems inconsistent with the study's overall message. Finally, the final readout – GIRK channel activation – in the MCell model appears to summate (mostly) linearly across the first four action potentials. Is this true and, if so, is the result inconsistent with Figure 5C?

      GABAB responses elicited by three and four presynaptic NGFC action potentials were investigated to better understand the limits of the NGFC-PC connection. Although our spatial model suggests that, at a single point in the L1 volume, one or two NGFCs could provide a GABAB response through their respective volume transmission, it remains important that in a minority of cases three or more NGFCs could converge their output. The experiments in Fig 5 not only offer the mechanistic insight that possible HCN channel activation and GABA reuptake do not significantly influence the summation of metabotropic receptor-mediated responses, but also provide additional information about extensive GABAB signaling from more than two NGFC outputs. Interestingly, in this experiment summation up to two action potentials shows linear integration very similar to that seen in the triplet recordings. This result suggests that temporal and spatial summation are identical when a limited number of inputs arrive at the postsynaptic target cell. A similar summation can be seen in our model up to two consecutive GABA releases. Whereas three or four consecutive GABA releases still produce linear summation in our model, our experiments show moderate sublinearity. One possible explanation for this inconsistency is vesicle depletion in NGFCs after multiple rapid releases of GABA, which was not taken into account in our model.

      Presumably, the motivation for Figure 3 is that it provides physiological context for when NGFCs might be coactive, thereby providing the context for when downstream, PC responses might summate. This is a nice, technically impressive addition to the study. However, it seems that a relevant quantification/evaluation is missing from the figure. That is, the authors nicely show that hind limb stimulation evokes responses in the majority of NGFCs. But how many of these neurons are co-active, and what are their spatial relationships? Figure 3D appears to begin to address this point, but it is not clear if this plot comes from a single animal, or multiple? Also, it seems that such a plot would be most relevant for the study if it only showed alpha-actin 2-positive cells. In short, can one conclude that nearby, presumptive NGFCs co-activate, and is this conclusion derived from multiple animals?

      The aim of Fig. 3D was to indicate that the active cells, presumably NGFCs, are located close to each other. The figure comes from a single animal. We agree with the reviewer and have therefore replaced the scatter plot in Fig. 3D with another one that provides information about the molecular profiles of the active/inactive cells. We also made an effort to further analyze our in vivo data and the spatial localization of the monitored interneurons (see Author response image 3). The results are from 4 different animals; in these experiments numerous L1 interneurons are active during the sensory stimulus, as shown in the scatter plot. We calculated the shortest distance between all active cells and all ɑ-actinin2+ cells that were active in the experiments. The data suggest that, in the case of identified active ɑ-actinin2+ cells, the interneuron somata were on average 182.69±60.54 or 305.135±34.324 μm from each other. Data from Fig. 2D indicate that the axonal arborization of NGFCs reaches on average ~200-250 μm. Taken together, these data suggest that the spatial arrangement would in theory allow neighboring NGFCs to interact directly at the same point in space.

      The inclusion of the diffusion-based model (MCell) is commendable and enhances the study. Also, the description of GABA-B receptor/GIRK channel activation is highly quantitative, a strength of the study. However, a general summary/synthesis of the observations would be helpful. Moreover, relating the simulation results back to the original motivation for generating the MCell model would be very helpful (i.e. the authors asked whether "linear summation was potentially a result of the locally constrained GABAB receptor - GIRK channel interaction when several presynaptic inputs converge"). Do the model results answer this question? It seems as if performing "experiments" on the model wherein local constraints are manipulated would begin to address this question. Why not use the model to provide some data – albeit theoretical – that begins to address their question?

      We re-formulated the problem to be addressed in this Results section. We acknowledge in the Discussion that our model has several limitations and, consequently, we restricted its application to a limited set of quantitative comparisons paired to our experimental dataset or directly related to pioneering studies on GABAB efficacy on spines vs shafts. We believe that a proper answer to the reviewer’s suggestion would be worth a separate and dedicated study with an extended set of parameters and an elaborated model.

      In sum, the authors present an important study that synthesizes many experimental (in vitro and in vivo) and computational approaches. Moreover, the authors address the important question of how synaptic responses mediated by metabotropic receptors summate. Additional insights are gleaned from the function of neurogliaform cells. Altogether, the authors should be congratulated for a sophisticated and important study.

      Reviewer #3:

      The authors of this manuscript combine electrophysiological recordings, anatomical reconstructions and simulations to characterize synapses between neurogliaform interneurons (NGFCs) and pyramidal cells in somatosensory cortex. The main novel finding is a difference in summation of GABAA versus GABAB receptor-mediated IPSPs, with a linear summation of metabotropic IPSPs in contrast to the expected sublinear summation of ionotropic GABAA IPSPs. The authors also provide a number of structural and functional details about the parameters of GABAergic transmission from NGFCs to support a simulation suggesting that sublinear summation of GABAB IPSPs results from recruitment of dendritic shaft GABAB receptors that are efficiently coupled to GIRK channels.

      I appreciate the topic and the quality of the approach, but there are underlying assumptions that leave room to question some conclusions. I also have a general concern that the authors have not experimentally addressed mechanisms underlying the linear summation of GABAB IPSPs, reducing the significance of this most interesting finding.

      1) The main novel result of broad interest is supported by nice triple recording data showing linear summation of GABAB IPSPs (Figure 4), but I was surprised this result was not explored in more depth.

      We have chosen the approach of studying GABAB-GABAB interactions through the scope of neurogliaform cells and explored how neurogliaform cells as a population might give rise to the summation properties studied with triple recordings. This was a purposeful choice admittedly neglecting other possible sources of GABAB-GABAB interactions which possibly take place during high frequency coactivation of homogeneous or heterogeneous populations of interneurons innervating the same postsynaptic cell. We agree with the reviewer that the topic of summation of GABAB IPSPs is important and in-depth mechanistic understanding requires further separate studies.

      2) To assess the effective radius of NGFC volume transmission, the authors apply quantal analysis to determine the number of functional release sites to compare with structural analysis of presynaptic boutons at various distances from PC dendrites. This is a powerful approach for analyzing the structure-function relationship of conventional synapses but I am concerned about the robustness of the results (used in subsequent simulations) when applied here because it is unclear whether volume transmission satisfies the assumptions required for quantal analysis. For example, if volume transmission is similar to spillover transmission in that it involves pooling of neurotransmitter between release sites, then the quantal amplitude may not be independent of release probability. Many relevant issues are mentioned in the discussion but some relevant assumptions about QA are not justified.

      Indeed, pooling of neurotransmitter between release sites may affect quantal amplitude. We therefore examined quantal amplitude under low release probability conditions, using 0.7-1.5 mM [Ca]o to detect postsynaptic uniquantal events initiated by neurogliaform cell activation (Author response image 7). In this way we measured quantal current amplitudes similar to those obtained with the BQA method, with no significant difference (4.46±0.83 pA, n=4, P=0.8, Mann-Whitney Test).

      3) The authors might re-think the lack of GABA transporters in the model since the presence and characteristics of GATs will have a large effect on the spread of GABA in the extracellular space.

      We agree that the presence of GAT could effectively shape the GABA exposure, e.g. (Scimemi 2014). During the development of the model, we took into consideration different possibilities and solutions to create the model’s environment. To our knowledge, there is no detailed electron microscopic study that would provide ultrastructural measurements of structural elements around the NGFC release sites and postsynaptic pyramidal cell dendrites in layer 1 while preserving the extracellular space. Moreover, quantitative information is scarce about the exact localization and density of the GATs along the membrane surface of glial processes around confirmed NGFC release sites. We felt that developing a functional environment that would contain GABA transporters without possessing such information would be speculative. Furthermore, during the development of the model it became clear that incorporating thousands of differentially located GABA transporters would massively increase the processing time of single simulations including monitoring each interaction between GATs and GABA molecules, and requiring computational power calculating the diffusion of GABA molecules in the extracellular space, even if GABA molecules are far from the postsynaptic dendritic site without any interaction.

      As an admittedly simple and constrained alternative, we decided to set a decay half-life for the GABA molecules released. This approach allows us to mimic the GABA exposure time of 20-200 ms, based on experimental data (Karayannis et al 2010). In the model the GABA exposure time was 114.87 ± 2.1 ms with decay time constants of 11.52 ± 0.14 ms. After ~200 ms all the released GABA molecules disappeared from the simulation environment.
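      The decay rule described above can be sketched numerically. This is only an illustration, assuming that the removal of GABA follows a single-exponential (first-order) decay, which the response does not state explicitly; the function names are hypothetical and only the 11.52 ms time constant is taken from the text.

```python
import math

# Decay time constant reported for the model (ms).
TAU_MS = 11.52


def half_life_ms(tau_ms: float) -> float:
    """Half-life of a first-order decay with time constant tau (assumed form)."""
    return tau_ms * math.log(2)


def fraction_remaining(t_ms: float, tau_ms: float) -> float:
    """Fraction of released molecules still present at time t under exp(-t/tau)."""
    return math.exp(-t_ms / tau_ms)


if __name__ == "__main__":
    print(f"half-life: {half_life_ms(TAU_MS):.2f} ms")
    for t in (0, 20, 100, 200):
        print(f"t = {t:3d} ms, fraction remaining = {fraction_remaining(t, TAU_MS):.2e}")
```

      Under this assumed form, the fraction remaining at 200 ms is vanishingly small, which is in line with the statement that the released molecules are gone from the simulation by about that time.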

      A detailed treatment of extracellular diffusion was beyond the scope of our model; we were interested in investigating how the subcellular localization of receptors and channels determines the summation properties.

      4) I'm not convinced that the repetitive stimulation protocol of a single presynaptic cell shown (Figure 5) is relevant for understanding summation of converging inputs (Figure 4), particularly in light of the strong use-dependent depression of GABA release from NGFCs. It is also likely that shunting inhibition contributes to sublinear summation to a greater extent during repetitive stimulation than summation from presynaptic cells that may target different dendritic domains. The authors claim that HCN channels do not affect integration of GABAB IPSPs but one would not expect HCN channel activation from the small hyperpolarization from a relatively depolarized holding potential.

      Use-dependent synaptic depression of NGFC-induced postsynaptic responses was nicely documented by Karayannis and coworkers (2010); however, they investigated the GABAA component of the responses and found that the depression is caused by desensitization of postsynaptic GABAA receptors. We are not aware of published experiments on the short-term plasticity of GABAB responses. In our experiments shown in Fig 5 we found linearity in the summation of GABAB responses up to two action potentials and sublinearity for 3 and 6 action potentials. In fact, our results show that no synaptic depression is detectable in response to paired pulses: the amplitudes of the voltage responses were doubled compared to a single pulse, which means that the paired-pulse ratio is around 1. To verify our result, we repeated our dual recording measurements with one, two, three and four spikes initiated in the presynaptic neurogliaform cell (Author response image 6). Measuring both the amplitude and the overall charge of GABAB responses, we again found a linear relationship between the one- and two-spike protocols.

      Author response image 6 - Integration of GABAB receptor-mediated synaptic currents. (A) Representative recording of neurogliaform synaptic inhibition on a voltage-clamped pyramidal cell. Bursts of up to four action potentials were elicited in NGFCs at 100 Hz in the presence of 1 μM gabazine and 10 μM NBQX. (B) Summary of normalized IPSC peak amplitudes (left) and charge (right). (C) Pharmacological separation of the neurogliaform-initiated inhibitory current.

    1. How can you be against faith when we take leaps of faith all the time, with friends and potential spouses and investments? Here, the meaning of the word “faith” is shifted from a spiritual belief in a creator to a risky undertaking. A common invocation of this fallacy happens in discussions of science and religion, where the word “why” may be used in equivocal ways.

      This term specifically has been seen and even critiqued in my writing. I think this is because many students and writers attempt to use more academic or specific words that we may not always understand how to use. By attempting to use them in new ways, we risk confusing the audience or writing an argument that has a meaning different from what we intended.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Point-by-point response:

      *Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **SUMMARY**

      This MS tackles a largely unknown topic of vessel formation: how vessels anastomose and lumenise. The authors demonstrate that a matrix protein, svep1, produced by the neural tube during zebrafish embryogenesis, plays a key role together with blood flow in orchestrating anastomosis formation. Indeed, absence of this protein combined with blood flow reduction results in a significant decrease of lumenised DLAV segments.

      In the absence of svep1 they observed an expansion of apelin-positive endothelial cells connected with a defect in tip/stalk cell specification. Interestingly, the phenotype is amplified by blocking the kinase activity of VEGFR2.

      **MAJOR COMMENTS**

      The most solid evidence on the role of blood flow in cooperating with svep1 relies on the use of tricaine, which reduces heart contractility. Interestingly, the authors report some data using embryos lacking cardiac troponin T2. I suggest that the authors better analyze the phenotype obtained by the deletion of svep1 together with a dose-dependent reduction of tnnt2. This approach is more elegant and physiological than the use of a chemical compound. Furthermore, this approach will allow a better analysis of the relationship between blood flow and the expression of svep1 in the neural tube. It would be relevant to establish a flow threshold required to dampen lumenisation. *

      Response: We appreciate the comment and have previously attempted to titrate the tnnt2 morpholino as published to obtain a graded reduction in blood flow. In our hands, this has not proved to be a robust approach, but we are willing to give it another try. In addition, we propose to use alternative compounds to tricaine for blood flow reduction without affecting neural physiology. Alternatively, we will use α-bungarotoxin mRNA injection to selectively affect neural activity and immobilize the embryos without effects on heart rate and blood flow (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4526548/).

      To further improve the findings here reported I suggest to analyze the expression of klf2, which is a well known mechano-sensor of blood flow in several animal species including zebrafish.

      Response: We will perform klf2 expression analysis

      It's likely that apelin is relevant in the observed phenotype. Which is the phenotype of a double mutant lacking both apl and svep1? Is there a direct influence of blood flow on apl expression?

      Response: We will investigate the double loss of function. However, double mutants would take some time, and a combination of morpholino and mutant would likely be the first and best option to answer this question in a reasonable time frame. The effect of flow on apl expression can be tested.

      Is there any suggestion that this mechanism is operative in mammals?

      Response: This is an interesting question and certainly relevant for follow-up studies. At present, we can only speculate on a possible connection with flow, given that Svep1 mutations have recently been associated with atherosclerosis. However, whether the anastomosis defect we identify is conserved remains to be seen.

      *Reviewer #1 (Significance (Required)):

      The data here reported might represent a step forward in the field because a new mechanism is suggested.

      The interest is sufficiently broad.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The authors demonstrated that loss of svep1 in zebrafish contributed to defective anastomosis of intersegmental vessels and, in addition, that Svep1 acted synergistically with blood flow to modulate vascular network formation in the zebrafish trunk.

      **Major comments:**

      The expression of svep1 is localized in neurons of the neural tube, dorsal epithelial cells (as indicated by the transgenic zebrafish) and the ventral somite boundary (as indicated by in situ), but is excluded from endothelial cells and the vasculature. It remains puzzling, and the authors have not addressed, how a gene that is expressed in non-vascular tissue plays a crucial role in vessel anastomosis, i.e. DLAV formation and ISV lumenization, during angiogenesis. As the entire story of svep1 is related to its function in angiogenic sprouting and lumen formation of vascular tissues, it would be helpful for readers to be able to put the pieces together on how such a gene may be functionally involved in this angiogenic process. A previous publication implicated this gene in lymphangiogenesis; in this manuscript the authors could provide more evidence of how the gene and its localized expression contribute to a different tissue in the vascular system, i.e. the DLAV, rather than the neural tube, dorsal epidermis or ventral somite boundary.*

      Response: We appreciate the wish to understand exactly how non-endothelial expression of Svep1 causes an endothelial phenotype selectively under reduced flow conditions. The very nature of this new phenotype requires analysis in vivo, and cannot easily be transferred to an ex vivo assay. Therefore, selective loss of function in different cell populations is not easily available. More importantly, the interpretation of such efforts, when mosaic, is fraught with issues. At this point, we feel that full molecular characterization of how Svep1 affects endothelial cells during anastomosis will require entirely new approaches and lies beyond what can be achieved in this manuscript.

      We will however attempt to clarify the findings and the potential mechanisms in the discussion.

      Another puzzling point is that tricaine is at the center of this study, as the authors claim that tricaine-dependent blood flow reduction synergistically augmented the effect of svep1 deficiency. However, tricaine is known to act on neural voltage-gated sodium channels. The authors could provide more explanation and argument in the discussion on whether svep1 function in the neural tissues, and possibly its expression, was affected by tricaine.

      Response: As mentioned in our response to reviewer 1, we will perform additional experiments to try to clarify whether an effect of tricaine on neuronal sodium channels contributes to the phenotype.

      It is unclear on p12: "These results suggest that while svep1 loss-of-function produces a cardiac defect that enhances the effect of tricaine on reducing blood flow, svep1 has an additive effect in modulating blood vessels anastomosis". svep1 deficiency may enhance the effect of tricaine, leading to reduced blood flow; however, it is not accurate to state that svep1 loss-of-function produces a cardiac defect. It is not certain whether the effect of svep1 is actually in neural rather than cardiovascular tissue; for example, tricaine acts on neural voltage-gated sodium channels, slowing down the heartbeat. The authors could explore the possibility that svep1 functions in neural rather than cardiovascular tissues, and discuss why they think svep1 enhances the effect of the blood flow defect (tnnt2a knockdown or tricaine) on angiogenesis, such as the DLAV phenotype.

      Response: We will attempt to dissect potential contributions by neural effects from cardiac and flow related effects as stated above. Tnnt2 MO and alternative drugs to reduce heart function selectively will be used. We will also clarify the discussion.

      On p13, the authors stated that svep1 expression was inhibited by reduced blood flow, however, is it really the effect of reduced blood flow or caused by the chemical tricaine? If tnnt2a knockdown showed a similar phenotype, then it may be more convincing.

      Response: see above

      \*Minor comments:**

      The work on "svep1 loss-of-function and knockdown are rescued by flt1 knockdown" was beautifully done and it is very clear and convincing.

      The last two sections, "Vegfa/Vegfr signalling is necessary for ISV lumenisation maintenance and DLAV formation" and "Vegfa/Vegfr signalling inhibition exacerbates svep1 loss-of-function DLAV phenotype in reduced flow conditions", are more related to the flt1 knockdown phenotype. These 3 different sections are related in the sense that the rescue phenotype should be explained in terms of the vegf signaling pathway. It would be better to discuss this vegf pathway more cohesively, which would help readers to better appreciate the work on svep1. *

      Answer: We agree and will do so.

      *Reviewer #2 (Significance (Required)):

      This manuscript on svep1 in zebrafish provides new insight into angiogenesis, particularly the development of vessel anastomosis in the zebrafish embryo. It is very significant for the field and for readers who are interested in angiogenesis and zebrafish development, including myself.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This manuscript reports that the secreted extra-cellular matrix protein Svep1 plays a role in vascular anastomosis during developmental angiogenesis in zebrafish. Further, the study demonstrates that flow and Svep1 modulate the vascular network in a synergistic fashion. This is a high quality manuscript presenting novel data which compellingly support the conclusions that are made. I have no suggestions for further experimentation but list minor points below.

      1. The final paragraph of Discussion is underdeveloped in that it claims regulation of phenotypic robustness in angiogenesis and its failure promises crucial insights into the mechanisms causing breakdown of vascular homeostasis in human disease. However, this issue is not pursued in any substantial way in Discussion. For example, are there known mutations in humans which lead to anastomosis defects and, if so, do any of them relate to the molecules or signaling pathways which are the subject of this manuscript? *

      Response: We agree with the wish to see more substantial discussion of the issue of phenotypic robustness and potential links to human disease. The question of anastomosis itself is something that has not been addressed in humans, as it is a rather detailed phenotype observable where predictive patterning occurs and can be dynamically studied. As such, there is a lack of literature and knowledge on signalling pathways that drive anastomosis in humans, and also not many that have been identified in experimental systems or animal models. Flt1 and Vegf signalling, junctional molecules and a few other pathways have been shown to be involved, but nothing is known so far about Svep1 and anastomosis in other systems. We will attempt to expand the discussion to make this clearer.

      • There are typographical errors in the text so a further proof-read is required. *

      Response: thank you, these will be corrected

      *Reviewer #3 (Significance (Required)):

      This manuscript provides an incremental conceptual advance in our understanding of the molecular mechanisms responsible for vascular anastomosis during developmental angiogenesis. The manuscript will be of interest to developmental biologists and vascular biologists.

      My field of expertise pertains to angiogenesis and lymphangiogenesis in the setting of cancer and other diseases. *I am not a developmental biologist.

  5. May 2021
    1. Idempotency means that calling a method multiple times has the same effect as calling it once. Idempotent methods are required for Webhooks because a resource may be called multiple times if the network is interrupted. In this scenario, non-idempotent operations can cause significant unintended side-effects by creating additional resources or changing them unexpectedly. For businesses that rely on data, non-idempotency poses a considerable risk.

      I don't think we need this paragraph.

      We can start with -

      There could be scenarios where your endpoint receives the same webhook event multiple times. This is expected by design and can be handled easily using the x-razorpay-event-id header.

      Check the value of x-razorpay-event-id in the webhook request header. The value of this header is unique per event. You can cross-reference it on your end to check whether an event with the same header value has already been processed, and so avoid duplicates.

      But why does Razorpay send the same event multiple times? To avoid an event being missed, Razorpay follows at-least-once delivery semantics. In this approach, if we do not receive a successful response from your server, we resend the Webhook.

      There could be situations where your server accepts the event but fails to return a response within 5 seconds. In such cases, the session is marked as timed out. It is assumed that the Webhook has not been processed, and it is sent again. Ensure your server is configured to handle the same event details being received multiple times, using the solution mentioned above.
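      The duplicate check described above could be sketched as follows. This is a minimal, framework-agnostic illustration rather than Razorpay's own sample code: the `EventDeduplicator` class, the `handle_webhook` function and the in-memory set are all hypothetical, and only the `x-razorpay-event-id` header name comes from the text.

```python
import threading


class EventDeduplicator:
    """Remembers which webhook event IDs have already been seen (in memory only)."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def first_time(self, event_id: str) -> bool:
        """Return True the first time an ID is offered, False on every repeat."""
        with self._lock:
            if event_id in self._seen:
                return False
            self._seen.add(event_id)
            return True


def handle_webhook(headers: dict, payload: dict, dedup: EventDeduplicator, process) -> str:
    """Process a webhook at most once per event ID (hypothetical handler)."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    event_id = normalised.get("x-razorpay-event-id")
    if event_id is None:
        return "rejected"   # no ID to deduplicate on
    if not dedup.first_time(event_id):
        return "duplicate"  # already handled; acknowledge without reprocessing
    process(payload)
    return "processed"
```

      In a real deployment the set would likely be replaced by a persistent store with a uniqueness constraint on the event ID, so that duplicates are still detected after a restart or across several server instances. Because this sketch records the ID before calling `process`, a failure during processing would need separate handling, and long-running work is probably best queued so that the response is returned within the 5-second window.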

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their comments, criticisms and suggestions that will help to improve the quality of our manuscript.

      Please find enclosed in this initial response our answer to each point raised by the reviewers.

      Please note that several answers would normally be accompanied by an additional figure that could be added to the fully revised version of the manuscript. These additional figures could not be included in the format in which we have to submit our answers, but we are ready to send a pdf file including our answers with the additional figures upon request.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The paper by Genest et al. describes the effect of flotillins and sphingosine kinase 2 to stabilize AXL as a mechanism to promote epithelial-mesenchymal transition in breast (cancer) cells. The potential role of vesicles trafficking EMT-promoting proteins is of high interest in the field, also for exploring new opportunities of pharmacological targeting. However, the paper fails to convincingly demonstrate that the proposed mechanism is of real importance to support or promote EMT for the following main reasons:

      1-a) The role of flotillins is studied only by overexpression and in the context of non-cancerous MCF10A cells, while breast cancer cells of epithelial-like origin are not analyzed.

      Regarding the first part of the point raised here, we are not sure we correctly understand the sentence “[…] while breast cancer cells of epithelial-like origin are not analyzed”. Indeed, we used the breast cancer cell line MDA-MB-231 and a derived cell line that we generated by knocking down flotillin expression (MDA-MB-231shFlot2) in the second part of this study (Figure 6C, F and H and S7A, E and F). This previously characterized cell line allowed us to demonstrate that abolishing flotillin overexpression was sufficient to significantly inhibit the invasive properties of MDA-MB-231 cells (Planchon et al, J Cell Science 2018, https://doi.org/10.1242/jcs.218925).

      Although flotillin upregulation induces some major mechanisms of the EMT process in MCF10A cells, flotillin downregulation was not sufficient to reverse the EMT phenotype in MDA-MB-231 cells. This could be explained by the fact that EMT is a multifactorial process and that MDA-MB-231 cells went through too many irreversible changes leading to this process. By contrast, when we analyzed EMT markers after SphK2 inhibition or knock down in MCF10AF1F2 and in MDA-MB-231 cells (Figure 6A-C), we could observe a significant decrease in ZEB1 expression.

      1-b) This is in contrast with the purpose of the paper (see abstract, introduction, patients' data) which is to study tumors and EMT. Effect of shRNAs is also not reported, making it difficult to estimate the importance on the EMT phenotype.

      As we mentioned in our manuscript, previous studies by other groups who downregulated flotillin expression in different cancer cell lines using siRNA approaches or re-expression of miRNAs that inhibit flotillin expression, already showed flotillin participation in EMT (for review please see Gauthier-Rouvière et al, Cancer Metastasis Review, 2020, doi: 10.1007/s10555-020-09873-y).

      In this context, the novelty and the first goal of our study was to investigate how strongly flotillin upregulation contributes to EMT induction. To achieve this goal, we chose on purpose to use non-tumoral epithelial cells that do not harbor the anomalies already favoring EMT, unlike the cancer cell lines used in previous studies. In these non-tumoral models (the human MCF10A and mouse NMuMG mammary epithelial cell lines), we ectopically overexpressed flotillins (MCF10AF1F2 and NMuMGF1F2) to levels similar to those observed in invasive breast cancer cells. Using this approach, we found that flotillin overexpression is enough to induce EMT.

      1-c) Then, alteration of EMT should be concluded also from other non-genetic functional parameters, not just by markers. For instance: was morphology of the cells changed? Was cell migration affected with F1F2?

      Our conclusion that flotillin upregulation is sufficient to induce EMT in MCF10AF1F2 and NMuMGF1F2 cells is not based only on genetic functional parameters or markers. For instance, Figure S1 (panels H and I) shows a strong modification of the cell morphology and of the actin cytoskeleton organization in NMuMG cells upon flotillin upregulation. NMuMGF1F2 cells became flat and lost their apical F-actin belt and exhibited an increase in stress fibers.

      As shown below (Additional Figure 1), similar modifications of the cell morphology and of the F-actin cytoskeleton organization occur also when flotillins are upregulated in MCF10A cells (see below the comparison of MCF10A and MCF10AF1F2 cells) (these data could be added in the manuscript).

      ADDITIONAL FIGURE 1 CAN NOT BE ADDED BUT IS AVAILABLE UPON REQUEST

      Additional figure 1: Upregulation of flotillins in MCF10A cells leads to changes in the cell morphology and in F-actin cytoskeleton organization. Comparison of the morphology and of the actin cytoskeleton organization in MCF10AmCh and MCF10AF1F2 cells. Confluent cells were fixed and stained for F-actin (green) using Alexa488-conjugated-Phalloidin and for nuclei (blue) using Hoechst (in panel A flotillin2-mCherry signal is shown). (A) Upper panels show the maximum intensity projection images (MIP) of MCF10AmCh (control) and MCF10AF1F2 (flotillin overexpression) cells obtained from a stack of images acquired by confocal microscopy. Lower panels show magnified images from the boxed areas, including one single plane and the x-z and y-z projections along the indicated axes. (B) 3D reconstruction images obtained from the region in the boxed area from the MIP-images shown in A.

      These data show that in MCF10AF1F2 cells the apical actin belt is lost and the height of the cellular monolayer is lower compared with control MCF10AmCh cells.

      We also analyzed the migration capacity of these cells (shown in Figure 3G of the submitted manuscript). Briefly, using a Boyden chamber assay, we showed that flotillin upregulation significantly increased migration of MCF10A cells (Figure 3G). We previously demonstrated that flotillin upregulation also promotes cell invasion in 3D using a spheroid assay (Planchon et al, J Cell Science, 2018, https://doi.org/10.1242/jcs.218925). As shown below (Additional Figure 2), using a wound healing assay, we also observed that cell velocity is higher in flotillin-overexpressing NMuMGF1F2 cells than in control NMuMG cells (this could be added to the manuscript).

      ADDITIONAL FIGURE 2 CAN NOT BE ADDED BUT IS AVAILABLE UPON REQUEST

      Additional figure 2: Upregulation of flotillins in NMuMG cells increases cell velocity in a 2D migration assay. (A) Representative images of NMuMGmCh (control) and NMuMGF1F2 cells during wound healing. The yellow dashed line indicates the leading edge of the migrating monolayer at the indicated times. The trajectory of 60 individual cells was tracked and the cell velocity and persistence of migration were extracted. The histogram shows the velocity quantification (mean ± SEM of 4 independent experiments). (B) Representative trajectories of individual cells.

      2) AXL up-regulation is not very strong (2-fold). What is unclear is if the minimal AXL increase due to F1F2 really provides a significant contribution to the EMT phenotype (as the authors conclude). The siRNA experiment knocks down all AXL, not just the F1F2-induced levels, making it difficult to estimate the real effect of the mechanism proposed.

      As shown in figure 3A and D, in MCF10AF1F2 cells compared with MCF10AmCh cells, we measured a significant 2.5 ± 0.7-fold increase in the AXL protein level. We do not think that this can be considered as a minimal increase.

      Considering that flotillin upregulation may affect simultaneously different receptors (Figure S2I, Figure S6A-F), we did not expect that downregulating a single receptor would have a major impact on the level of EMT markers and on cell migration. Yet, after knocking down AXL in MCF10AF1F2 cells, we observed a decrease in ZEB1 and N-cadherin expression and the re-expression of E-cadherin (Figure 3D-F) and the inhibition of cell migration (Figure 3G). The fact that we observed such an effect by downregulating AXL, which according to Reviewer #1 is minimally increased, might be explained by its well-known ability to act not alone but through cross-talk with other signaling receptors (Graham et al, Nature Reviews Cancer 2014; Halmos and Haura, Science Signaling 2016; Colavito et al, Journal of Oncology 2020).

      As suggested by Reviewer #1, ideally, it would be interesting to bring back AXL to its level in MCF10AmCh cells to better evaluate only the contribution of its increase. However, adjusting so precisely the efficacy of AXL downregulation by siRNA seems quite difficult to achieve.

      3) Why didn’t the author focus on EphA4 (or to a lesser extent ALK), which showed better regulation ?

      As we mentioned (page 18) “the available tools allowed us to validate this result only for AXL, but not for EphA4 and ALK”.

      Nevertheless, for EphA4, we showed in Figure S6 that it is located in flotillin-positive late endosomes (Figure S6 A and C, for MCF10AF1F2 and NMuMGF1F2 cells, respectively) in a phosphorylated form (using an antibody against P-Y588/Y596-EphA4 that works in NMuMG cells, Figure S6D). However, the signals obtained by western blotting using the same antibody were too low to validate any significant variation of EphA4 Y-phosphorylation status, as suggested by the results from the phospho-RTK array.

      Regarding ALK, the increase in its phosphorylation, suggested by the phospho-RTK array, remains puzzling to us. By western blotting of cell lysates and in the presence of positive controls, we did not detect any positive signal for phosphorylated ALK and even for total ALK in MCF10A and MCF10AF1F2 cells. In addition, to our knowledge, ALK expression in MCF10A cells has never been reported in the literature. These observations did not encourage us to pursue our investigations on ALK.

      Moreover, several points led us to focus on AXL. Indeed, AXL expression is associated with the acquisition of a mesenchymal cell phenotype, invasive properties, and resistance to treatments and AXL is an attractive therapeutic target against which several inhibitors are in preclinical and clinical development (Shen Y et al. Life Sciences 2018). Moreover, AXL expression in tumors is attributed to post-transcriptional regulation, but the mechanisms are totally unknown. Understanding how its stabilization and signaling can be triggered by flotillin-mediated endocytic pathways is new and of high significance for the cancer field and the trafficking community.

      3) The conclusions of the manuscript are contradicted by the reported clinical data. In Figure S4 the authors clearly observe co-expression of Flotillin 1 and AXL prevalently in luminal breast cancers, which is the subtype known to not be driven by EMT. This evidence already indicates that this (otherwise interesting) mechanism is not relevant to EMT in breast cancer. So, the conclusions are not supported by the data, and the experimental setup and model chosen are not appropriate to generalize the findings to cancer.

      We acknowledge that flotillin 1/AXL co-expression is highest in the luminal subtype. If this co-expression had been observed only in this particular subtype, we would have agreed that it argues against a role for flotillin and AXL co-overexpression in EMT in tumor cells. However, our results show that flotillin 1 and AXL are also co-expressed in other subtypes that have undergone EMT. Considering this observation and the influence of flotillin upregulation on AXL overexpression we reported here, we believe that the point raised by the Reviewer is not sufficient to exclude that the co-upregulation of flotillins and AXL can participate in EMT induction in breast cancer cells.

      **Minor (here the most important):**

      4) The point of the Figure 2 is not clear. Why this part should have such a central role in the story? The entire data presented are not followed up in the rest of the paper. Moreover, in some cases upregulations also questionably significant (like RAS and STAT3 are not even 2 fold).

      Moreover, the error bars are so small that it seems unrealistic that the plots indicate three independent experiments.

      Because the activation of oncogenic signaling pathways is crucial to promote EMT, we think that analyzing these pathways in the context of flotillin upregulation is coherent with the message of the paper.

      To our knowledge, the amplitude of up- or down-regulation has nothing to do with its significance. The amplitude also depends strongly on the context (stimulation with an agonist, overexpression of GEF, etc). For instance, increases lower than 2-fold are frequently reported (Bodin and Welch, Mol Biol Cell, 2005; Miura SI et al, Arteriosclerosis, Thromb and Vasc Biology, 2003; Matsunaga-Udagawa R et al, J Bio Chem 2010) when assessing the activity of Ras or small GTPases, but they represent real upregulations. Furthermore, Ras activation is supported by the downstream 4-fold activation of ERK that we measured (Figure 2C).

      In Figure 2, panels B, C, E, F and J, considering the amplitude of the mean increases shown, the error bars corresponding to SEM do not seem disproportionately small.

      As the Reviewer seems to insinuate that we have not performed independent experiments, we are presenting in the table below the detailed results all obtained from independent experiments.

      | Panel | Parameter measured | Number of independent experiments | Fold increase in MCF10AF1F2 cells compared with MCF10AmCh cells in each experiment | Mean | SEM | p-value |
      | --- | --- | --- | --- | --- | --- | --- |
      | B | Ras-GTP | 5 | 1.95 ; 1.96 ; 1.18 ; 1.67 ; 1.86 | 1.72 | 0.14 | 0.001 |
      | C | Phospho-ERK | 5 | 1.24 ; 5.43 ; 3.22 ; 6.11 ; 3.52 | 3.71 | 0.73 | 0.0042 |
      | E | Phospho-AKT | 4 | 2.29 ; 6.54 ; 3.76 ; 2.6 | 3.8 | 0.97 | 0.0276 |
      | F | Phospho-STAT3 | 4 | 1.63 ; 1.63 ; 2.42 ; 1.60 | 1.82 | 0.20 | 0.0066 |
      | J | Phospho-SMAD3 | 8 | 4.1 ; 5.12 ; 6.29 ; 1.82 ; 2.58 ; 6.66 ; 2.82 ; 5.40 | 4.35 | 0.64 | 0.0001 |

      In the legend to figure 2 panels C, E, F, J, “The histograms show […] with control MCF10AmCh cells calculated from 4 independent experiments” was corrected to “The histograms show […] with control MCF10AmCh cells calculated from at least 4 independent experiments”, as the data shown in panel J were actually calculated from 8 independent experiments.
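      For readers who wish to re-derive the summary statistics from the per-experiment fold increases listed above, a minimal sketch follows. It assumes that the SEM is the sample standard deviation (n − 1 denominator) divided by the square root of the number of experiments. The rebuttal does not state the formula, so this is an assumption, and the function name `mean_and_sem` is chosen for this example only. The p-values are not recomputed because the statistical test used is not specified here.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, SEM) for a list of per-experiment fold increases.

    SEM is computed as sample standard deviation / sqrt(n); this formula is
    an assumption of the sketch, not a statement of the authors' method.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two independent experiments are required")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


if __name__ == "__main__":
    # Per-experiment values copied from the response to point 4.
    rows = {
        "Ras-GTP (panel B)": [1.95, 1.96, 1.18, 1.67, 1.86],
        "Phospho-STAT3 (panel F)": [1.63, 1.63, 2.42, 1.60],
    }
    for name, values in rows.items():
        mean, sem = mean_and_sem(values)
        print(f"{name}: n={len(values)}, mean={mean:.2f}, SEM={sem:.2f}")
```

      Small differences in the last digit relative to the reported values can arise from rounding conventions.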

      5) More robust statistical analysis should be provided in the Figure 1 to support that EMT is suppressed with F1F2 overexpression. For instance a more standard GSEA on hallmark signatures.

      To avoid confusion, we understand that Reviewer #1 meant “… that EMT is induced with F1F2 overexpression” and not “… suppressed …”.

      As recommended by Reviewer #1, we performed a GSEA on the hallmark signature and the results are already included in the current revised version of our manuscript (figure 1C).

      6) In Figure 3 E-Cadherin is rescued with siAXL in the IF but not in the western blot.

      Using siRNA transfection, we can have a mosaic effect due to the fact that not all the cells of the sample are transfected and thus efficiently knocked down. This mosaicism was clear when we analyzed E-cadherin by immunocytochemistry. Indeed, in some cells, probably the ones that have been more efficiently transfected with the AXL siRNA, E-cadherin expression is clearly seen. By western blotting, which provides a global analysis in which transfected and non-transfected cells are mixed, this was not significantly higher than in MCF10AF1F2 cells transfected with a control siRNA, although there was a trend towards increased E-cadherin expression in MCF10AF1F2 transfected with the AXL siRNA.

      For the revised version of our manuscript we will try to improve the efficacy of the AXL siRNA and test whether we can fully rescue E-cadherin expression. The corresponding panel could be modified according to the data we will obtain.

      7) Some sentences require clarifications. The authors should be more clear on why ZEB2 antibody was not available or what they mean with "Unfortunately the available tools..".

      Page 7: we wrote «no anti-Zeb2 antibody is available». We should have said: «none of the anti-Zeb2 antibodies tested worked in MCF10A cells». We decided to remove “no anti-Zeb2 antibody is available” from the sentence to avoid confusion in the revised version of our manuscript.

      Page 19: we wrote «unfortunately the available tools» to refer to the available tools against EphA4 and ALK that did not allow us to validate the data obtained using the phospho-RTK array showing that the Y-phosphorylation of these two RTKs is increased in cells with upregulated flotillins (see also our answer to major point 2).

      8) Western blot from the CHX experiment should be shown, at least in the supplements. Again, the standard deviation in this experiment is minimal, was this really an average of three independent experiments (and not three western on the same lysates)?

      As asked, a representative western blot is now shown in Figure 3C in the current revised version of the manuscript.

      As indicated in the legend to the figure already in the initial version of our manuscript: “The results are the mean ± SEM of 6 to 8 independent experiments depending on the time point, and are expressed as the percentage of AXL level at T0”. We wish to reassure Reviewer #1 that the results are really based on western blots performed on different lysates obtained in independent experiments. We can show the Reviewer these data obtained from independent experiments if necessary.

      9) All conclusions are derived from one single cells MCF10a. NMuMG cells are shown at the beginning but not used for the rest of the paper. Anyway, if this wants to be a cancer research paper, then cancer cells needs to be used.

      It is true that we did not use a cancer cell line at the beginning of the paper because, as expected, flotillin knock-down did not revert the mesenchymal phenotype of MDA-MB-231 cells toward an epithelial one. Had it done so, we would have used these cells from the beginning of the paper. The lack of reversion of the mesenchymal phenotype after flotillin knock-down was expected. Indeed, the EMT process is multifactorial and the decrease of flotillins alone is obviously not sufficient to reverse it in a tumor cell line bearing multiple oncogenic mutations. Moreover, because we wanted to assess whether flotillin upregulation is sufficient for normal cells to acquire the properties of tumor cells and particularly to induce EMT, we used human MCF10A and murine NMuMG cells, two non-tumoral epithelial cell lines. Until now, studies of the effects of flotillin overexpression have used tumor cells that already harbor pro-oncogenic perturbations, which made it impossible to show that flotillin overexpression alone activates oncogenic processes leading to EMT, and to identify the downstream mechanisms.

      Nevertheless, we have used the MDA-MB-231 cell line in several experiments to analyze: i) AXL distribution and internalization following the knock-down of flotillins (Figures 4 and S5), ii) SphK2 and flotillin 2 co-localization and co-endocytosis (Figures 5A and D and S7A), iii) the impact of SphK2 inhibition on AXL expression level, distribution and endocytosis (Figure 6), iv) SphK2 expression level upon flotillin knock-down (Figure S7E) and AXL expression level upon SphK1 inhibition (Figure S7F). With these experiments performed in MDA-MB-231 cells, we showed that AXL and SphK2 colocalize in flotillin-positive late endosomes and are co-endocytosed from the plasma membrane containing flotillin-rich domains to flotillin-positive vesicles. We also demonstrated that flotillins and SphK2 control the rate of AXL endocytosis and its stabilization.

      We recently obtained additional data with HS578T cells, another triple negative breast cancer cell line, on the co-trafficking of AXL and flotillins as well as the co-trafficking of SphK2 and flotillins (Additional Figure 3, this data could be added in the fully revised version of our manuscript).

      In addition, we observed that inhibiting SphK2 also decreased the level of AXL in HS578T cells. This data could be added in the revised version of the manuscript (see data in our answer to Point #1 from Reviewer #3).

      ADDITIONAL FIGURE 3 CAN NOT BE ADDED BUT IS AVAILABLE UPON REQUEST

      Additional figure 3: Co-trafficking of SphK2 and AXL with flotillin 1 in intracellular vesicles in HS578T cells. HS578T cells co-expressing Flot1-mCherry with SphK2-GFP (A) or AXL-GFP (B) were monitored by time lapse spinning disk confocal video-microscopy. On the right of each panel are shown still images at different time points (min) of the boxed area. The colored arrows allow following three distinct vesicles that are positive for Flot1-mCherry and Sphk2-GFP, or AXL-GFP.

      10) The methods section contains inconsistent data about patients' samples (9 are indicated, but the Figure S4 features 37). Then, where those other 527 come from?

      We corrected the manuscript and added all characteristics regarding the 37 patients in the “Supplementary information” section.

      The 527 patients are from another cohort and were used for the analysis of the correlation between the mRNA levels of FLOT1 and p63 in breast cancer biopsies from 527 patients (Figure 2I). This cohort was described in our previous study (Planchon et al. J Cell Science 2018, https://doi.org/10.1242/jcs.218925). In the revised version of our manuscript, we now refer to this previous article in the “Result” section and in the legend to figure 2I to explain the origin and characteristics of this cohort.

      11) Some figures do not match with the legends or with the description in the text. It has not been easy to review this paper.

      We apologize as we indeed made one mistake in figure 2 that was inserted into the manuscript and that was actually figure S2 (that appeared twice). However, the correct figure 2 was uploaded on the website of Review Commons and BioRxiv. Regarding the comments made in point 4, it seems that Reviewer #1 examined the correct figure 2 that was uploaded and that matches the legend indicated in the manuscript.

      Besides this mistake, we do not see any other mismatch between figures and legends.

      Reviewer #1 (Significance (Required)):

      I am a cancer biologist working on EMT.

      **Referee Cross-commenting** I have nothing to comment on other's reviews.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Genest and co-authors present in this paper new fascinating evidence on how intracellular trafficking can modulate oncogenic signalling.

      First of all, they show how overexpression of Flotillin1 and 2 in non-cancerous breast lines can induce a strong reprogramming towards an EMT phenotype. They analyse mRNA and protein expression, intracellular distribution of activated proteins, cell phenotypes to demonstrate a strong activation of oncogenic signalling pathways. They then identify AXL as a key player in this process and show how this protein is stabilised upon Flotillin expression. The authors use an amazing variety of approaches to study the endocytosis and the trafficking of endogenous, GFP-tagged, Halo-tagged and Myc-tagged AXL in different cell lines and their data are strong and very convincing, the images are of very high quality and the analysis rigorous. Their data strongly support the hypothesis that high Flotillin levels triggers AXL endocytosis and accumulation in non-degradative late endosomes where signalling remains active. The authors then show how SphK2 has a key role in AXL stabilisation, it colocalises with Flotillin, AXL and CD63 and its activity (which they block by using inhibitors or siRNA) is necessary for flotillin-induced AXL stabilisation and EMT induction. The paper is extremely well written, the data flow logically and they are appropriately presented and analysed. I don't have any major comment and I believe the paper is suitable for publication.

      We thank the Reviewer for the positive appreciation on our manuscript.

      I have only some minor comments/questions: 1) did the authors try to colocalise AXL with endogenous Flotillin in MDA-MB-231 cells? They could use the antibodies used in Fig S1B. Of note, the authors have shown it in luminal tumours in Fig S4C.

      We performed co-immunofluorescence experiments to detect endogenous AXL with endogenous Flotillin in MDA-MB-231 cells. As shown below (Additional Figure 4), we found AXL and Flotillin in the same intracellular endosomes. Images could be added in the revised version of the manuscript.

      ADDITIONAL FIGURE 4 CAN NOT BE ADDED BUT IS AVAILABLE UPON REQUEST

      Additional figure 4: Endogenous AXL and flotillin 1 are found in the same intracellular vesicles in MDA-MB-231 cells. MDA-MB-231 cells were fixed and labelled with relevant antibodies directed against Flotillin1 and AXL. Scale bar in the main image: 10 µm. Scale bars in the magnified images from the boxed area: 1 µm. Arrows indicate flotillin- and AXL-positive vesicles.

      2) In Fig6G, it appears that AXL-Flotillin colocalization is lost upon SphK2 inhibition. Is this the case? It could be that the correct lipids are necessary for the formation of Flotillin-positive internalisation domains and this could be very interesting and reinforce the model proposed in the paper.

      In figure 6G, cells were not permeabilized. Thus, only AXL at the cell surface was labelled using an antibody against the extracellular domain of AXL. Because flotillin 2 is tagged with mCherry, it could be visualized both at the cell surface and intracellularly, as shown in the inset of the lower panel of figure 6G.

      After 6 hours of treatment using the opaganib inhibitor, we did not notice any major change in AXL-flotillin colocalization at the cell surface. Somehow, this is expected because blocking the generation of S1P is more likely to inhibit the invagination of flotillin-rich membrane microdomains rather than their formation.

      3) I would remove the sentence on line 995-997 "to our knowledge this is the first report to describe ligand-independent AXL stabilization..." as the cells are not serum starved in all experiments and animal serum can contain variable amounts of the ligand GAS6.

      We understand and agree with Reviewer #2. This sentence has been changed to “To our knowledge this is the first report to describe AXL stabilization following its endocytosis”.

      Please note that the authors don't have to necessarily address comments 1-2, their paper is already very rich in convincing data.

      Reviewer #2 (Significance (Required)):

      AXL is a major oncogene that promotes EMT in a variety of tumour types. Understanding how its signalling can be triggered by endocytic pathways even in cells that are non-cancerous is very important and of high significance for the cancer field and the trafficking community.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This is an interesting and well written paper describing that upregulated flotillin promotes an endocytic pathway called upregulated flotillins-induced trafficking (UFIT) that mediates AXL endocytosis and allows its stabilization. Consequently, stabilized AXL in these flotillin-positive late endosomes enhances activation of oncogenic signaling pathways that promotes EMT. The authors suggest that Flotillin upregulation-induced AXL stabilization requires the activity of SphK2. However, this latter point is not supported by the data and further studies are needed to support this important conclusion.

      **Major concerns:**

      1. Most of the conclusions are based on effects of high concentrations (50 uM) of an ill-defined SphK2 inhibitor. The experiment described in Figure 6C-H need to be confirmed by downregulation of SphK2.

      We understand that Reviewer #3 is concerned about whether, in our experimental conditions, the effects we observed can really be explained by a specific inhibition of SphK2.

      From the literature, among all the inhibitors described for SphK2, opaganib (ABC294640) is the most specific inhibitor available. It was shown to have no inhibitory effect on SphK1 up to 100 µM (French et al, J Pharmacol Exp Ther 2010; Neubauer HA and Pitson SM, The FEBS Journal 2013). In agreement, we found that PF543, the most specific SphK1 inhibitor, had no effect on AXL level (Figure S7F), unlike incubation with opaganib (Figure 6A and C), and that was confirmed in MCF10AF1F2 cells by the knock down of SphK2 with a specific siRNA (Figure 6B).

      In the literature, depending on the cell lines, opaganib is used in vitro in the 10 to 60 µM range. Opaganib IC50 on recombinant SphK2 was established at 60 µM (French et al, J Pharmacol Exp Ther 2010). In our experiments, opaganib was used at a concentration of 50 µM, below the IC50 value, as previously done by Nichols’ group (Riento et al, PloS ONE, 2018). In most of our experiments (Figure 6, A, D, E-I, Figure S7D), opaganib was added for a maximum of 10 hours, which is shorter than in other studies (24-48 hours). Furthermore, it was shown that an opaganib concentration of 50 µM does not have any inhibitory effect in vitro on 20 protein kinases tested, including PKA, PKB, PKC, CDK, MAP-K, PDK1 and Src (French et al, J Pharmacol Exp Ther 2010).

      In addition to inhibiting SphK2 in a sphingosine-competitive manner, opaganib was also shown to act as an antagonist of estrogen receptor (ER) and to inhibit ER-positive breast cancer tumor formation in vivo (Antoon JW et al, Endocrinology 2010). If Reviewer #3 is concerned that the downstream effects of opaganib we observed in our study might be explained by ER inhibition, we note that we used cellular models that do not express ER. Indeed, the MDA-MB-231 cell line is a triple negative breast cancer cell line. MCF10A cells also do not express ER (Lane MA et al, Oncology Reports, 1999) and our transcriptomic analysis (Table S1) did not reveal any increase in the expression of ER genes in MCF10AF1F2 cells in which flotillins are upregulated, thus eliminating a possible non-specific effect of opaganib in these cells.

      In conclusion, we hope that these arguments help to convince Reviewer #3 that our experiments were performed in conditions where we carefully limited the possibility of opaganib off-target effects, on the basis of the currently available opaganib-related data from the literature.

      We totally agree with Reviewer #3 that complementary experiments by downregulating SphK2 must be used. In agreement, we already downregulated SphK2 by siRNA in MCF10AF1F2 cells. This led to a significant decrease in AXL and ZEB1 expression. In the current revised version of the manuscript we have added data obtained with similar siRNA experiments performed in MDA-MB-231 cells (now Figure 6C). In agreement, we observed AXL and ZEB1 downregulation.

      As shown below (Additional Figure 5) we recently obtained similar data in HS578T cells, showing that inhibiting SphK2 also affects AXL protein level in this triple negative breast cancer cell line (these data could be added in the manuscript).

      ADDITIONAL FIGURE 5 CAN NOT BE ADDED BUT IS AVAILABLE UPON REQUEST

      Additional figure 5: SphK2 inhibition decreases AXL level in HS578T cells. HS578T cells were incubated with opaganib (50µM, 10 hours) (A) or with siRNA Ctrl or siRNA SphK2 for 72 hours (B). Cell lysates were blotted with relevant antibodies against AXL, SphK2 and actin. The histograms show AXL level (normalized to actin) expressed as fold-increase compared with the control condition, and data are the mean ± SEM of 3 (A) and 4 (B) independent experiments.

      Reviewer #3 also asks to use the siRNA approach on experiments shown in previous panels D-H (now panels E-I) of figure 6.

      To complement Figure 6D (now Figure 6E), experiments using a siRNA against SphK2 to show that “AXL decrease upon SphK2 inhibition is not due to protein synthesis inhibition” are ongoing and the data obtained could be added in the fully revised version of our manuscript.

      However, we are not in favor of using a siRNA against SphK2, in addition to opaganib, in the experiments done to measure the effect of SphK2 inhibition on the rate of AXL internalization (previously in Figure 6E and F, now Figure 6F and G) and the level of AXL at the cell surface (previously in Figure 6G and H, now Figure 6H and I). Indeed, we carefully chose a short (4 hours) incubation with opaganib at the end of which the total cellular level of AXL was not yet decreased, allowing us to measure unambiguously a defect in AXL endocytosis or a change in the level of AXL at the cell surface. We believe that it would be very difficult to perform similar experiments using a siRNA against SphK2. It would require determining the exact time after siRNA transfection at which SphK2 level is sufficiently reduced while AXL level is still maintained. We think that, due to the mosaic transfection efficiency, it is impossible to precisely synchronize the onset of a siRNA effect.

      1. Does overexpression of SphK2 reverse the effects of the SphK2 inhibitor? In a similar manner, does overexpression of SphK2 enhance stabilization of AXL?

      To answer the first question, it is not clear to us how to test whether SphK2 overexpression can reverse the effects of the SphK2 inhibitor because the ectopically expressed SphK2 would also be sensitive to the inhibitor. This would require overexpressing a SphK2 mutant that is catalytically active but insensitive to the inhibitor, and to our knowledge, such a mutant does not exist.

      Regarding the second question, we are currently generating a retroviral DNA construct that will allow us to overexpress SphK2 homogeneously in the cell population. Then we will test whether it further increases AXL level through its stabilization. This will be tested in cells upregulated for flotillin. As we showed in Figure 6 A and D (previously Figure 6 A and C) that AXL level depends on SphK2 activity only in cells that overexpress flotillins, we anticipate that there will be no impact in a cell line with a moderate level of flotillin. Results could be added in the fully revised manuscript.

      1. Although the authors suggest recruitment of SphK2 and formation of S1P in UFIT, there are no measurements of S1P. Also, there is no indication that SphK2 is activated despite the fact that ERK and AKT are activated in UFIT and are known to phosphorylate and activate SphK2. Is SphK2 that is recruited to flotillin phosphorylated?

      To answer the first point raised by Reviewer #3, we recently performed, in collaboration with a lipidomic platform, a comparative analysis by quantitative mass-spectrometry of S1P levels between MCF10AmCh and MCF10AF1F2 cells. As we anticipated, the results show a 3.5-fold increase in S1P in MCF10AF1F2 cells compared with MCF10AmCh (Additional Figure 6). These data agree with our finding that the SphK2 catalytic activity is required for the UFIT pathway-mediated AXL stabilization. This result is also in agreement with the study from Nichols’ group, which detected a decrease in S1P in cells in which flotillins were knocked out (Riento et al, PloS ONE, 2018). The results regarding the analysis of S1P level along with the complete methodology used will be added in the fully revised version of our manuscript.

      ADDITIONAL FIGURE 6 CAN NOT BE ADDED BUT IS AVAILABLE UPON REQUEST

      Additional figure 6: Upregulation of flotillins in MCF10A cells promotes an increase in the level of Sphingosine-1-phosphate. The level of sphingosine-1-phosphate was compared by quantitative mass-spectrometry analysis from three independent samples of MCF10AmCh and MCF10AF1F2 cells. The results are expressed in pmol equiv / 1 × 10⁶ cells. The graph shows the value for each sample and the horizontal bars indicate the mean value for each condition.

      Regarding the second point, we would like to clarify that we do not think that SphK2 interacts directly or indirectly with flotillins because SphK2 did not co-immunoprecipitate with flotillins (not shown). Thus, investigating the SphK2 phosphorylation status by western blotting in flotillin immunoprecipitates is pointless. In theory, we could investigate the activity-related phosphorylation status of SphK2 associated with flotillin-rich membranes and endosomes. But this seems difficult to achieve because, unfortunately, the only two commercially available antibodies against phosphorylated SphK2 are not described to work for immunofluorescence staining. One is against the Thr578 residue (https://www.abcam.com/sphk2-phospho-t578-antibody-ab215750.html), identified as phosphorylated downstream of ERK by Sarah Spiegel’s group (Hait et al, J Biol Chem, 2007). The second is designed to recognize specifically the phospho-Thr614 residue (https://www.abcam.com/sphk2-phospho-t614-antibody-ab111948.html), but this site has not been rigorously demonstrated to be phosphorylated downstream of AKT or ERK or to stimulate SphK2 activity. Thus, considering the lack of appropriate tools and considering that we already showed, using opaganib, that the catalytic activity of SphK2 is required for the UFIT pathway, we believe that investigating the phosphorylation status of SphK2 reflecting its activity in flotillin-positive vesicles will be complicated to achieve in a reasonable amount of time and we think that it will not bring a higher value to our present study.

      To answer the question “Is SphK2 recruited to flotillin phosphorylated?” more broadly, we anticipate that it could be the case at least on the Ser419 and Ser420 residues because Nakamura’s group demonstrated that the phosphorylation of these sites favors the nuclear export of SphK2 (Ding G et al, J Biol Chem, 2007). This group developed an antibody against these phospho-sites, potentially working by immunofluorescence. However, as it is unknown whether phosphorylation of these residues influences SphK2 activation status, we do not plan to perform immunofluorescence experiments with this tool (not available commercially) because the results would not address the Reviewer’s question.

      1. It should be determined whether the optogenetic system used to induce flotillin oligomerization also induces recruitment and activation of SphK2.

      As we already have all the available tools, optogenetic experiments will be performed to answer this point and the results could be added to the fully revised version of our manuscript.

      As suggested, we plan to perform experiments in which exogenous S1P will be added to cells with a moderate flotillin expression level to check whether it could recapitulate the effect of flotillin upregulation on AXL expression. Results could be added to the fully revised version of the manuscript.

      However, our current results on the localization and the involvement of SphK2 suggest that the generation of S1P involved in the UFIT pathway occurs at the plasma membrane and in late endosomes. Because the exogenous S1P that will be added in the culture medium will not go through the plasma membrane, we anticipate that it could be insufficient to mimic all the mechanisms of the UFIT pathway. Its effect will be limited to the plasma membrane. In addition, these mechanisms are very likely based on a local concentration of S1P in some microdomains (at the plasma membrane and in intracellular membranes) scaffolded by flotillins. It will be very difficult to mimic such local concentration of S1P just by adding S1P to the cells.

      We agree that identifying the S1P receptors involved would be of valuable interest for a better characterization of the UFIT pathway. However, we think that this is beyond the scope of our present study. Among the five known S1P receptors, we do not know if any could be involved in membrane remodeling at the plasma membrane to promote endocytosis. To our knowledge, involvement of S1P receptors in endocytosis has never been reported. However, based on the work by Nakamura’s group (Kajimoto et al, Nat Comm, 2013 and Kajimoto et al, J Biol Chem, 2018), the S1P1 and S1P3 receptors are involved in membrane remodeling and cargo sorting from the outer membrane of late endosomes (where flotillins accumulate in our cell models). We could hypothesize that these receptors are influenced by flotillins and are involved in the UFIT pathway. But we think that testing this hypothesis would be the subject of a distinct study.

      At the plasma membrane, we totally agree that the effect of S1P could be mediated, as suggested by De Camilli’s group (Shen et al, Nat Cell Biol 2014), by the formation of tubular endocytic structure rich in sphingosine after acute cholesterol extraction. Reciprocally, in our cell models, upregulated flotillins, thanks to their ability to bind to sphingosine (demonstrated by Nichols’ group (Riento et al, PloS ONE, 2018)) and to oligomerize, could create sphingosine-rich membrane regions.

      1. There is a commercial antibody for endogenous SphK2 that can be used to validate and substantiate the data with GFP-SphK2. (F1000Res . 2016 Dec 6;5:2825. doi: 10.12688/f1000research.10336.2. eCollection 2016. Validation of commercially available sphingosine kinase 2 antibodies for use in immunoblotting, immunoprecipitation and immunofluorescence)

      We thank Reviewer #3 for this suggestion and advice. Being able to detect the localization of endogenous SphK2 in late endosome would be valuable for our study. We already tried with no success with antibodies from Sigma and Cell Signaling Technology (not described to work in immunofluorescence experiments).

      We will follow the advice from Reviewer #3 and test the anti-SphK2 antibody from ECM-Biosciences mentioned in the article by Neubauer and Pitson (F1000Research, 2016). If we obtain interesting results, they will be included in the revised version of our manuscript.

      However, in experiments using SphK2-GFP, we noticed that in live cells, the signal in late endosomes was completely lost after fixation using paraformaldehyde. Similarly, we also observed in live cells that NBD-Sphingosine, added in the culture medium, quickly accumulated in flotillin-positive late endosomes (Additional Figure 7, this data could be added in the fully revised version of the manuscript), but this accumulation was no longer detectable after fixation. Based on these observations, we believe that SphK2 recruitment to flotillin-positive late endosomes is highly labile probably because it mainly involves its interaction with sphingosine molecules that are enriched in these intracellular compartments. This is supported by our observation that addition of opaganib, characterized as a sphingosine competitive inhibitor, displaces SphK2-GFP from flotillin-positive late endosomes in live cells (Figure S7D). In addition, we showed that SphK2-Halo is more recruited in CD63-positive late endosomes in cells overexpressing flotillins (Figure 5E). This could be due to a higher concentration of sphingosine promoted by flotillins (that bind to sphingosine) accumulating in these compartments.

      Thus, we will try the immunofluorescence staining of endogenous SphK2 using the recommended antibody, but it might be difficult to detect its presence in flotillin-rich late endosomes in fixed cells. The data could be added in the fully revised version of the manuscript.

      ADDITIONAL FIGURE 7 CAN NOT BE ADDED BUT IS AVAILABLE UPON REQUEST

      Additional figure 7: Visualization of NBD-sphingosine in flotillin-positive late endosomes. Live HS578T, MDA-MB-231 and MCF10AF1F2 cells expressing Flot1-mCherry were monitored by time lapse spinning disk confocal video-microscopy, 5 min after addition of fluorescent NBD-Sphingosine in the culture medium. On the right are shown still images corresponding to the boxed areas to illustrate the accumulation of NBD-sphingosine in virtually all flotillin-positive endosomes.

      Reviewer #3 (Significance (Required)): This is an interesting paper. If the authors confirm the involvement of Sphk2 and mechanism of action of S1P, this would be an important contribution to the field.

      Modifications made in the initial revised version of our manuscript (at the time of the initial response). A full revised version will be provided once all the additional experiments requested by the Reviewers have been completed.

      Revisions are highlighted in grey in the initial revised-version of the manuscript

      1) Figure 1 has been modified and now includes results from a GSEA analysis as recommended by Reviewer #1. The texts of the corresponding legend and of the “Results” and “Methods” sections have been modified accordingly.

      1) The Figure 2 version that was inserted in the manuscript was wrong because it was a copy of Figure S2. However, the correct Figure 2 was uploaded to the Review Commons website and accessible for the Reviewers. The correct Figure 2 is now inserted in the manuscript.

      2) In the legend to panels C, E, F, J of Figure 2, the sentence: “The histograms show […] with control MCF10AmCh cells calculated from 4 independent experiments” was corrected to “The histograms show […] with control MCF10AmCh cells calculated from at least 4 independent experiments” because data shown in panel J are actually calculated from 8 independent experiments.

      3) Figure 6 has been modified with the addition of panel C showing the effect of SphK2 downregulation by siRNA on AXL and ZEB1 level in MDA-MB-231 cells. The text has been modified accordingly.

      4) In Figure 3 C, representative western blots have been added as asked by Reviewer #1.

      5) In the Supplementary information section, the full clinicopathological characteristics of only 9 patients were indicated, whereas Figure S4 mentioned 37 patients. We corrected this mistake and now provide the characteristics of all patients.

      6) In the sentence “Conversely, it induced ZEB 1 and 2 mRNA expression (Figures 1H and S1K) and ZEB1 protein expression (Figures 1I and S1L) (no anti-ZEB2 antibody is available)”, we removed “no anti-ZEB2 antibody is available”.

      7) The sentence previously on lines 995-997 "to our knowledge this is the first report to describe ligand-independent AXL stabilization..." has been modified to “To our knowledge this is the first report to describe AXL stabilization following its endocytosis”

      8) We are now referring to reference 18 (Planchon et al. J Cell Science, 2018) for the description of the cohort of 527 patients with breast cancer because this was missing.

    1. This will seem little to you with your strong practical sense for it takes fifty years for a poet’s weapons to influence the issue.”

      This reminds me of some of the influential poets and writers I admire, and their own perspective on social activism and global change. It makes me think of how we write things in hopes of inspiring change in people and, in certain cases, of sparking a fire of rebellion. And yet by the time a piece of literature has made its way around the world, the actions of those who hold the same beliefs yet were more keen to pursue them through a practical sense have already made some kind of change. I think literature is meant to aid people as a whole, for generations to come, and I think what makes a piece of writing so strong is that it still holds meaning no matter what time you are in and that it captures the human existence.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the three reviewers for their helpful and valuable comments. We plan to address their criticisms in a revised manuscript and hope that our manuscript will then be significantly improved.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors have presented a very interesting and compelling set of data regarding the impact of conditional deletion of the only known pathway allowing the uptake of pyruvate into mitochondria. The paper comprises two interwoven stories that are both important. The first is the remarkable finding that the majority of excitatory neurons in the cortex (i.e. those under the influence of the CaMKII promoter) show remarkable metabolic flexibility as they tolerate elimination of pyruvate oxidation, considered the major supplier of ATP in neurons. The data on this seem clear although the authors did not delve into the potential mechanisms of metabolic compensation that likely occurs. Instead they examined whether there was some mal-adaptive compensation and they found clear evidence of this: in the absence of MPC activity the mice are much more prone to epileptic seizures, unveiled experimentally by relatively standard protocols (kindling). The authors present largely very convincing evidence that this mal-adaptive compensation in turn ends up decreasing the activity of KV7.2/7.3 channels whose job is normally to limit runaway repetitive firing by mediating a hyperpolarizing K+ efflux following an action potential. This channel, put on the map as it was one of the downstream targets modulated by cholinergic metabotropic activation, is also now known to be controlled by Calmodulin and therefore cytosolic Ca levels. Overall, I think at its core this manuscript is interesting and important. There are, however, several weaknesses that, I fear, will diminish the impact on the eventual readership. If these points can be addressed, it will strengthen the longevity of these findings:

      1) It is puzzling why the authors resorted to using shRNA-mediated KD of MPC1 for some of the in vitro studies when they have gone to the trouble of making a floxed CRE-dependent mouse. Primary cells (e.g. Fig 1) or organotypic cultures (Fig. 6) from these mice would have made a more consistent set of starting conditions to compare data across the manuscript. As viruses expressing the CRE recombinase are widely available, these could have been used on mice simply harboring the floxed gene if they are worried about waiting for the expression of the CaMKII promoter for in-vitro conditions.

      This is indeed a good point. Initially, when we started these experiments, we tried to use viruses expressing the CRE recombinase in cultured neurons from mice harboring the floxed gene, as proposed by the reviewer. However, for reasons that we do not fully understand, the use of AAVs or lentiviruses expressing the CRE was found to be deleterious for the cultured neurons. In view of this toxicity we tried using TAT-CRE recombinase, a recombinant cell-permeant fusion recombinase, which we added directly to the medium. However, this strategy proved to be poorly efficient. We finally used cultures of Cre-floxed neurons in which we tried to knock out the MPC1 gene using 4-hydroxytamoxifen in the culture medium. However, we did not obtain satisfying results because, as previously reported, cortical neurons grow poorly in the presence of 4-hydroxytamoxifen (Nichols et al., Cell Death and Disease, 2018. https://doi.org/10.1038/s41419-018-0607-9). For these reasons we turned to the shRNA strategy and to the use of 3 small molecule inhibitors of the MPC, each with a different chemical structure. Both the RNA interference and the pharmacological approaches gave similar results, reinforcing our confidence in the specificity of the results, and the unlikelihood of off-target effects.

      2) The data in Figure 5 gets a little less convincing as using extracellular glutamate to drive Ca elevations is so non-physiological that the results might really be distorted by the participation of something irrelevant to the story, even though it supports the overall interpretation for a role of Ca/CaM in the control of the channel. Similarly, the use of RU360 should be done with caution. The drug, although a useful antagonist of MCU in purified mitochondria, is famously finicky with respect to its ability to cross membranes and could well have off target impact. A much cleaner experiment would be to suppress the expression of MCU via KD. Presumably in the MPC-deficient neurons, this would have minimal impact on Ca signals. Given the frequent ambiguity associated with interpreting pharmacological results, coupled to the central importance of this finding in interpreting the entire paper, I think carrying out experiments with molecular genetic manipulation of MCU is warranted.

      The main point of this figure is to study the capacity of MPC1 KO neurons to handle intracellular calcium increase and to regulate calcium homeostasis. To this end, we used strategies described to acutely increase cytosolic calcium, either through membrane depolarization with KCl (Rienecker et al., ASN Neuro. 2020. https://doi.org/10.1177/1759091420974807) or through activation of glutamate receptors using glutamate (for example, see Wong, Vis Neurosci, 1995. DOI: 10.1017/s0952523800009469). It is important to mention that the concentration of glutamate used in our experiments (10 microM for 2 min) is well below the concentration normally used to induce excitotoxicity (100-500 microM for 30 min). Both stimulations provided similar results and clearly indicated a defect in the clearance of cytosolic calcium in MPC-deficient neurons.

      Regarding the concern with RU360, we are aware of the problems with plasma membrane permeability associated with this compound, and for this reason we included a membrane permeabilizer (0.02% pluronic acid) to facilitate its entry into the cell. This was indicated in the Material and Methods section (line 585) as well as in the figure legend (line 948). In order to clarify this methodology, we will add this information in the main text. It should be noted that this concern would not apply to the electrophysiological experiments, since in this case the compound was injected directly into the cell. We would like to add that we chose to inhibit the MCU using a chemical inhibitor rather than a shRNA because of the well-known difficulty in obtaining a complete loss of function of the MCU using RNA interference (Nichols et al., Cell Death and Disease, 2018. https://doi.org/10.1038/s41419-018-0607-9). Nevertheless, as recommended by the reviewer, we will attempt to downregulate the expression of MCU using shRNA.

      3) The authors have not really made clear in this paper whether the ability to suppress the phenotype of the MPC deficiency with ketones is really related to providing TCA cycle support or instead to a pharmacological impact on non-TCA related targets (such as the Kv7.2/7.3 channels). Presumably the use of other ketones might circumvent this. The action of ketone bodies has been a topic of considerable interest in neuroscience, given the clinical relevance for childhood epilepsies. Previous studies for example have argued for direct inhibition of the vesicular glutamate transporter (Juge et al. Neuron 2010). The use of other ketones (acetoacetate) would narrow down the interpretations of the data.

      Our results point to two possible mechanisms of ketone bodies: i) providing acetyl-CoA to the Krebs cycle, thereby stimulating OXPHOS and ii) direct action of 3-beta hydroxybutyrate on the activity of Kv7.2/7.3 channels. The reviewer is asking whether, in addition to 3-beta hydroxybutyrate, other ketone bodies, acetone or acetoacetate, may display antiepileptic activity, which would probably indicate that providing substrates to the TCA cycle is sufficient to prevent neuron-intrinsic hyperactivity and seizures. We agree that this is an interesting question and we will now test the effect of acetoacetate on PTZ-induced seizures in MPC KO mice.

      **other**

      1) In vitro - scramble controls only serve to demonstrate there is no general effect of treating cells with shRNAs, but do not address if there is an off-target effect. The most convincing thing here would be to have an shRNA-insensitive variant that rescues.

      We have used 2 different shRNAs and 3 chemically unrelated inhibitors of the MPC and in all cases we obtained similar results. Therefore, we think that it is unlikely that the effects we observe are due to an off-target activity. The experiment proposed by the reviewer is interesting but extremely difficult. The idea would be to reintroduce a shRNA-insensitive MPC1 into MPC1-deficient neurons treated with shRNA. This is difficult as it is known that the expression level of MPC1 needs to be matched to that of MPC2, otherwise it leads to depolarization of the mitochondria. Obtaining the right level of MPC1 would be extremely difficult to achieve in practice.

      2) Does rescuing CaMK binding to KCNQ channels rescue the phenotypes?

      The question raised by the Reviewer implies that CaM is not constitutively bound to KCNQ channels, which is a matter of debate. As we pointed out in the discussion, ‘Intracellular calcium decreases CaM-mediated KCNQ channel activity (32, 36) by detaching CaM from the channel or by inducing changes in configuration of the calmodulin-KCNQ channel complex (36).’ The CaM-KCNQ tethering is also described in a review by Alaimo and Villaroel, 2018 (doi:10.3390/biom80300579): ‘[…] CaM was first defined as an integral subunit constitutively tethered to the C-terminal region of Kv7.2/3 channels since Kv7.2 mutants that were deficient in CaM binding were unable to generate measurable currents [5,21]. However, this model has been questioned since Kv7.2 channels, carrying a hB mutation [40] or Kv7.4 hA mutated channels [41] that do not bind CaM, can still reach the plasma membrane and are functional.’

      When considering manipulating CaM binding to KCNQ, it should also be noted that previous studies on this matter have mainly worked with heterologous systems and through genetic manipulations of CaM (by expression of a dominant negative or by overexpression of CaM) or of the KCNQ binding motif.

      Based on both theoretical and practical issues, we, thus, believe that it is not feasible to implement a straightforward approach that would be compatible with our mouse model.

      An alternative, indirect approach, as indicated by Reviewer #3, would be to test the effect of Ca2+ chelators. Although this is likely to introduce confounding effects through the inhibition of other Ca2+-dependent channels, we propose to focus on trying this option and assess whether a XE991-sensitive component will be unmasked in MPC1 deficient cells.

      3) As the authors imply that BHB activates KCNQ channels, showing this directly in their prep would provide some convincing data. If this is true, why doesn't BHB increase firing rate of WT neurons?

      Activation of KCNQ channels is expected to reduce (not increase) neuronal firing. When exposed to BHB, we indeed found that WT cells also show a trend towards decreased excitability (p=0.08). We will report this trend in the revised figure 5F. Given that KCNQ channels are already available to be recruited upon repetitive firing in WT cells (to a larger extent as compared to KO, as indicated by our data with XE991) it is conceivable that a further potentiating effect of BHB at the concentration used for ex vivo recordings (2 mM) will be limited.

      4) How does the anti-epileptic effects of ketones in this study relate to previous reports of regulation of KATP channels? One of main concerns is that ketones might have a parallel anti-epileptic effect in the MPC1 KO mice that is unrelated to the mechanism proposed here.

      The ketogenic diet is likely to exert several effects including disruption of glutamatergic synaptic transmission, inhibition of glycolysis, and activation of ATP-sensitive potassium channels as pointed out by the reviewer. We do not exclude that inhibition of the MPC could also have an impact on the KATP channels and we are currently exploring this possibility. However, such work to dissect the potential implication of the KATP channels would go well beyond the scope of the present paper. Nevertheless, we will certainly raise this important possibility in the discussion.

      **Minor comments:**

      1- What is the MPC1 KO efficiency in CaMK neurons? The western blot in 2c is from the whole cortex and therefore does not show that.

      This is indeed a good comment, however, please note that the estimation of MPC1 KO efficiency has also been evaluated in synaptosomes isolated from MPC1 KO cortices. These structures are mainly isolated from neurons (Carlin et al., JCB, 1980. 10.1083/jcb.86.3.831). As shown in figure 2C, these synaptosomes are massively enriched for CamKII and contain less astrocytic marker GFAP in comparison to the whole cortex. The amount of MPC1 in the synaptosomes prepared from the KO animals is strongly decreased. Nevertheless, as recommended by the reviewer, we plan to quantify the efficiency of the KO by performing a double immunostaining for MPC1 and a specific marker for neurons.

      2- Mitochondrial Ca2+ levels are not measured directly, for which there are many tools. This is needed to demonstrate definitively that there is a defect in Ca2+ handling.

      The reviewer raised an important point and we plan to monitor the levels of mitochondrial calcium in MPC-deficient neurons using mito-Aequorin, a luminescent quantitative probe targeted to mitochondria (Granatiero et al., Cold Spring Harb. Protoc. 2014. 10.1101/pdb.top066118).

      Reviewer #1 (Significance (Required)):

      see above.

      **Referee Cross-commenting**

      It seems we are in reasonable agreement about the pros & cons of the manuscript. I agree that alternative approaches to RU360 are warranted.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      De la Rossa and colleagues examined the consequences of conditionally knocking out MPC1, a subunit of the mitochondrial pyruvate carrier. They found that despite decreased levels of oxidative phosphorylation in excitatory neurons, phenotypically these conditional knockout mice were normal at rest. However, when challenged by inhibition of GABA neurotransmission, these animals developed severe seizure activity and expired. These authors then showed that neurons with an absence of MPC1 were hyperexcitable in part through abnormal calcium homeostasis, which was associated with a reduction in M-type inhibitory potassium channel activity. Intriguingly, the ketogenic diet and the major ketone body beta-hydroxybutyrate were able to reverse these changes.

      This is a carefully conducted research study that reveals cell type-specific alterations of MPC1 deletion and functional consequences. The study design was logical and involved an exhaustive array of methodologies. The manuscript was generally well written and organized, and there are no major concerns. This study shows a direct causal relationship between impaired bioenergetics at the mitochondrial level and subsequent behavioral seizures, and is perhaps the most direct demonstration to date that an intrinsic disturbance of metabolic function can result in seizure activity (through changes in calcium regulation and impairment of ion channel activity). This will be an important contribution to the scientific literature.

      **MINOR:**

      1. Page 4, line 86: Would recommend changing "paroxystic" to "paroxysmal" (the latter is a more recognized term).

      We will make the change.

      Page 5, line 124: recommend including the concentration of beta-hydroxybutyrate used when first mentioned. In general, concentration and dose information were difficult to find, as well as route of administration (for kainate, page 7, line 175). This type of information was not conveniently presented.

      We will follow the reviewer’s recommendation.

      Page 5, line 128: "both overcomed" is awkward. Would recommend using "both reversed".

      We fully agree and will make the change in the revised manuscript.

      Page 8, line 193: the authors probably meant "astro-MPC1-WT mice", not "neuro-MPC1-WT mice".

      Thank you for the careful reading. This will be changed.

      Page 12, lines 280-282: the authors might want to mention that chronic exposure of BHB might reduce the hyperexcitability of neuro-MPC1-KO mice.

      This point could indeed be discussed.

      Please review entire manuscript and use consistent tense. For example, on page 13, line 309, to maintain the past tense, it should read "We first assessed whether..."

      Thanks for the recommendation.

      Page 13, line 318: the authors used 10 mM BHB when examining calcium levels, but they earlier used 2 mM. They need to explain why they used a different concentration; and 2 vs 10 mM are quite different.

      The reviewer makes a valid point. When we performed the in vitro experiments, we used 10 mM BHB, which is slightly higher than the amount of ketone bodies measured in the blood of mice fed on a ketogenic diet for 2 days (Supplementary figure 4). This concentration of BHB has also been used by others (see for example: Izumi et al., JCI 1998, 101:1121-1132). Later on, when electrophysiology experiments were performed, the person in charge of these experiments followed a previously published protocol by Yellen and colleagues, in which the authors had used 2 mM BHB (Ma et al., J. Neurosci 2007, 27:3618-3625). This explains the difference between the concentrations used in the in vitro experiments and in the ex vivo recordings.

      Page 13, line 323: it is not necessary to say "...interesting study published during the preparation of this manuscript." This phrase should be deleted, and the relevant reference simply cited.

      We will follow the reviewer’s recommendation.

      The authors need to explain more clearly in the beginning what exactly is meant by "paradoxical" hyperactivity. They provide greater meaning later in the manuscript, but this should be clarified at the outset.

      We will explain why we used this adjective in the beginning as recommended by the reviewer.

      Reviewer #2 (Significance (Required)):

      This is a very important study to show how primary defects in metabolism (i.e., disruption of the mitochondrial pyruvate carrier) can lead to epilepsy. Moreover, it details a primary mechanism that connects cellular bioenergetics to membrane excitability (through changes in calcium homeostasis and M-current function).

      This is a well-conducted study that utilizes a multiplicity of experimental tools to link biochemistry with seizure activity. This type of study is not so readily done, and strengthens the notion that primary defects in metabolism can result in epileptic seizures.

      This study is unique and attempts successfully to be more than just correlational. Hence it is a valuable contribution to the field.

      The audience will likely consist of metabolic geneticists, neurologists/epileptologists, and neuroscientists. This is a beautiful study that runs the translational spectrum from biochemistry to behavior.

      My expertise is in the field of translational epilepsy research, with a focus on mitochondria, metabolism, the ketogenic diet and ketone bodies. Thus, I am qualified to critically evaluate the entire manuscript.

      **Referee Cross-commenting**

      After reading comments and reviewing the manuscript again, would agree with Reviewer #1, and would change recommendation to MAJOR REVISION.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This manuscript tests the genetic requirement of the mitochondrial pyruvate carrier (MPC) in regulation of neuronal excitability. The authors find that MPC deficiency in glutamatergic neurons is associated with aerobic glycolysis, inhibition of the M-type K channels, and neuronal hyperexcitability that manifests in increased sensitivity to chemical pro-convulsants without changes in resting conditions. Alterations in Ca homeostasis in MPC-deficient neurons is consistent with reduced mitochondrial membrane potential and attendant diminution of mitochondrial calcium buffering capacity. The authors further show that the effect of MPC deficiency can be phenocopied by treatment of wild type neurons with a chemical inhibitor of the mitochondrial Ca uniporter (MCU). Based on these data, it is proposed that reduced mitochondrial Ca uptake causes neuronal hyperexcitability in the absence of MPC. Overall, the manuscript presents detailed electrophysiology and in vivo seizure studies. However, there is significant disconnect between the actual data in Fig. 6 and the authors' conclusions/proposed mechanism. In particular, the evidence for the role of Ca in the hyperexcitability due to MPC deficiency is the weak link in the authors' argument.

      1. The studies linking reduced mitochondrial Ca uptake to hyperexcitability in MPC-deficient neurons (Fig. 6) have several limitations that significantly weaken the paper: 1a. The Ca measurements in cortical neurons (Fig. 6A-F) are performed under conditions (glutamate/KCl) that are fundamentally different from those used in electrophysiology of CA1 pyramidal neurons (Fig. 6G-N). The electrophysiological excitation is much briefer and less extreme than the chemical stimulation, and it is not clear that the Ca dysregulation occurs at the earliest times (see Fig. 6A).

      This point was also raised by reviewer 1. Please see our response to point 2.

      1b. The conclusion that MCU is functionally responsible for MPC's effect on neuronal excitability is singularly based on the use of RU360 as a chemical inhibitor of MCU but the specificity of this reagent is questionable. Evidence for a cause and effect relationship that directly implicates altered MCU/mitochondrial Ca buffering has not been provided.

      This valid point was also raised by Reviewer 1. Please see our response to point 2 for a complete response. We will downregulate expression of MCU using shRNAs. We will also measure the mitochondrial calcium level in the hope of better understanding whether the phenotype of the MPC-deficient mice is due to impaired mitochondrial calcium uptake.

      1c. There is a large variation in the effect of 10 uM RU360 on firing frequency, comparing Fig. 6H and N (blue traces), including the shape of the traces and values at ramp number 6. This calls into question the reliability of the comparisons in each separate figure.

      Data presented in each single graph in the main Figures were obtained from groups of littermates through recordings conducted on consecutive days. Some caution is warranted when comparing data between different figures (i.e. between different experimental series), as several factors may contribute to inter-experiment variability, including variability between different batches of animals. However, the difference pointed out by the reviewer regarding the values of cell firing reported in Fig. 6H and N is only apparent. When applying depolarizations with ramps of 5 s, a fair number of WT cells infused with RU-360 show high instantaneous firing frequency, especially for the last ramps that steeply reach high current levels. This leads to accommodation/inactivation of the action potential towards the end of the ramps, as shown in the example trace in Fig. 6G. As a result, the current-frequency plot deviates from linearity, as is the case in Fig. 6H (blue trace) and, even more evidently, in Fig. 6N. We have now reanalyzed the same recordings from WT cells infused with 10 µM RU-360 and measured the firing frequency in response to a square depolarizing step (250 pA) of 0.5 or 1 second. No difference was found between the firing frequencies of the cells from Fig. 6H and Fig. 6N (group 1 and group 2, respectively, in the figure below). Although the ramps may lead to some distortion at higher stimulation levels, we have decided to show results from ramps consistently throughout the main figures because this protocol, with continuously increasing currents, allows us to measure the rheobase and the firing threshold more precisely than the stepwise increments of a square stimulation.

      1d. The calcium > PIP2 > M-type K+ channel axis is well established but has not been fully explored in the context of MPC deficiency. The use of a calcium chelator will likely be informative in this context, and would be better evidence for a role of Ca in the MPC effects.

      Although the use of a Ca2+ chelator such as BAPTA is likely to introduce confounding effects through the inhibition of other Ca2+-dependent channels, we will try this option and assess whether a XE991-sensitive component will be unmasked in MPC deficient cells.

      1e. The ability of BHB to rescue various parameters in this and other figures in the paper is interesting but does not directly speak to the specific mechanism as to how MPC deficiency affects neuronal excitability. BHB's effect is consistent with the metabolic flexibility of neurons when the TCA cycle cannot be fueled by glucose/pyruvate (as in GLUT1 or MPC deficiency).

      The mechanism we propose to explain the hyperexcitability of MPC-deficient neurons relies on the low mitochondrial membrane potential and their decreased capacity to buffer calcium. Based on our data, we propose that calcium accumulation in the cytosol disrupts the CaM-KCNQ interaction, leading to hyperexcitability. BHB could indeed act in two possible (and parallel) ways: (1) directly on the M-type channels, or (2) on mitochondria, by providing acetyl-CoA to the TCA cycle. The use of an alternative ketone body will be informative in disentangling these two possibilities.

      The manuscript (and the field) will benefit from a more scholarly discussion and integration of published literature:

      2a. The published studies on the outcome of pharmacologic MPC inhibition in neurons (Ref 18, Divakaruni et al.) are not only consistent with the bioenergetic effect in Fig. 1, but more importantly, show that interference with MPC does not lead to broad deficiencies in energy metabolism but rather remodel fuel utilization patterns to alternative substrates that feed the TCA cycle (BHB, leucine, etc). For this reason, terms such as "mitochondrial dysfunction" and "OXPHOS deficiency" used throughout the manuscript to describe MPC deficiency are vague and imprecise. In addition, this metabolic flexibility may explain lack of defects under resting conditions. In light of these considerations, the argument as to whether aerobic glycolysis in MPC-deficient neurons explains the lack of phenotype in resting conditions (p 17) seems one-sided. Overall, the studies in ref 18 are relevant to the current manuscript and should be better integrated in the discussion.

      We fully agree with the possibility that the rewiring of cell metabolism in MPC-deficient neurons in the presence of leucine, BHB and other metabolites could explain the lack of phenotype in resting conditions. We thank the reviewer for this highly relevant comment which we will include in the revised discussion.

      2b. Several references are cited to describe the role of OXPHOS vis-à-vis aerobic glycolysis in neuronal function. At times, however, the authors' statements are not consistent with what these papers actually show (or do not show). For example, see the use of refs 6 and 44 on p17 of the discussion, where the authors state that aerobic glycolysis uncoupled from OXPHOS is sufficient to provide ATP for normal neurotransmission, but this does not mean OXPHOS is not needed.

      We agree that these references are not appropriate here and they will be removed.

      2c. Although the XE991 experiments support an important role for the M-type channels in the altered excitability with deficiency, it is not clear that the proposed mechanism can explain all of the electrophysiological differences, particularly those resting properties that are measured without a Ca challenge to the neurons. It would be good to discuss other possible mechanisms that could affect neuronal excitability.

      Our results point to M-type channels as important players in the phenotype of the MPC-deficient mice. Previous reports indicate that inhibition of this channel by XE991 can modulate input resistance, membrane potential and firing threshold of pyramidal cells (e.g. Shah et al, 2018, doi/10.1073/pnas.0802805105; Hu et al. 2007, DOI:10.1523/JNEUROSCI.4463-06.2007; Petrovic et al., 2012, doi:10.1371/journal.pone.0030402). We also found that XE991 induced a shift towards more negative potentials in the firing threshold of WT cells, but not in MPC1 deficient cells (-3.3±0.6 vs. -0.4±1.0, n=9, 8, p=0.027). However, we agree with the reviewer that the phenotype is probably highly complex and that additional mechanisms may contribute to modulate the intrinsic excitability of MPC-deficient neurons. One such mechanism could be closure of KATP channels, which we are currently investigating. This will be discussed.

      Reviewer #3 (Significance (Required)):

      The significance of the advance: The studies provide genetic evidence for the role of MPC in neuronal excitability.

      The work in the context of existing literature: Please see specific comments above under point 2 regarding the need for a scholarly discussion and integration of existing literature.

      Audience that might be interested: mitochondrial bioenergetics and metabolism and metabolic control of neuronal excitation.

      Keywords describing expertise: metabolism, mitochondria and electrophysiology.

    1. Judgments made by different people are even more likely to diverge. Research has confirmed that in many tasks, experts’ decisions are highly variable: valuing stocks, appraising real estate, sentencing criminals, evaluating job performance, auditing financial statements, and more. The unavoidable conclusion is that professionals often make decisions that deviate significantly from those of their peers, from their own prior decisions, and from rules that they themselves claim to follow.

      As educators (and disciplinary "experts") we like to think that our judgements on student performance are objective. As if our decisions are free from noise. I often point out to my students that their grades on clinical placements may be more directly influenced by their assessor's relationship with their spouse than by their actual clinical performance.

    1. Can a page opt-out of showing annotations

      One of the most important questions. We wrote about it here and here. Many people voiced their feelings about this in the six months or so that we spent investigating it and exploring it.

      The tension is between what users want and what authors want. The way the web works now, authors control what is delivered by the web server, users control how they consume web content. If I want to install the "Drumpf" chrome extension, I can, and web sites can't (easily?) block me from doing so. Greasemonkey is another good example of this.

      If the browser comes w/ the native ability to fetch, anchor and display annotations, but as the user I have the ability to decide which services I subscribe to-- then should page authors be able to block me from that? And if you think they should, should that blockage only be for public commentary, but not personal notes or private group annotation?

      One particularly thorny problem is around how governments and other public entities might want to use that same blocking power to prevent you from marking up their sites and documents. Should your government, or any government, be able to block its citizens or others from critical analysis of documents that it hosts? And if not, then how do you distinguish them from just any old page author.

      For the moment, a countermeasure that authors can employ is to add shrapnel to their web pages which blocks some of the strategies that are used. Wordpress plugins are available that do this. In a crude sense, this may be a reasonable compromise.

    1. Author Response:

      Reviewer #1 (Public Review):

      We thank Reviewer #1 for their valuable comments. We agree with the Reviewer that our current results are not sufficient to confirm the therapeutic effects. The statement related to therapy has been removed.

      The study by Song and colleagues explores the role of circRNAs in fibrosis of the endometrium. Endometrial cells from patients with and without fibrosis were subjected to expression profiling analysis, and circPTPN12 and miR-21-5p were strongly separate in fibrosis in endometrial, with circPTPN12 acting as an inhibitory factor for miR-21-5p. Through the use of various molecular approaches, the authors further show that miR-21-5p inhibition results in upregulation of ΔNp63α, a transcription factor that induces EMT. The role of circPTPN12 was also confirmed in vivo using a mouse model of mechanically induced endometrial fibrosis. The authors concluded that targeting the circPTPN12/miR-21-5p/∆Np63α pathway may be a therapeutic strategy for endometrial fibrosis.

      The authors clearly and convincingly show the involvement of the circPTPN12/miR-21-5p/∆Np63α in EMT and its potential involvement in endometrial fibrosis. Whether or not this can be a therapeutic target is too preliminary at this point. First because the in vivo experiments confirm the link between circPTPN12/miR-21-5p/∆Np63α at the RNA level only (p63) and it would be more convincing to see protein data as well.

      We did try to detect the protein of ΔNp63α in mouse with immunochemistry and immunofluorescence, using three antibodies (CST, cat# 67825 and 39692; Abcam, ab124762). Unfortunately, we did not obtain positive results. However, ΔNp63α mRNA was significantly changed.

      The involvement of p63 in the process remains a little elusive in this paper.

      We have reported that ΔNp63α is ectopically expressed in endometrial epithelial cells in IUA patients (Cao et al., 2018), and showed that ΔNp63α promotes the expression of SNAI1 by DUSP4/GSK3B pathway and induces EECs-EMT and fibrosis (Zhao et al., 2020). We've put this description of ΔNp63α in the discussion section (2nd paragraph).

      In addition, if the authors believe this pathway can be a real future target to treat endometrial fibrosis, they could better contextualise such a statement, specifically describe what kinds of therapeutic intervention they think of, like regression or prevention of fibrosis. These should be tested in vitro and in vivo.

      Our results showed that replenishing miR-21-5p can reverse EMT and remit endometrial fibrosis in vivo and in vitro. However, therapeutic intervention with miR-21-5p in the clinic needs more research in other animal models such as rats, pigs, and non-human primates. Thus, we removed the therapeutic statements (page 1, Line 1-2; page 2, Line 37-40; page 4, Line 74-76; and page 13, Line 273).

      More evidence of the involvement of circPTPN12/miR-21-5p/∆Np63α and the correlation between the three players using clinical material is also necessary.

      The involvement of ∆Np63α in endometrial fibrosis has been proved in our published paper and results are quoted in this paper (Zhao et al., 2020). The correlation between circPTPN12 and miR-21-5p using clinical material was listed in Figure 2J. In vivo and ex vivo experiments had confirmed that overexpression of circPTPN12 downregulates miR-21-5p and upregulates ∆Np63α (Figure 3H/Figure 4J/ Figure 5B/ Figure 5E). In addition, ex vivo experiments suggested that the decrease of ∆Np63α is secondary to the increase of miR-21-5p (Figure 4C-E).

    1. In a way she realized that she herself was doomed, that sooner or later the Thought Police would catch her and kill her, but with another part of her mind she believed that it was somehow possible to construct a secret world in which you could live as you chose. All you needed was luck and cunning and boldness. She did not understand that there was no such thing as happiness, that the only victory lay in the far future, long after you were dead, that from the moment of declaring war on the Party it was better to think of yourself as a corpse. 'We are the dead,' he said. 'We're not dead yet,' said Julia prosaically. 'Not physically. Six months, a year--five years, conceivably. I am afraid of death. You are young, so presumably you're more afraid of it than I am. Obviously we shall put it off as long as we can. But it makes very little difference. So long as human beings stay human, death and life are the same thing.' 'Oh, rubbish! Which would you sooner sleep with, me or a skeleton? Don't you enjoy being alive? Don't you like feeling: This is me, this is my hand, this is my leg, I'm real, I'm solid, I'm alive! Don't you like THIS?' She twisted herself round and pressed her bosom against him. He could feel her breasts, ripe yet firm, through her overalls. Her body seemed to be pouring some of its youth and vigour into his. 'Yes, I like that,' he said. 'Then stop talking about dying. And now listen, dear, we've got to fix up about the next time we meet. We may as well go back to the place in the wood. We've given it a good long rest. But you must get there by a different way this time. I've got it all planned out. You take the train--but look, I'll draw it out for you.' And in her practical way she scraped together a small square of dust, and with a twig from a pigeon's nest began drawing a map on the floor.

      the idea of life and death, that they are already dead

    1. In some important professions, such as physics and engineering, Asian Americans are overrepresented and African Americans underrepresented. We presumably get better research because of this. This may or may not outweigh the inequity of unequal group representation. That is a social decision.

      This is a great article, but this statement irks me. "We presumably get better research out of this" - I do not think we can presume that. While a stopwatch may make a truer meritocracy (although one can argue that environment still plays a part in this), certainly there is a tremendous amount of environmental factors involved in what drives overrepresentation of certain racial or ethnic groups in "high-achieving" professions like physics or engineering.

    1. Steven J. Vaughan-Nichols - Technology Journalism (FLOSS Weekly Episode 629). Doc Searls and Jonathan Bennett talk with Steven J. Vaughan-Nichols about what's happening in technology journalism, with the open source world he knows perhaps better than any other journalist on the case, and with where he got started: in space and space technologies. (Bonus fact: Steven digs Starlink, and Jonathan is using it to participate in the show.) Hosts: Doc Searls, Jonathan Bennett. Guest: Steven Vaughan-Nichols. More info: https://twit.tv/shows/floss-weekly/episodes/629

    1. Author Response:

      Reviewer #1:

      Weaknesses: The main aim of the study is to identify biomarkers that predict S/MD dengue early in the course of dengue. This requires biomarkers of which the levels change early after symptom onset. However, levels of several of the biomarkers did not change markedly between the two time points (early vs late), suggesting that the levels of these biomarkers had not yet changed on day 1-3, thereby questioning their use as 'early biomarkers'.

      Thank you. We acknowledge that the levels of some of the biomarkers are not markedly different between the early and late time points. However, this does not affect the aims of the study. First, the late time point may not represent the patient’s baseline, as it fell within 2-3 weeks of the acute illness. Second, our focus was on the first 3 days of illness, in order to identify early predictors, noting that this may not represent the peak for many of the biomarkers, which would occur in the critical phase. We were still able to achieve our main aim, which was to compare biomarkers on days 1-3 between patients who progressed to more severe outcomes and those who did not.

      The authors selected the biomarkers based on earlier pathophysiology studies. An alternative approach might have been to first measure a larger set of candidate biomarkers in a selection of patients and select only those biomarkers showing a clear change in the early phase.

      Thank you for your suggestion. For this study, due to the limited number of outcomes (moderate-severe events: 281 cases) and the limited volume of the blood samples, we selected 10 biomarkers, as the events-per-variable ratio should be greater than 10 and we also wanted to investigate the non-linear effects and interactions of the biomarkers [Heinze et al., Biom J 2018]. We therefore selected the most promising biomarkers systematically based on pilot data and published literature.

      Reference: Heinze G, Wallisch C, Dunkler D. Variable selection - A review and recommendations for the practicing statistician. Biom J 2018; 60(3): 431-49.
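
      The events-per-variable reasoning in the response above can be illustrated with a short calculation. This is only a sketch: the 281 events and 10 biomarkers come from the response, while the number of parameters per biomarker in the "flexible" model is a hypothetical value chosen for illustration, not a figure reported by the authors.

```python
def events_per_variable(n_events: int, n_parameters: int) -> float:
    """Return the number of outcome events per candidate model parameter."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return n_events / n_parameters


N_EVENTS = 281      # moderate-severe events, as stated in the response
N_BIOMARKERS = 10   # biomarkers selected, as stated in the response

# One linear term per biomarker.
epv_linear = events_per_variable(N_EVENTS, N_BIOMARKERS)

# Hypothetical richer model (an assumption, not from the study):
# each biomarker contributes 2 parameters, e.g. one extra non-linear term.
HYPOTHETICAL_PARAMS_PER_BIOMARKER = 2
epv_flexible = events_per_variable(
    N_EVENTS, N_BIOMARKERS * HYPOTHETICAL_PARAMS_PER_BIOMARKER
)

print(f"EPV with linear terms only: {epv_linear:.2f}")
print(f"EPV with hypothetical flexible terms: {epv_flexible:.2f}")
```

      The point of the sketch is that every added non-linear or interaction term consumes part of the same fixed budget of events, which is why the number of candidate biomarkers has to be limited in advance.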

      The predictive values of many of the biomarkers were only modest or absent. In addition, some of the findings appear a bit counterintuitive. Examples include the trend of the association of IP-10 with S/MD dengue that changed from positive to negative in the global model, and the opposite trends of some of the biomarkers (e.g. IL-8, ferritin) in adults and children. The authors acknowledge the existence of differences in dengue pathology between children and adults, but could discuss the possible biological reasons in more detail. For example, why would specifically IL-8 or ferritin have an opposite effect in children and adults?

      The change in the direction of the association of IP-10 with S/MD between the single and global models does not diminish the possibility of that biomarker being selected in the best combinations. In this study we do not try to elucidate causal pathways. Another biomarker in our model may be a mediator or confounder of IP-10 in the pathway to the outcome. This could be IL-1RA, as its association with S/MD was similar between the single and global models, and the correlation between IP-10 and IL-1RA was strong (Spearman’s rank correlation coefficient was 0.75). A change in direction after correction for another variable is often referred to as Simpson’s paradox. We have added this point to the discussion of the revised manuscript (page 14, lines 10-16).
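
      A sign change of this kind can be reproduced with a toy calculation. The sketch below uses invented numbers and ordinary least squares on a continuous outcome, which is a simplification and not the authors' model or data; `x1` and `x2` are hypothetical stand-ins for two strongly correlated biomarkers. The outcome is constructed so that `x1` has a negative coefficient once `x2` is accounted for, yet `x1` alone shows a positive slope.

```python
def centered_cross_product(a, b):
    """Sum of products of deviations from the mean (unnormalised covariance)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


# Invented toy data: x2 tracks x1 closely, so the two are strongly correlated.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
offsets = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
x2 = [a + e for a, e in zip(x1, offsets)]

# Outcome built with a negative weight on x1 and a positive weight on x2.
y = [2.0 * b - 1.0 * a for a, b in zip(x1, x2)]

s11 = centered_cross_product(x1, x1)
s22 = centered_cross_product(x2, x2)
s12 = centered_cross_product(x1, x2)
s1y = centered_cross_product(x1, y)
s2y = centered_cross_product(x2, y)

# "Single" model: slope of y on x1 alone.
marginal_slope = s1y / s11

# "Global" model: least-squares slopes of y on x1 and x2 together
# (closed-form solution of the two-predictor normal equations).
det = s11 * s22 - s12 ** 2
adjusted_b1 = (s22 * s1y - s12 * s2y) / det
adjusted_b2 = (s11 * s2y - s12 * s1y) / det

print(f"slope of x1 alone: {marginal_slope:.3f}")
print(f"slope of x1 adjusted for x2: {adjusted_b1:.3f}")
print(f"slope of x2 adjusted for x1: {adjusted_b2:.3f}")
```

      In the unadjusted fit, `x1` borrows the positive effect of its correlated partner; once both enter the model, that borrowed effect is assigned to `x2` and the sign on `x1` flips.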

      The opposing effect in children and adults is likely to be due to the composite endpoint of severe and moderate dengue. As shown in the analysis of severe dengue alone (figure S5, table S6), the effects of IL-8 and ferritin were similar in children and adults, which suggests these biomarkers are still associated with severe disease in all age groups and that the difference is driven by the moderate dengue group. In addition, adults with uncomplicated dengue have higher ferritin levels than children, with increasing age and chronic conditions in adults likely contributing to this. We have added this point to the discussion in the revised manuscript (page 14, lines 21-26 and page 15, lines 1-2).

      The study does not include a validation cohort. The authors conclude that their findings 'assist the development of biomarker panels for clinical use.' Can the authors put into perspective the performance of their current combined biomarker panel to rule out S/MD dengue.

      Thank you for your comment. This is a preliminary case-control study to investigate the potential combination of biomarkers associated with dengue clinical outcomes. We quantify importance by means of AIC and p-value. Another dataset without selection by outcome is needed to validate the findings in relation to predictive value. We have added to the limitations that this was not a prediction study and that, therefore, the performance of the combined biomarker panel with respect to predictive value was not assessed (page 16, lines 14-17).
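
      For readers less familiar with AIC, the comparison it supports can be sketched as follows. The definition AIC = 2k - 2 ln(L) is standard; the model names, log-likelihoods and parameter counts below are placeholders invented for illustration and are not results from the study.

```python
def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike information criterion: 2k - 2*ln(L); lower is better."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical fitted models: name -> (maximised log-likelihood, parameter count).
candidates = {
    "model_A": (-150.0, 3),
    "model_B": (-148.5, 5),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print(f"preferred by AIC: {best}")
```

      Here the larger model fits slightly better but not by enough to offset the penalty for its two extra parameters, so the smaller model is preferred. AIC ranks models fitted to the same data and does not by itself measure predictive performance in new patients, which is consistent with the authors' call for a separate validation dataset.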

      Overall, the authors show convincingly in a unique cohort that biomarkers can be helpful to triage dengue patients already in the first days from symptom onset. Identification of the best biomarkers for this goal, validation in other cohorts, and a better understanding of differences between children and adults are required before such panels can be introduced in daily clinical practice.

      Thank you for your comment.

      Reviewer #2:

      The main weakness is the exclusion of virological markers, such as plasma/serum viral RNA levels or NS1 antigenaemia. Indeed, previous observations have found severe dengue patients to have higher viraemia in the acute phase of illness compared to those with uncomplicated dengue. More recently, several mechanistic studies have suggested that dengue virus NS1 protein could bind endothelial cells to disrupt its integrity, leading to vascular leakage. Indeed, the authors have pointed out these findings in lines 20-25 on page to lines 1-2 on page 6. Despite these reports, it is curious that the authors have not included either viraemia or NS1 antigenaemia as possible biomarkers for severe dengue.

      Thank you. We acknowledge that plasma viremia and NS1 antigenaemia levels are important factors in dengue disease outcomes. In this study, only enrolment viremia levels were available; NS1 antigenaemia levels were not. We have previously investigated the association between viremia levels and clinical outcomes using a pooled dataset of the IDAMS international study and three other studies in Vietnam. We found that higher plasma viremia was associated with increased dengue severity [Vuong et al., Clin Infect Dis 2020]. For this study, the main aim was to investigate host biomarkers which could be combined in a multiplex test panel.

      However, as suggested, we have added the viremia levels to table S3 (which was previously table 2) of the revised manuscript. We have also performed a sensitivity analysis including viremia levels as a potential biomarker and found that: (1) higher plasma viremia was associated with increased risk of severe/moderate dengue in both single and global models, and (2) viremia was not selected in children but was selected fourth in adults when performing the best subset procedure. We have added this information to the Statistical analysis (page 10, lines 20-24) and Results sections (page 13, lines 17-20), and to the Supplementary file (appendix 8, figure S8, tables S13-S15, pages 30-34).

      Reference: Vuong NL, Quyen NTH, Tien NTH, et al. Higher plasma viremia in the febrile phase is associated with adverse dengue outcomes irrespective of infecting serotype or host immune status: an analysis of 5642 Vietnamese cases. Clin Infect Dis 2020.

      The manuscript in its present form may favour those with a strong statistical background to fully appreciate the nuances. Clearer explanations on the statistical findings would, I think, be helpful to those without such statistical background but who would nonetheless be in positions to translate these findings into clinical practice.

      We have added more explanation in the Statistical analysis, Results and Discussion sections to clarify statistical methods used in this study and the interpretation of the results.

      Most of the cases included in this study had DENV-1 infection. The biomarkers identified in this study may thus be DENV-1 specific and may not be readily applied to triage dengue cases caused by other DENV infection.

      In our study, DENV-1 accounted for 42% of all cases. We have performed a sensitivity analysis taking into account differences between serotypes. The results showed that there was no significant difference between serotypes with respect to the association between the biomarkers and primary endpoint in both the single and global models. This suggests that the study’s results are applicable for all serotypes. This information has been added in the Statistical analysis (page 10, lines 18-20) and Results sections (page 12, lines 18-20), and the Supplementary file (appendix 5, figures S3-S4, tables S4-S5, pages 13-17).

      Reviewer #3:

      1) For general ease of readership, it would greatly help if the authors could explain the choice of the statistical method used in the data analysis and perhaps briefly explain the model and how AIC should be interpreted (in the main rather than the supplementary text).

      We have clarified this in more detail in the Statistical analysis section of the revised manuscript.

      2) While this reviewer understands that the authors want to focus on host immune and inflammatory biomarkers, it would be helpful if NS1 and viremia data were also shown (at least in supplementary data) if these have been found not to correlate with disease severity.

      Thank you; please see our response to comment #1 of reviewer #2. Quantitative NS1 results were not available in this study. We have added viremia in a sensitivity analysis and the results showed that higher viremia was associated with increased risk of severe/moderate dengue, similar to our previous study [Vuong et al., Clin Infect Dis 2020]. In the best subset procedure, viremia was not selected in children and was selected fourth in adults.

      Reference: Vuong NL, Quyen NTH, Tien NTH, et al. Higher plasma viremia in the febrile phase is associated with adverse dengue outcomes irrespective of infecting serotype or host immune status: an analysis of 5642 Vietnamese cases. Clin Infect Dis 2020.

      3) It is interesting to note that some biomarkers (particularly the vascular markers) in the severe group do not return to the same baseline as mild cases at convalescence, even after >20 days. Are such individuals already in a higher inflammatory state at baseline (pre-infection) as a result of underlying co-morbidities such as obesity or diabetes? Table 1 did not provide such information, but it would be interesting to show whether there is any difference in health state between the 2 groups, especially for obesity.

      We have added information on obesity and diabetes to table 1 and the Results section (page 11, lines 13-14). There were 5 patients with diabetes; obesity was balanced between groups (14% in control group and 10% in S/MD group).

      4) It is rather confusing that the 2nd paragraph of discussion stated "Balancing model fit, robustness, and parsimony, we suggest the combination of five biomarkers IL-1RA, Ang-2, IL-8, ferritin, and IP-10 for children, and the combination of three biomarkers SDC-1, IL-8, and ferritin for adults to be used in practice."

      But the concluding paragraph went on to state "The best biomarker combination for children includes IL-1RA, Ang-2, IL-8, ferritin, IP-10, and SDC-1; for adults, SDC-1, IL-8, ferritin, sTREM-1, IL-1RA, IP-10, and sCD163 were selected." This should be clarified further.

      Thank you for pointing this out. The conclusion was based on the best combinations (taking into account AIC only), which consisted of 6 biomarkers for children and 7 biomarkers for adults. In the discussion, we reduced the number of biomarkers, taking into consideration not only the AIC, but also parsimony for clinical translation purposes, while keeping the model fit as good as possible (taking a difference of AIC of less than 5 compared to the best combination). We therefore suggested a combination of 5 biomarkers for children and 3 biomarkers for adults, considering these 3 factors - model fit, robustness and parsimony. We have clarified this point in the Discussion section of the revised manuscript (page 15, lines 20-25).
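      The selection rule described in this response (keep candidate combinations whose AIC is within 5 of the best combination, then favour the one with the fewest biomarkers) can be illustrated with a minimal sketch. The `Candidate` type, the `pick_parsimonious` helper, the placeholder marker names and the AIC values below are all hypothetical and are not taken from the study; the sketch also ignores the robustness criterion, which is not specified here in enough detail to encode.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    markers: Tuple[str, ...]  # biomarkers in the candidate model
    aic: float                # AIC of the fitted model (lower is better)


def pick_parsimonious(candidates: List[Candidate], max_delta: float = 5.0) -> Candidate:
    """Return the smallest model whose AIC is within max_delta of the best AIC."""
    best_aic = min(c.aic for c in candidates)
    eligible = [c for c in candidates if c.aic - best_aic < max_delta]
    # Prefer fewer markers; break ties by the lower AIC.
    return min(eligible, key=lambda c: (len(c.markers), c.aic))


# Made-up AIC values for placeholder markers, purely to exercise the rule.
example = [
    Candidate(("A", "B", "C", "D"), 100.0),  # best AIC
    Candidate(("A", "B"), 103.5),            # delta 3.5: eligible and smaller
    Candidate(("A",), 110.0),                # delta 10: excluded
]
chosen = pick_parsimonious(example)
```

      In this toy input the two-marker model is chosen: it gives up 3.5 AIC units relative to the best model but uses half as many markers.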

    1. Author Response:

      Reviewer #1:

      Summary and Strength:

      Single-cell RNA sequencing is the most appropriate technique to profile unknown cell types and Koiwai et al. made good use of the suitable tool to understand the heterogeneity of shrimp hemocyte populations. The authors profiled single-cell transcriptomes of shrimp hemocytes and revealed nine subtypes of hemocytes. Each cluster recognizes several markers, and the authors found that Hem1 and Hem2 are likely immature hemocytes while Hem5 to Hem9 would play a role in immune responses. Moreover, pseudotime trajectory analysis discovered that hemocytes differentiate from a single subpopulation to four hemocyte populations, indicating active hematopoiesis in the crustacean. The authors explored cell growth- and immune-related genes in each cluster and suggested putative functions of each hemocyte subtype. Lastly, scRNA-seq results were further validated by in vivo analysis and identified biological differences between agranulocytes and granulocytes. Overall, conclusions are well-supported by data and hemocyte classifications were carefully performed. Given the importance of aquaculture in both biology and industry, this study will be an extremely useful reference for crustacean hematopoiesis and immunity. Moreover, it will be a good example and prototype for cell-type analysis in non-model organisms.

      Thank you very much for your kind review. We hope that this paper will lead to a better understanding of the immune system of shrimp and further development of aquaculture.

      Weaknesses:

      The conclusions of this paper are mostly well supported by data, but some aspects of data analysis QC and in vivo lineage validation need to be clarified.

      1) It is not a trivial task to perform genome-wide analyses of gene expression on species without sufficient reference genome/transcriptome maps. With this respect, the authors should have de novo assembled a transcriptome map with a careful curation of the resulting transfrags. One of the weaknesses of this study is the lack of proper evaluation for the assembly results. To reassure the results, the authors would need to first assess their de novo transcripts in detail and additional data QC analysis would help substantiate the validity.

      The genome sequence of the kuruma shrimp M. japonicus has only been registered, and the high-quality data has not been published yet. Therefore, we could not perform validation using the genome sequence. However, by applying the BUSCO tool to the assembled sequences, we verified the quality of the assembled genes. Lines 80-82 and 634-636.

      2) The authors applied SCTransform to adjust batch effects and to integrate independent sequencing libraries. SCTransform performs well in general; however, the authors would need to present results on how batch effects were corrected along with before and after analysis. In addition, the authors would need to check if any cluster was primarily originated from a single library, which could be indicative of library-specific bias (or batch effects).

      Thank you for your suggestion. The triplicate distribution after batch correction is shown in Figure 2-figure supplement 1 and Figure 5-figure supplement 1. Line 123 (Figure 2-figure supplement 1), 244 (Figure 5-figure supplement 1) and 686-689.

      3) Hem6 cells lack specific markers and some cells in this cluster are scattered throughout the other clusters (Fig. 1 & 2). Based on the pattern, it is possible that these cells are continuous subsets of other clusters. It would be good if the authors could group these cells with Hem7 or other clusters based on transcriptomic similarities or by changing clustering resolution. Additionally, they may also be a result of doublets, and it is unclear whether doublets were removed. Hem6 cells require additional measures to fully categorize as a unique subset.

      Based on the new UMI counts, we redid the in silico clustering and pseudotime analysis with new parameters. For doublets, we assumed a UMI count of less than 4000 per cell this time, because none of the cells had a prominently high UMI count. Line 118 (Figure 2), 237 (Figure 5), 686-689 and 710-712.

      4) The authors took advantage of FACS sorting, qRT-PCR, and microscopic observation to verify in silico analyses and defined R1 and R2 populations. While the experiments are appropriate to delineate differences between the two populations, it is not sufficient to determine agranulocytes as a premature population (Hem1-4) and granulocytes as differentiated subsets (Hem5-9). To better understand the two groups (ideally nine subtypes), additional in vivo experiments would be essential. For example, proliferation markers (BrdU or EdU) could be examined after FACS sorting R1 and R2 cells to show R1 cells (immature hemocytes) are indeed proliferating as indicated in the analyses.

      Since stable culture of shrimp hemocytes is still difficult, it is difficult to implement a BrdU assay now. We believe the advantages of our study are that single-cell analysis can be used in shrimp, that we explored marker candidates, and that we were able to provide guidelines for cell classification in the future. Of course, we plan to adapt the BrdU or EdU assay to hemocytes in the future.

      5) FACS-sorted R1 or R2 population does not look homogeneous based on the morphology and having two subgroups under nine hemocyte subtypes may not be the most appropriate way to validate the data. The better way to prove each subtype is to use in situ hybridization to validate marker gene expressions and match with morphology.

      What we want to show here is that it is very difficult to classify hemocytes morphologically, and even if we could, they would likely be divided into two rough groups (FACS result). As in the answer to the question above, we believe the advantage of this project is that we were able to search for marker candidates and provide guidelines for cell classification in the future. Of course, in the future, we hope to look at the function and expression of each gene. Since it is difficult to perform the in situ assay or BrdU assay in shrimp hemocytes immediately, we have removed Figure 7.

      Reviewer #2:

      In this manuscript Koiwai et al. used single cell RNA sequencing of hemocytes from the shrimp Marsupenaeus japonicus. Due to lack of complete genome information for this species, they first did a de novo assembly of transcript data from shrimp hemocytes, and then used this as reference to map the scRNA results. Based on expression of the 3000 most variable genes, and a subsequent cluster analysis, nine different subpopulations of hemocytes were identified, named as Hem1-Hem9. They used the Seurat marker tool to find in total 40 cluster specific marker transcripts for all cluster except for Hem6. Based upon the predicted markers the authors suggested Hem1 and Hem2 to be immature hemocytes. In order to determine differentiation lineages they then used known cell-cycle markers from Drosophila melanogaster and could confirm Hem1 as hemocyte precursors. While genes involved in the cell cycle could be used to identify hemocyte precursors, the authors concluded that immune related genes from the fly was not possible to use to determine functions or different lineages of hemocytes in the shrimp. This is an important (and known) fact, since it is often taught that the fruit fly can be used as a general model organism for invertebrate immunologists which obviously is not the case. Even among arthropods, animals are different. The authors suggest four lineages based upon a pseudo temporal analysis using the Drosophila cell-cycle genes and other proliferation-related genes. Further, they used growth factor genes and immune related genes and could nicely map these into different clusters and thereby in a way validating the nine subpopulations. This paper will provide a good framework to detect and analyze immune responses in shrimp and other crustaceans in a more detailed way.

      Strengths:

      The determination of nine classes of hemocytes will enable much more detailed studies in the future about immune responses, which so far have been performed using expression analysis in mixed cell populations. This paper will give scientists a tool to understand differential cell response upon an injury or pathogen infection. The subdivision into nine hemocyte populations is carefully done using several sets of markers and the conclusions are on the whole well supported by the data.

      Thank you for taking the time to review our paper. We hope that this paper will serve as a guideline for crustacean hemocyte research.

      Weaknesses:

      One obvious drawback of the paper is first the low number of UMIs. A total number of 2704 cells gave a median UMI as low as 718 which is very low. Especially shrimp no. 2 has an average far below 500 and should perhaps be omitted. Therefore, one question is about cell viability prior to the drop-seq analysis. The fact of this low number of UMIs should be discussed more thoroughly.

      By checking the mitochondria-derived sequences, we cleared up the suspicion that large numbers of dead cells were contaminating the samples. We have also succeeded in increasing the number of UMIs by changing the mapping software and adjusting the parameters. The number of UMIs is still lower than that of other model organisms, but we think that it will improve once the reference genome is published. We have discussed this in the manuscript. Line 87-89, 118 (Figure 2) and 716-717.

      Details about how quality control (QC) was performed would be needed, for example the cutoff values for number of UMI per cell, and also one important information showing the quality is the proportion of mitochondrial genes.

      As we answered above, we checked the mitochondrial content and show the results in a figure. Since there are no set rules here, we set the per-cell parameters based on the initial distribution diagram. Line 87-89, 118 (Figure 2) and 686-689.
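      As an illustration only, per-cell filtering of this kind can be sketched as a simple predicate on total UMI count and mitochondrial fraction. The function name `passes_qc`, the lower UMI bound and the mitochondrial-fraction cutoff are hypothetical; only the upper bound of 4000 UMIs echoes the figure given in the response to Reviewer #1, and the cell values are invented.

```python
def passes_qc(umi: int, mito_umi: int, min_umi: int, max_umi: int,
              max_mito_frac: float) -> bool:
    """Keep a cell if its UMI count is in [min_umi, max_umi) and its
    mitochondrial fraction does not exceed max_mito_frac."""
    if umi <= 0 or umi < min_umi or umi >= max_umi:
        return False
    return mito_umi / umi <= max_mito_frac


# Invented (total UMI, mitochondrial UMI) pairs for four cells.
cells = [(150, 5), (800, 40), (4500, 100), (1200, 600)]

# Hypothetical thresholds; in practice these would be read off the
# distribution plots, as the response describes.
kept = [passes_qc(u, m, min_umi=200, max_umi=4000, max_mito_frac=0.1)
        for u, m in cells]
```

      Here only the second cell is kept: the first has too few UMIs, the third exceeds the upper bound used as a doublet proxy, and the fourth has a mitochondrial fraction of 0.5.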

      The clustering into nine subpopulations seems solid, however the determination of lineages based upon the pseudo time analysis with cell-cycle related genes is not that strong. The authors identify four lineages, all starting from hem1 via hem2-Hem3- Hem4 and then one to Hem5, another through part of Hem 6 to Hem 7, next through part of Hem 6 to Hem 8 and finally through part of Hem 6 to Hem 9. Referring to Figure 3 - supplement 3, it seems as if Hem6 could be subdivided into two clusters, one visible in B and C, while another part of Hem & is added in D.

      Based on the new UMI counts, we redid the in silico clustering and pseudotime analysis with new parameters. This gave a clearer result. Line 118 (Figure 2), 237 (Figure 5), 686-689 and 710-712.

      Also, the data in figure 3 - supplement 1 showing expression of cell cycle markers do not convincingly show the lineages. Cluster Hem 3 and 4 seems to express much fewer and lower amount of these markers compared to cluster Hem6 - Hem9.

      As a result of the new clustering and other analyses, we can now see more clearly how growth-related genes vary along the clusters (Figure 7). Line 366 (Figure 7).

      It is also clear (from figure 5 - supplement 1) that there are more than one TGase gene and the authors would need to discuss that fact related to differentiation.

      Thank you for your suggestion. We have discussed the different types of TGase in the revised paper. Line 386-399, 457 (Figure 8-figure supplement 2).

      While the part to determine subpopulations is very strong, the part about FACS analysis and qRT-PCR is weaker than the other sections, and doesn't add so much information. Validation of marker genes and the relationship between clusters and morphology shown in figure 6 is not totally convincing. It seems clear that both R1 and R2 contains a mixture of different cell types even if TGase expression is a bit higher in R1. A better way to confirm the results could be to do in situ hybridization (or antibody staining) and show the cell morphology of some selected marker proteins in a mixed hemocyte population. FACS sorting is very crude and does not really separate the shrimp hemocytes in clear groups based on granularity and size. This may be because the size of hemocytes without granules vary a lot. You need cell surface markers to do a good sorting by FACS.

      We agree with your comment that in situ hybridization or antibody staining are powerful tools to support our new findings. However, it is difficult to perform an in situ assay or prepare antibodies for shrimp hemocytes immediately. What we want to show here is that it is very difficult to classify hemocytes morphologically, and even if we could, they would likely be divided into two rough groups (FACS result). As in the answer to the question above, we believe the advantage of this project is that we were able to search for marker candidates and provide guidelines for cell classification in the future. Of course, in the future, we hope to look at the function and expression of each gene.

      Another minor issue is the discussion about KPI. There are a huge number of Kazal-type proteinase inhibitors in crustaceans and it is not clear from this data if the authors discuss a specific KPI-gene, and there is a mistake in referring to reference 65 which is about a Kunitz-type inhibitor.

      Thank you for pointing this out. In the case of kuruma shrimp, the de novo assembled genes and BLAST results showed low (around 60%) identity against L. vannamei’s Kazal-type proteinase inhibitor, not against kuruma shrimp. Therefore, we could not discuss which type of KPI is involved in this study. We consider it important that further research on KPIs in kuruma shrimp be conducted in the future. Also, as you pointed out, reference 65 was wrong, so we removed it. Line 474 (Figure 8-figure supplement 5).

      In summary, this paper is a very important contribution to crustacean immunology, and although a bit weak in lineage determination it will be of extremely high value.

      Thank you for the positive feedback. We understand that a weakness of the study is that we could not confirm the evaluation of the genes as markers or the expression of the marker genes in each cell. However, we hope that our research will serve as a basis for future research.

      Reviewer #3:

      This manuscript by Koiwai et al. described the single-cell RNA-seq analysis of shrimp hemocytes and was submitted as a Resource Paper in eLife. In this study, they identified 9 cell types in shrimp hemocytes based on their transcriptional profiles and identified markers for each subpopulation. They predicted different immune roles among these subpopulations from differentially expressed immune-related genes. They also identified cell growth factors that might play important roles in hemocyte differentiation. This study helps to understand the immune system of shrimp and maybe useful for improving the control of the pathogen infections. The analysis of the data and interpretation is overall good but there are also some concerns:

      Thank you for your careful peer review. We hope that this paper will be useful to other researchers in the future. We have revised the manuscript based on your comments; please review it again.

      1) The number of UMI and genes detected per cell after mapping to the in-house reference genome does not appear to be presented, and the similarities or differences between the three replicated samples are not discussed, as well as the low number of genes detected per cell (~300 in this study).

      By checking the mitochondria-derived sequences, we cleared up the suspicion that large numbers of dead cells were contaminating the samples. We have also succeeded in increasing the number of UMIs by changing the mapping software and adjusting the parameters. The number of UMIs is still lower than that of other model organisms, but we think that it will improve once the reference genome is published. We have discussed this in the manuscript. Line 87-89, 118 (Figure 2) and 686-689.

      2) The correlation between the morphology and the expression of marker genes demonstrated in Figure 6 is questionable. Cells of the same size could express totally different genes. On the other hand, cells that are different in size can express nearly identical genes. The evidence presented in this manuscript is not enough to support a correlation between cell size and gene expression. Therefore, the author would either need to provide more evidence to support this correlation, or not make such correlation.

      Yes, we agree with your comments. What we want to show here is that it is very difficult to classify hemocytes morphologically, and even if we could, they would likely be divided into two rough groups (FACS result). So, it is not surprising that similar cells may or may not express similar genes. However, some genes, such as the TGase or proPO genes, can be used as markers for cells (and may also reflect cell size).

      3) There are many spindle-shaped cells in Figure 6B, but none of them appeared in Figure 6C and D after sorting, and the reason for this is unclear.

      We do not know why the cells were deformed either, and we think this is exactly why it is so difficult to classify hemocytes morphologically. The reason remains unknown, as cell culture is also not currently possible.

      4) The hemocyte differentiation model in Figure 7 is not supported by any experimental data.

      We understand your comment. Since we could not conduct any functional research on the marker genes, we have removed Figure 7.

    1. Author Response:

      Reviewer #1 (Public Review):

      Strengths:

      1) The model structure is appropriate for the scientific question.

      2) The paper addresses a critical feature of SARS-CoV-2 epidemiology which is its much higher prevalence in Hispanic or Latino and Black populations. In this sense, the paper has the potential to serve as a tool to enhance social justice.

      3) Generally speaking, the analysis supports the conclusions.

      Other considerations:

      1) The clean distinction between susceptibility and exposure models described in the paper is conceptually useful but is unlikely to capture reality. Rather, susceptibility to infection is likely to vary more by age whereas exposure is more likely to vary by ethnic group / race. While age cohorts are not explicitly distinguished in the model, the authors would do well to at least vary susceptibility across ethnic groups according to the different age cohort structure within these groups. This would allow a more precise estimate of the true effect of variability in exposures. Alternatively, this could be mentioned as a limitation of the current model.

      We agree that this would be an important extension for future work and have indicated this in the Discussion, along with the types of data necessary to fit such models:

      “Fourth, due to data availability, we have only considered variability in exposure due to one demographic characteristic; models should ideally strive to also account for the effects of age on susceptibility and exposure within strata of race and ethnicity and other relevant demographics, such as socioeconomic status and occupation \cite{Mulberry2021-tc}. These models could be fit using representative serological studies with detailed cross-tabulated seropositivity estimates.”

      2) I appreciated that the authors maintained an agnostic stance on the actual value of HIT (across the population & within ethnic groups) based on the results of their model. If there was available data, then it might be possible to arrive at a slightly more precise estimate by fitting the model to serial incidence data (particularly sorted by ethnic group) over time in NYC & Long Island. First, this would give some sense of R_effective. Second, if successive waves were modeled, then the shift in relative incidence & CI among these groups that is predicted in Figure 3 & Sup fig 8 may be observed in the actual data (this fits anecdotally with what I have seen in several states). Third, it may (or may not) be possible to estimate values of critical model parameters such as epsilon. It would be helpful to mention this as possible future work with the model.

      Caveats about the impossibility of truly measuring HIT would still apply (due to new variants, shifting use & effectiveness of NPIs, etc.). However, as is, the estimates of possible values for HIT are so wide as to make the underlying data used to train the model almost irrelevant. This makes the potential to leverage the model for policy decisions more limited.

      We have highlighted this important limitation in the Discussion:

      “Finally, we have estimated model parameters using a single cross-sectional serosurvey. To improve estimates and the ability to distinguish between model structures, future studies should use longitudinal serosurveys or case data stratified by race and ethnicity and corrected for underreporting; the challenge will be ensuring that such data are systematically collected and made publicly available, which has been a persistent barrier to research efforts \cite{Krieger2020-ss}. Addressing these data barriers will also be key for translating these and similar models into actionable policy proposals on vaccine distribution and non-pharmaceutical interventions.”

      3) I think the range of R0 in the figures should be extended to go as low as 1. Much of the pandemic in the US has been defined by local Re that varies between 0.8 & 1.2 (likely based on shifts in the degree of social distancing). I therefore think lower HIT thresholds should be considered and it would be nice to know how the extent of assortative mixing affects estimates at these lower R_e values.

      We agree this would be of interest and have extended the range of R0 values. Figure 1 has been updated accordingly (see below); we also updated the text with new findings: “After fitting the models across a range of $\epsilon$ values, we observed that as $\epsilon$ increases, HITs and epidemic final sizes shifted higher back towards the homogeneous case (Figure \ref{fig:model2}, Figure 1-figure supplement 4); this effect was less pronounced for $R_0$ values close to 1.”

      Figure 1: Incorporating assortativity in variable exposure models results in increased HITs across a range of $R_0$ values. Variable exposure models were fitted to NYC and Long Island serosurvey data.

      4) line 274: I feel like this point needs to be considered in much more detail, either with a thoughtful discussion or even with some simple additions to the model. How should these results make policy makers consider race and ethnicity when thinking about the key issues in the field right now, such as vaccine allocation, masking, and new variants? I think to achieve the maximal impact, the authors should be very specific about how model results could impact policy making, and how we might lower the tragic discrepancies associated with COVID. If the model / data is insufficient for this purpose at this stage, then what type of data could be gathered that would allow more precise and targeted policy interventions?

      We have conducted additional analyses exploring the important suggestion by the reviewers that social distancing could affect these conclusions. The text and figures have been updated accordingly:

      “Finally, we assessed how robust these findings were to the impact of social distancing and other non-pharmaceutical interventions (NPIs). We modeled these mitigation measures by scaling the transmission rate by a factor $\alpha$ beginning when 5\% cumulative incidence in the population was reached. Setting the duration of distancing to be 50 days and allowing $\alpha$ to be either 0.3 or 0.6 (i.e. a 70\% or 40\% reduction in transmission rates, respectively), we assessed how the $R_0$ versus HIT and final epidemic size relationships changed. We found that the $R_0$ versus HIT relationship was similar to in the unmitigated epidemic (Figure 1-figure supplement 5). In contrast, final epidemic sizes depended on the intensity of mitigation measures, though qualitative trends across models (e.g. increased assortativity leads to greater final sizes) remained true (Figure 1-figure supplement 6). To explore this further, we systematically varied $\alpha$ and the duration of NPIs while holding $R_0$ constant at 3. We found again that the HIT was consistent, whereas final epidemic sizes were substantially affected by the choice of mitigation parameters (Figure 1-figure supplement 7); the distribution of cumulative incidence at the point of HIT was also comparable with and without mitigation measures (Figure 2-figure supplement 8). The most stringent NPI intensities did not necessarily lead to the smallest epidemic final sizes, an idea which has been explored in studies analyzing optimal control measures \cite{Neuwirth2020-nb,Handel2007-ee}. Longitudinal changes in incidence rate ratios also were affected by NPIs, but qualitative trends in the ordering of racial and ethnic groups over time remained consistent (Figure 3-figure supplement 3).”

      Figure 1-figure supplement 6: Final epidemic sizes versus $R_0$ in variable exposure models with mitigation measures for $\alpha = 0.3$ (top) and $\alpha = 0.6$ (bottom). NPIs were initiated when cumulative incidence reached 5\% in all models and continued for 50 days. Models were fitted to NYC and Long Island serosurvey data.

      Figure 1-figure supplement 7: Sensitivity analysis on the impact of intensity and duration of NPIs on final epidemic sizes. HIT values for the same mitigation parameters were 46.4 $\pm$ 0.5\% (range). The smallest final size, corresponding to $\alpha = 0.6$ and duration = 100, was 51\%. Census-informed assortativity models were fit to Long Island seroprevalence data. NPIs were initiated when cumulative incidence reached 5\% in all models.

      See points 1 and 2 above for examples of additional data required.

      Minor issues:

      -This is subjective but I found the words "active" and "high activity" to describe increases in contacts per day to be confusing. I would just say more contacts per day. It might help to change "contacts" to "exposure contacts" to emphasize that not all contacts are high risk.

      To clarify this, we have replaced instances of “activity level” (and similar) with “total contact rate”, indicating the total number of contacts per unit time per individual; e.g. “The estimated total contact rate ratios indicate higher contacts for minority groups such as Hispanics or Latinos and non-Hispanic Black people, which is in line with studies using cell phone mobility data \cite{Chang2020-in}; however, the magnitudes of the ratios are substantially higher than we expected given the findings from those studies.”

      We have also clarified our definition of contacts: “We define contacts to be interactions between individuals that allow for transmission of SARS-CoV-2 with some non-zero probability.”

      -The abstract has too much jargon for a generalist journal. I would avoid words like "proportionate mixing" & "assortative" which are very unique to modeling of infectious diseases unless they are first defined in very basic language.

      We have revised the abstract to convey these same concepts in a more accessible manner: “A simple model where interactions occur proportionally to contact rates reduced the HIT, but more realistic models of preferential mixing within groups increased the threshold toward the value observed in homogeneous populations.”

      -I would cite some of the STD models which have used similar matrices to capture assortative mixing.

      We have added a reference in the assortative mixing section to a review of heterogeneous STD models: “Finally, under the \textit{assortative mixing} assumption, we extended this model by partitioning a fraction $\epsilon$ of contacts to be exclusively within-group and distributed the rest of the contacts according to proportionate mixing (with $\delta_{i,j}$ being an indicator variable that is 1 when $i=j$ and 0 otherwise) \cite{Hethcote1996-bf}:”
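
The equation that followed this sentence in the manuscript is not reproduced in this response. As a hedged illustration only, one standard "preferred mixing" form consistent with the description reserves a fraction $\epsilon$ of each group's contacts for its own group and spreads the remainder in proportion to each group's share of total contacts. The symbols `contact_rates` ($a_i$) and `pop_sizes` ($N_i$), and the function name, are assumptions introduced here; the authors' exact parameterization may differ.

```python
def assortative_mixing(contact_rates, pop_sizes, epsilon):
    """Return p[i][j], the probability that a contact made by group i is
    with group j, under a preferred-mixing sketch:

        p[i][j] = epsilon * delta_ij
                  + (1 - epsilon) * a_j * N_j / sum_k(a_k * N_k)

    epsilon = 0 recovers proportionate mixing; epsilon = 1 gives fully
    within-group contacts.
    """
    weights = [a * n for a, n in zip(contact_rates, pop_sizes)]
    total = sum(weights)
    k = len(weights)
    return [
        [epsilon * (1.0 if i == j else 0.0)
         + (1.0 - epsilon) * weights[j] / total
         for j in range(k)]
        for i in range(k)
    ]


if __name__ == "__main__":
    # Hypothetical contact rates and group sizes, for illustration only.
    for row in assortative_mixing([1.0, 2.0, 1.5], [500, 300, 200], 0.4):
        print([round(x, 3) for x in row])
```

Each row sums to 1, and because the same $\epsilon$ applies to every group, the total number of contacts from group $i$ to group $j$ equals the number from $j$ to $i$.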

      -Lines 164-5: very good point but I would add that members of ethnic / racial groups are more likely to be essential workers and also to live in multigenerational houses

      We have added these helpful examples into the text: “Variable susceptibility to infection across racial and ethnic groups has been less well characterized, and observed disparities in infection rates can already be largely explained by differences in mobility and exposure \cite{Chang2020-in,Zelner2020-mb,Kissler2020-nh}, likely attributable to social factors such as structural racism that have put racial and ethnic minorities in disadvantaged positions (e.g., employment as frontline workers and residence in overcrowded, multigenerational homes) \cite{Henry_Akintobi2020-ld,Thakur2020-tw,Tai2020-ok,Khazanchi2020-xu}.”

      -Line 193: "Higher than expected" -> expected by who?

      We have clarified this phrase: “The estimated total contact rate ratios indicate higher exposure contacts for minority groups such as Hispanics or Latinos and non-Hispanic Black people, which is in line with studies using cell phone mobility data \cite{Chang2020-in}; however, the magnitudes of the ratios are substantially higher than we expected given the findings from those studies.”

      -A limitation that needs further mention is the fact that race & ethnic group, while important, could be subclassified into strata that inform risk even more (such as SES, job type, etc.)

      We agree and have added this to the Discussion: “Fourth, due to data availability, we have only considered variability in exposure due to one demographic characteristic; models should ideally strive to also account for the effects of age on susceptibility and exposure within strata of race and ethnicity and other relevant demographics, such as socioeconomic status and occupation \cite{Mulberry2021-tc}. These models could be fit using representative serological studies with detailed cross-tabulated seropositivity estimates.”

      Reviewer #2 (Public Review):

      Overall I think this is a solid and interesting piece that is an important contribution to the literature on COVID-19 disparities, even if it does have some limitations. To this point, most models of SARS-CoV-2 have not included the impact of residential and occupational segregation on differential group-specific COVID outcomes. So, the authors are to be commended on their rigorous and useful contribution on this valuable topic. I have a few specific questions and concerns, outlined below:

      We thank the reviewer for the supportive comments.

      1) Does the reliance on serosurvey data collected in public places imply a potential issue with left-censoring, i.e. by not capturing individuals who had died? Can the authors address how survival bias might impact their results? I imagine this could bring the seroprevalence among older people down in a way that could bias their transmission rate estimates.

      We have included this important point in the limitations section on potential serosurvey biases: “First, biases in the serosurvey sampling process can substantially affect downstream results; any conclusions drawn depend heavily on the degree to which serosurvey design and post-survey adjustments yield representative samples \cite{Clapham2020-rt}. For instance, because the serosurvey we relied on primarily sampled people at grocery stores, there is both survival bias (cumulative incidence estimates do not account for people who have died) and ascertainment bias (undersampling of at-risk populations that are more likely to self-isolate, such as the elderly) \cite{Rosenberg2020-qw,Accorsi2021-hx}. These biases could affect model estimates if, for instance, the capacity to self-isolate varies by race or ethnicity -- as suggested by associations of neighborhood-level mobility versus demographics \cite{Kishore2020-sy,Kissler2020-nh} -- leading to an overestimate of cumulative incidence and contact rates in whites.”

      2) It might be helpful to think in terms of disparities in HITs as well as disparities in contact rates, since the HIT of whites is necessarily dependent on that of Blacks. I'm not really disagreeing with the thrust of what their analysis suggests or even the factual interpretation of it. But I do think it is important to phrase some of the conclusions of the model in ways that are more directly relevant to health equity, i.e. how much infection/vaccination coverage does each group need for members of that group to benefit from indirect protection?

      We agree with this important point and indeed this was the goal, in part, of the analyses in Figure 2. We have added additional text to the Discussion highlighting this: “Projecting the epidemic forward indicated that the overall HIT was reached after cumulative incidence had increased disproportionately in minority groups, highlighting the fundamentally inequitable outcome of achieving herd immunity through infection. All of these factors underscore the fact that incorporating heterogeneity in models in a mechanism-free manner can conceal the disparities that underlie changes in epidemic final sizes and HITs. In particular, overall lower HIT and final sizes occur because certain groups suffer not only more infection than average, but more infection than under a homogeneous mixing model; incorporating heterogeneity lowers the HIT but increases it for the highest-risk groups (Figure \ref{fig:hitcomp}).”

      For vaccination, see our response to Reviewer #1 point 4.

      3) The authors rely on a modified interaction index parameterized directly from their data. It would be helpful if they could explain why they did not rely on any sources of mobility data. Are these just not broken down along the type of race/ethnicity categories that would be necessary to complete this analysis? Integrating some sort of external information on mobility would definitely strengthen the analysis.

      This is a great suggestion, but this type of data has generally not been available due to privacy concerns from disaggregating mobility data by race and ethnicity (Kishore et al., 2020). Instead, we modeled NPIs as mentioned in Reviewer #1 point 4, with the caveat that reduction in mobility was assumed to be identical across groups. We added this into the text explicitly as a limitation: “Third, we have assumed the impact of non-pharmaceutical interventions such as stay-at-home policies, closures, and the like to equally affect racial and ethnic groups. Empirical evidence suggests that during periods of lockdown, certain neighborhoods that are disproportionately wealthy and white tend to show greater declines in mobility than others \cite{Kishore2020-sy,Kissler2020-nh}. These simplifying assumptions were made to aid in illustrating the key findings of this model, but for more detailed predictive models, the extent to which activity level differences change could be evaluated using longitudinal contact survey data \cite{Feehan2020-ta}, since granular mobility data are typically not stratified by race and ethnicity due to privacy concerns \cite{Kishore2020-mg}.”

      Reviewer #3 (Public Review):

      Ma et al investigate the effect of racial and ethnic differences in SARS-CoV-2 infection risk on the herd immunity threshold of each group. Using New York City and Long Island as model settings, they construct a race/ethnicity-structured SEIR model. Differential risk between racial and ethnic groups was parameterized by fitting each model to local seroprevalence data stratified demographically. The authors find that when herd immunity is reached, cumulative incidence varies by more than two fold between ethnic groups, at approximately 75% of Hispanics or Latinos and only 30% of non-Hispanic Whites.

      This result was robust to changing assumptions about the source of racial and ethnic disparities. The authors considered differences in disease susceptibility, exposure levels, as well as a census-driven model of assortative mixing. These results show the fundamentally inequitable outcome of achieving herd immunity in an unmitigated epidemic.

      The authors have only considered an unmitigated epidemic, without any social distancing, quarantine, masking, or vaccination. If herd immunity is achieved via one of these methods, particularly vaccination, the disparities may be mitigated somewhat but still exist. This will be an important question for epidemiologists and public health officials to consider throughout the vaccine rollout.

      We thank the reviewer for the detailed and helpful summary and suggestions.

    2. Reviewer #1 (Public Review):

      Strengths:

      1) The model structure is appropriate for the scientific question.

      2) The paper addresses a critical feature of SARS-CoV-2 epidemiology which is its much higher prevalence in Hispanic or Latino and Black populations. In this sense, the paper has the potential to serve as a tool to enhance social justice.

      3) Generally speaking, the analysis supports the conclusions.

      Other considerations:

      1) The clean distinction between susceptibility and exposure models described in the paper is conceptually useful but is unlikely to capture reality. Rather, susceptibility to infection is likely to vary more by age whereas exposure is more likely to vary by ethnic group / race. While age cohorts are not explicitly distinguished in the model, the authors would do well to at least vary susceptibility across ethnic groups according to the different age cohort structure within these groups. This would allow a more precise estimate of the true effect of variability in exposures. Alternatively, this could be mentioned as a limitation of the current model.

      2) I appreciated that the authors maintained an agnostic stance on the actual value of HIT (across the population & within ethnic groups) based on the results of their model. If there was available data, then it might be possible to arrive at a slightly more precise estimate by fitting the model to serial incidence data (particularly sorted by ethnic group) over time in NYC & Long Island. First, this would give some sense of R_effective. Second, if successive waves were modeled, then the shift in relative incidence & CI among these groups that is predicted in Figure 3 & Sup fig 8 may be observed in the actual data (this fits anecdotally with what I have seen in several states). Third, it may (or may not) be possible to estimate values of critical model parameters such as epsilon. It would be helpful to mention this as possible future work with the model.

      Caveats about the impossibility of truly measuring HIT would still apply (due to new variants, shifting use & effectiveness of NPIs, etc.). However, as is, the estimates of possible values for HIT are so wide as to make the underlying data used to train the model almost irrelevant. This makes the potential to leverage the model for policy decisions more limited.

      3) I think the range of R0 in the figures should be extended to go as low as 1. Much of the pandemic in the US has been defined by local Re that varies between 0.8 & 1.2 (likely based on shifts in the degree of social distancing). I therefore think lower HIT thresholds should be considered, and it would be nice to know how the extent of assortative mixing affects estimates at these lower R_e values.

      4) line 274: I feel like this point needs to be considered in much more detail, either with a thoughtful discussion or even with some simple additions to the model. How should these results make policy makers consider race and ethnicity when thinking about the key issues in the field right now such as vaccine allocation, masking, and new variants? I think to achieve the maximal impact, the authors should be very specific about how model results could impact policy making, and how we might lower the tragic discrepancies associated with COVID. If the model / data is insufficient for this purpose at this stage, then what type of data could be gathered that would allow more precise and targeted policy interventions?

      Minor issues:

      -This is subjective but I found the words "active" and "high activity" to describe increases in contacts per day to be confusing. I would just say more contacts per day. It might help to change "contacts" to "exposure contacts" to emphasize that not all contacts are high risk.

      -The abstract has too much jargon for a generalist journal. I would avoid words like "proportionate mixing" & "assortative" which are very unique to modeling of infectious diseases unless they are first defined in very basic language.

      -I would cite some of the STD models which have used similar matrices to capture assortative mixing.

      -Lines 164-5: very good point but I would add that members of ethnic / racial groups are more likely to be essential workers and also to live in multigenerational houses

      -Line 193: "Higher than expected" -> expected by who?

      -A limitation that needs further mention is the fact that race & ethnic group, while important, could be subclassified into strata that inform risk even more (such as SES, job type, etc.)

    1. Author Response:

      We thank you for the careful review and the opportunity to resubmit this manuscript. We particularly acknowledge the reviewer who helped to clarify the statistical arguments and stimulated our re-analysis of all results. This re-analysis has helped to change the focus of the work to identify significantly variable (higher) familial cancer risks in several race/ethnically described minority groups in the US, which we feel has broadened the message stimulating a word change in the title.

      Reviewer #1 (Public Review):

      This is a very well written and comprehensive paper that is a valuable contribution to the literature of childhood cancers. It shows that some childhood cancers have an inherited component and the risk could be to the mother or to the siblings. Although the relative risks are significant, childhood cancer is fortunately rare and the actual risk to the siblings is small.

      Can we assume this is less than one percent? I think it would be helpful to provide some absolute risk numbers for the siblings so that parents could be reassured that the risk to other children is small.

      Response: We appreciate this comment on absolute risk. It is true that the actual risk is very small given the rarity of childhood cancers. We calculated the overall absolute risk for mothers and siblings of a proband and compared it with the general population. It now reads “Moreover, due to the rarity of childhood cancers, the absolute risk is very small, but still higher among young siblings and mothers in the current study (0.074%) compared to general population (0.023%) of the same age group” in line 316 of the Discussion section.
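
As a quick reading aid for the two percentages quoted above, the arithmetic below converts them into a crude ratio, an absolute difference, and "1 in N" frequencies. This is only a sketch: the ratio of two overall absolute risks is not the standardized incidence ratio reported in the paper, and the function name and output keys are hypothetical.

```python
def summarize_absolute_risk(risk_family_pct, risk_general_pct):
    """Convert two absolute risks (given in percent) into simple summaries."""
    rf = risk_family_pct / 100.0
    rg = risk_general_pct / 100.0
    return {
        "crude_ratio": rf / rg,
        "difference_per_100k": (rf - rg) * 100_000,
        "family_one_in": 1.0 / rf,
        "general_one_in": 1.0 / rg,
    }


if __name__ == "__main__":
    # 0.074% and 0.023% are the values quoted in the response.
    for key, value in summarize_absolute_risk(0.074, 0.023).items():
        print(key, round(value, 1))
```

Read this way, the quoted figures correspond to an excess of roughly 51 cases per 100,000 relatives, which is consistent with the reviewer's point that the absolute risk stays well below one percent.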

      Do the authors have a suggestion on what genetic tests should be done on children with cancer? Do you have recommendations to make? I assume that the authors do not recommend screening of siblings for cancer except in rare cases. It would be useful to see what the authors recommend.

      Response: In this manuscript we do not provide clinical recommendations as we feel that is out of the scope of this research. Instead, we are making several points:

      1) That conventional US-based birth and cancer registries can be utilized to study familial-based cancer risks.

      2) That different ethnic groups appear to have different familial risks for some cancer subtypes.

      3) Early onset parental cancers can add information about familial-based risks.

      4) Second primary malignancies are enriched in families that exhibit familial risks (line 260 of the Results section).

      These characteristics will provide useful information for genetic counselors who need to advise families on their own decisions about genetic testing and family planning. At present, the genetic counseling discipline is tasked with making specific recommendations to families about screening siblings for cancer and for the presence of cancer predisposition alleles; such advice is prompted by examining the family history of cancer. Our work suggests that Latino families may have a higher risk of familial alleles in solid tumors overall, which may promote more attention to or scrutiny of families by ethnicity.

      Are there some sites where the risk to siblings is there but not to parents which might suggest recessive inheritance?

      Response: This is an interesting question, but there are two reasons why our study may not be adequate to assess it. First, our sample size may not be large enough to study this point adequately. The risk of cancer in the general population is higher in children than in young adults, so the low number of cancers that we see in mothers is largely a reflection of the low risk of cancer in young adults, since we cut off our observational age at 26 (due to the extent of follow-up on our young population). There is a lack of cancer at many of the ICC-03 defined childhood cancer sites among our parents, making it impossible to estimate cancer risk in the adults. Second, childhood cancers are biologically distinct from adult cancers, so predisposition alleles that raise the risk of childhood cancers may not always have any effect on young adult cancers. Additionally, the progenitor cells at risk in childhood cancer may have differentiated, leaving no cells “at risk” of transformation after adolescence, so the effect of childhood cancer predisposition alleles on those adult cancers is not a meaningful comparison. Of course, there are exceptions to this, such as TP53 alleles, which affect cancer risk of many subtypes at any age.

      If the childhood cancer is rare and fatal one might not see it in the parents because of loss of reproductive fitness. Please comment.

      Response: We greatly appreciate this comment and share the concern that patients with cancer who have a strong genetic cancer predisposition may not be able to reproduce (even if the patient survives). We added a comment in the Discussion section, and it now reads “Furthermore, it is likely that the low number of mothers with cancer is a result of bias against some very strong cancer predisposition alleles, so the patients could not survive long enough or be healthy enough to reproduce” on line 408.

      Should we assume that the higher risks for Latino children are purely due to genetic influences? Could there be environmental factors at play as well?

      Response: We appreciate this comment and totally agree that environmental factors also play a role. Not only genetic factors, but also the environmental factors, and the interaction between genetic and environmental factors would contribute to the variation in relative risks. We have addressed this point in lines 341 (“This familial concordance is likely due to both shared genetic and environmental…”) and 419 (“Second, the comparative attributable fraction of familial risk based on environmental risk factors interacting…”) of the discussion section. We believe that this point should stimulate further research, and we are constructing our own future studies to explore environmental factors along with genetics.

      Reviewer #2 (Public Review):

      [...] Although the authors comment that the results from the Chi-Sq test are not consistent with the specific group SIRs and 95%CIs, they do not explain how these results can be so different.

      I am concerned that there is either an error in the calculations or an error in the assumptions. It is not acceptable to have such contradictory results between the two distinct methods.

      For example, for hematological cancers the 95% CI for Latinos is entirely contained within the 95%CI for Non-Latino white, while this gives a p less than 0.05. The authors need to explore why these methods are giving very different answers and be clear that the low p-values are not simply an artifact of poor assumptions.

      Response: We sincerely appreciate the comments from Reviewer 2, and we want to thank Reviewer 2 for pushing on the inconsistency between the confidence intervals and the p-values comparing the SIRs between race/ethnic groups. While overlapping CIs do not necessarily indicate a lack of significance in the effect sizes, the apparent contrast in these statistical measures was too extreme to be believable, and indeed there was an error.
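
The statistical point in this exchange can be made concrete with a small sketch. It is not the authors' corrected calculation (which is not described here): it uses a simple Wald test on the log scale, with the standard approximation that the standard error of log(SIR) is about 1/sqrt(observed cases). The function names and all counts are hypothetical.

```python
import math


def sir_ci(observed, expected, z=1.96):
    """SIR = observed / expected with an approximate 95% CI on the log scale."""
    sir = observed / expected
    half_width = z / math.sqrt(observed)
    return sir, sir * math.exp(-half_width), sir * math.exp(half_width)


def compare_sirs(o1, e1, o2, e2):
    """Two-sided Wald test of H0: SIR1 == SIR2, using the log SIR ratio."""
    log_ratio = math.log((o1 / e1) / (o2 / e2))
    se = math.sqrt(1.0 / o1 + 1.0 / o2)
    z_stat = log_ratio / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return z_stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: the two CIs overlap, yet the difference is significant.
    print(sir_ci(100, 50), sir_ci(100, 36), compare_sirs(100, 50, 100, 36))
    # Hypothetical counts: one CI lies entirely inside the other.
    print(sir_ci(400, 200), sir_ci(25, 10), compare_sirs(400, 200, 25, 10))
```

Under this approximation, partially overlapping intervals can still give p < 0.05, as the response notes. If one interval lies entirely inside the other, however, the distance between the log estimates is at most 1.96 times the difference of the two standard errors, which never exceeds 1.96 times the combined standard error, so p cannot fall below 0.05. That is in line with the reviewer's suspicion that the original contrast signalled an error.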

      We reconstructed our data from scratch and recalculated all statistical comparisons with our statistician, Dr. W. J. Gauderman, and found a recurrent mistake in the calculation of the p-values comparing the SIRs between race/ethnic groups. We have corrected this mistake throughout the manuscript. Please refer to the new Figures 1 and 3 and the supplementary materials for the corrected numbers. The p-values are now somewhat attenuated, and significant differences between Latinos and NL whites persist for solid tumors. In addition, Asians have significantly increased familial risk for hematologic cancers, and non-Latino Blacks have significantly increased risk of solid tumors when compared to non-Latino whites. Because of this broader enhanced risk evident in minority groups (with the corrected statistical comparisons), the focus of the manuscript was changed slightly to emphasize higher risks among minority groups in the respective hematologic and solid tumor categories. SIR differences were also suggested between many individual types of cancer, although these did not reach formal statistical significance.

    1. Author Response:

      Reviewer #1 (Public Review):

      This Research Advance builds on the findings of this group's 2019 eLife paper which showed that conserved acidic and basic helices associate to enable heteropolymer formation by Snf7 and Vps24. This work provides some general structure/sequence relationships among the homologous ESCRT-III proteins that will be of interest to those in the ESCRT field. While there are no new mechanistic principles obtained from this study, the data allow the authors to propose a model of the minimal or core units needed for ESCRT-III membrane remodeling.

      The focus is largely on similarities and differences between the closely related Vps24 and Vps2, where they show that a few key point mutations or chimeric swaps (for Vps4 binding by the C-terminal region of Vps2) can exchange their functions. The last portion of the paper further tests similarities within the subgroups of ESCRT-III proteins to experimentally test functional groupings defined by sequence relationships.

      We thank the reviewer for their generous comments. We’d like to emphasize that one of the main aims of this study is to generate a minimal ESCRT-III system that can be functional. We study Vps24 and Vps2 to generate a model ESCRT-III module with their specific properties. We previously engineered Snf7 to replace Vps20 (and other ESCRT components, eLife 2016). In this paper, we also extend some of the analysis to other ESCRT-III components. We agree that the current manuscript combines previously described mechanisms to understand the minimal ESCRT-III system and gives us a direction for understanding why in some cases (for example, the archaeal system) there may be only two ESCRT-III subunits. This work, following up on previous work from our lab and others, takes us one additional step in that direction.

      In addition, we’d also like to highlight from our work that in yeast, MVB biogenesis does have strong contributions from Did2 (CHMP1) and Vps60 (CHMP5), but not from Ist1 (IST1) and Chm7 (CHMP7) (Fig. 5). These have previously been under-emphasized in the literature.

      Reviewer #2 (Public Review):

      The manuscript by Emr and colleagues addresses the important question of how core ESCRT-III members Vps2 and Vps24 interact to form functional polymers using protein engineering and genetic selection approaches.

      Major findings are:

      Vps2 overexpression can functionally replace Vps24 in MVB sorting.

      Helix 1 mutations N21K, T28A, and E31K in Vps2 were identified as sufficient for suppression, leading to the conclusion that Vps2 and its overexpression can replace the function of Vps24 and Vps2.

      Vps24 over expression does not rescue delta Vps2. The authors propose that this is due to the lack of the MIM and helix5 binding sites for Vps4 present in Vps2.

      Vps24 E114K mutation was identified to rescue deltaVps2 upon over expression and even better as a Vps24/Vps2 chimera suggesting that auto-activated Vps24 that can recruit Vps4 can functionally replace Vps2.

      Analyzing the effect of single ESCRT-III deletions on Mup1 sorting confirmed Snf7, Vps20, Vps2 and Vps24 as essential for sorting.

      In summary, the manuscript provides new insight into the assembly of ESCRT-III. It confirms some redundancy of Vps2 and Vps24 and shows how Vps2 can substitute for Vps24 but not vice versa.

      We thank the reviewer for this summary of our work. One point we’d like to emphasize is that while we agree that Snf7, Vps20, Vps2 and Vps24 form a minimal core subunit to form MVBs, there are important functions of other ESCRT-III molecules Did2 and Vps60 (Figure 5 and supplement) for MVB biogenesis.

      Comments:

      The three minimal principles for ESCRT-III assembly stated in the abstract are not novel. Spiral formation of ESCRT-III has been described before for yeast Vps2-Vps24 as well as its mammalian homologues. The requirement for VPS4 recruitment is also well documented and finally, the manuscript does not provide proof for lateral association of the spirals via hetero-polymerization.

      We agree with the first two comments about spiral formation and Vps4 recruitment. We’d like to emphasize that the mechanism of lateral association through heteropolymerization extends from our previous work (eLife 2019) and is supported in this work by mutational analysis of Vps2’s helix-1 motif. In our previous work, we provided evidence of the association of Snf7’s helix-4 region with Vps24’s helix-1 region, and also of the lateral association of Snf7 and Vps24/Vps2 with in vitro assays. In the previous work, we didn’t characterize the Vps2-Snf7 interaction, which we do further in this work. We find that charge-inversion mutations in Vps2 increase its affinity for Snf7, and this effect is sufficient to replace Vps24. We believe that these analyses strengthen our model and also enhance our knowledge of ESCRT-III polymerization. Therefore, this manuscript is a strong extension/advance on our previous eLife paper, and both papers should be analyzed together.

      The authors show that 8-fold over expression is necessary to rescue Mup1 sorting to an extent of 40%. The authors hypothesize that over expression of Vps2 can rescue Vps24 deletion because Vps2 may have a lower affinity for Snf7 than Vps24. This is in agreement with data on mammalian homologues which showed that indeed CHMP3 binds with 10x higher affinity to CHMP4B than CHMP2A (Effantin et al, 2012). This could have been included in the discussion, since the function of yeast and mammalian core ESCRT-III proteins is most likely not different.

      We apologize for this oversight and have included appropriate reference to this paper in the next version.

      The authors designed several chimeric Vps24/Vps2 constructs and show that some of the Vps24 chimera including Vps2 helix 5 and the MIM are fully functional in Mup1 sorting in delta Vps24 cells, but lack the ability to functionally replace Vps2 in Vps2 delta cells. It is unclear whether the chimeras are in the closed conformation in the cytosol. It would be interesting to know whether they are activated more easily and possibly prematurely.

      With our current assays we cannot distinguish the open vs. closed conformations in solution vs. membrane for Vps24. We do not think that these chimeras are activated prematurely because they do remain functional (as highlighted by the reviewer) in vps24∆ strain.

      We’d like to thank the reviewer for pointing us to these mutants, which have encouraged us to study these and related chimeras further. To better understand the effect of swapping the Vps2 helix-5 and MIM region, we have added a couple of experiments that probe the role of these motifs.

      We replaced the helix-5 and MIM regions of Vps2 onto Snf7 to ask whether this construct remains functional, and whether they can replace function of Vps24-Vps2 (by directly recruiting Vps4).

      In this set of data, we present evidence that when incorporated into Snf7, the helix-5 and MIM motifs of Vps2 make this chimeric Snf7 dysfunctional (Fig. 3 – Supp. 3). These data are consistent with the reviewer’s interpretation that premature recruitment of Vps4 to ESCRT-III filaments is presumably dysfunctional. However, inclusion of these motifs in Vps24 most likely does not prematurely disassemble ESCRT-III filaments, hence the chimeras remain functional. Also, mere substitution of the H5 and MIM motifs into Snf7 (and therefore Vps4 binding) is not sufficient for ESCRT-III function in cells.

      The larger point behind this set of analyses is that there are additional functions of Vps24-Vps2 beyond just recruitment of the AAA+ ATPase Vps4. Since we extensively analyzed the lateral association of Vps24-Vps2 to Snf7 in our previous manuscript (Banjade et al., eLife 2019), we ascribe these additional functions to lateral polymerization of Vps24-Vps2 on the Snf7 filament.

      The authors show that Vps24 E114K can form some kind of polymers in the presence of Vps2 in vitro while no polymerization is observed for wt Vps24 at 1 µM. It would be interesting to know whether wt Vps24 polymerizes at higher concentrations in this assay.

      We don’t observe polymers with 15 µM of Vps24 and 15 µM of Vps2, as the proteins start forming amorphous assemblies. We do refer to previous reports that observed similar linear polymers of Vps24 at higher concentrations (>300 µM) and with longer incubation. We therefore believe that the ESCRT-III proteins Vps24 and Vps2 are able to form copolymers with a similar structure, and that this is enhanced when these “activating” mutations are included.

      While the conclusion that E114K shifts the equilibrium to the open state is plausible, there is no evidence provided that this mimics Vps2 as stated. If so, Vps24 E114K should form the same polymers as shown in figure 4 supp 1 in the absence of Vps2 and spiral formation with Snf7 should not require Vps2.

      We agree with this interpretation from the in vitro assays, and have appropriately changed the language in the manuscript. We now describe the effect of the E114K mutation as “enhancing” association with existing Vps2. We hypothesize that this enhanced association with Vps2 occurs due to an “activation” process whereby Vps24 adopts a higher population of an open (or a semi-open) conformation, and have changed the language to reflect this interpretation. As an aside, we do note that Snf7 and Vps24 do form helices at higher concentrations without Vps2, as we showed in Banjade et al., eLife 2019.

      The speculation in the results section that Vps24 may not extend its helices 2 and 3 in an activated form due to potential helix breaking Asn residues in the linker region is not backed up by data, and it would have been appropriate to indicate this in the manuscript.

      We have now moved this analysis to the discussion and emphasized that this is a hypothesis. We also added the following sentence when describing the data regarding the mutations in the potential helix-breaking Asn residues: “We note that these data are indicative of mutations that control the conformations of the proteins. However, further biophysical analyses will be required for definitive evidence of this conformational flexibility.”

      The proposal that Vps2-Vps24 heteropolymers are formed by interactions along helices 2 and 3 is not supported by data presented in the manuscript. The authors would need to use recombinant proteins to test their mutants in biophysical interaction studies.

      We have now moved this interpretation to the discussion. Further dissection with biochemical and biophysical assays of Vps24-Vps2 would be a future direction in this area.

      Reviewer #3 (Public Review):

      This study sought to identify essential features of ESCRT-III subunits, with a focus on the yeast proteins Vps2 and Vps24, in order to reveal the required features of both subunits. The combined genetic and biochemical studies solidified the model that essential functions of ESCRT-III polymers - spiral formation, lateral association, and binding of Vps4 - are mostly distributed between different subunits (with some redundancy) and can be engineered into a single polypeptide. This study also sheds light on the long-standing and initially surprising finding that ESCRT-dependent budding of HIV does not require CHMP3 (Vps24), presumably because the distribution of distinct functions between different ESCRT-III subunits is not absolute.

      Inspired by earlier studies, the ability of overexpression of one ESCRT-III subunit to compensate for deletion of another subunit was explored using sorting assays. The demonstration of partial rescue inspired a mutagenesis approach that identified three residues that cluster on one face of a helix that enhanced rescue, and therefore confer functionality that in wt is primarily provided in the deleted subunits, which in this case is binding to Snf7. Extension of this analysis by protein engineering further demonstrated that the essential role of recruiting the Vps4 ATPase is normally performed by Vps2 but can be transferred to Vps24 by substitution of residues near the ESCRT-III subunit C-terminus. Similarly, it is shown that sequences that alter the propensity for bending of a helix at a point where open and closed ESCRT-III subunits differ in conformation contributed to the ability of Vps24 to substitute for deletion of Vps2, presumably by conferring the ability to adopt the open, activated conformation as well as the closed conformation.

      I don't have concerns about design or technical aspects of the experimental approach.

      We appreciate the reviewer’s comments and the summary of our work.

    1. Author Response:

      Reviewer #1:

      The authors sought to assess the relationship between developmental lineage and connectivity.

      This is a tour de force. It relies on detailed EM reconstructions, knowledge of complete neuroblast lineages thus correlating wiring with lineage, and through genetic manipulations of N gene function correlates developmental programs with wiring. The conclusion is important and provides a well described cellular and genetic system for linking the developmental program of a cell to its connection specificity. It provides a framework for considering how to study these questions in other regions of Drosophila and can be extended to the study of more complex mammalian systems where a similar neuroblast-lineage strategy generates different neuron types.

      There are no major weaknesses.

      This is an excellent study and, in my opinion, is ready to publish in its current form.

      We appreciate this comment!

      Reviewer #2:

      The conclusions of this paper are mostly well supported by data, however, there are several points that should be discussed further in the manuscript:

      1) The authors state that overexpression of Notchintra transforms Notch OFF neurons into Notch ON neurons. However, since this decision happens at the level of the GMC, wouldn't it be more correct to say that Notch OFF neurons were not produced and only Notch ON neurons were generated? Moreover, the authors state that the Notchintra overexpression phenotypes are due to hemilineage transformation rather than to death of Notch OFF neurons, by providing the total neuronal number in both experimental conditions using NB5-2 lineage. I think this statement is too much of a generalization when only one NB lineage has been analyzed and should be addressed in more lineages to claim this as a general mechanism. Moreover, the opposite hypothesis could have also been tested to make the argument stronger: Would depletion of Notch in GMCs make all neurons in a lineage target the ventral neuropil domain?

      We agree, and now provide cell counts for WT and Notch-intra in all four lineages (5-2, 7-1, 7-4, and 1-2) in the text. In all cases, the numbers of neurons in wild type and Notch-intra lineages are not significantly different, supporting the Notch OFF to Notch ON transformation. We don't say that Notch-OFF neurons are missing, because there is no loss of neurons from the lineage, but rather the neurons that would have been Notch-OFF in wild type are now duplicating the Notch-ON neurons. Regarding the opposite transformation, we tried to produce it by misexpressing UAS-numb, but were unable to get the expected positive control phenotype in which all five Eve+ U neurons are transformed to Eve-negative siblings (Skeath and Doe, 1998). Thus, we were not able to do lineage-specific Notch inhibition. Unfortunately, we can’t use whole embryo N or N pathway mutants, as has been done before (Skeath and Doe, 1998), because they have massive disruption in the CNS that obscures lineage specific axon phenotypes.

      2) Temporal cohorts described in this work are an approximation to neuronal temporal identity. The authors validate the correlation of early and late temporal cohorts to the expression of the temporal TFs Hb and Cas (Fig 4G). Given the resolution of the TEM dataset and the existence of specific NBs and neuronal drivers for the neurons studied, a correlation between the 4 temporal cohorts presented in this work and the 4 temporal TFs Hb, Kr, Pdm and Cas expressed by these neurons could have been possible and would have presented a more comprehensive view of the relationship between tTF expression and neurite and synapse localization. Does temporal cohort between lineages (cortex neurite length) mean expression of the same temporal TF? For example: would mid-early neurons in different lineages express the same temporal factor?

      Excellent question! We show that radial position is a proxy for temporal identity, but the precise relationship of Hb, Kr, Pdm, and Cas expressing neurons to the four radial “bins” we describe remains unknown. In fact, a graduate student is doing these experiments by generating MCFO single neuron clones in newly hatched larvae (the stage of the TEM volume) and staining with Hb, Kr, and Cas temporal transcription factors (it is impossible to do this with Pdm because neurons lose expression at stage 15). This will be many months of work and probably over a thousand MCFO+ neurons to analyze, and we feel it is beyond the scope of the paper -- although very important and very interesting! Plus, we are still limited in lab time due to University of Oregon covid restrictions.

      Since shared temporal identity between different lineages on its own does not confer shared neuronal projections, but shared temporal cohort hemilineage does: Does this mean that the expression of a given temporal TF and/or neuronal birth order does not play a role in this shared connectivity? Please clarify these ideas in the text.

      We have tried to clarify this in the text. Whereas temporal identity alone has no detectable role in generating common synapse localization or connectivity, it does have some role in the context of hemilineage identity. That is, hemilineage temporal cohorts have more shared synapse localization and connectivity than either temporal or hemilineage identity alone. See Figure 6 for synapse localization, and Figure 7 for connectivity data.

      3) Although the authors claim so, it is not convincing that the role of spatial patterning in neuronal connectivity has been assessed in this work, since the authors do not present an obvious correlation between specific connectivity features (morphology, axon or synapses localization) and the position of a given NB in the VNC. This should be clarified in the text.

      Great point! We agree that spatial patterning was not directly tested in our manuscript, thank you for pointing this out. Our claim that spatial patterning is involved is simply based on the idea that lineages (and thus hemilineages) are more related to one another than neurons from other hemilineages suggesting that the identity of the parent neuroblast plays some role. You make the excellent point that we did not look at the relationship of projections from all NBs in a “row” or “column” within the NB array. That analysis would potentially reveal a role for spatial factors in determining neuron projections. Unfortunately, we have a very limited set of neurons from any one row or column, not enough to make claims about direct relationships between row or column identity and targeting/connectivity.

      Reviewer #3:

      Specific comments:

      1) Figure 1; page 3: The authors refer to the "striking" similarity between EM reconstructions and GFP filled clones and yet there are clear differences in some of the clones in the extent and localization of arborization. This may be in part technical but almost certainly also reflects inter individual differences in single neuron morphology. Since EM reconstructions presumably come from one animal, the use of GFP clones allows the authors to map the degree of variation between clones and it would be interesting for them to show this.

      That is an interesting point. Elegant work from Tzumin Lee and Jim Truman has shown that clones from larval neuroblasts are very similar, and our qualitative findings support this conclusion. Thus, it would be a quite minor advance for us to quantify clonal similarity in embryonic neuroblasts. Plus, since the number of neurons in a clone varies slightly, we would have to count neuron numbers per clone and only compare those with identical neuron numbers, which is possible but time-consuming. Then there are the covid restrictions which make it difficult to rapidly generate new clones to increase the number with identical neurons. All in all, we decided that the benefit of answering this question was not worth the cost of performing it, and that other experiments were a higher priority in our limited research time. We have toned down the language to remove the word “striking” in the Introduction.

      2) Figures 2 and 4; pages 3-5: Along the same lines as above, the authors make categorical statements about the mapping of arbors to dorsal and ventral regions of the nerve cord and correlate that to hemilineage identity. Again, there is clear mixing in almost all neuroblast lineages, that seems to range from 15-30% as a rough estimate, and perhaps a bit more dorsally than ventrally, which the authors do not comment on (except to say it's "mostly non-overlapping"). This is a pity because they obviously have the tools to do so quantitatively and the information is already there in their data.

      Yes, good point – there is some overlap in most lineages for both axon/dendrite targeting (Figure 2) and synapse targeting (Figure 4). We now quantify the synapse similarity and observe that hemilineage-related neurons have much greater synapse similarity with one another than they have with their sister hemilineage. The non-overlapping relationship between hemilineages is somewhat obscured by the simple posterior view shown in Figures 2 and 4, so we add a new figure (Figure 4 – supplement 2) that shows hemilineage synapse targeting in all three axes: A/P, M/L, and D/V. This makes it possible to see the true relationship.

      3) The analysis of Notch activity in hemilineages is excellent and very interesting, as is the new tool they develop. However, the analysis lacks loss of Notch function data and where and when Notch signaling is required to segregate the connectivity space (i.e. in neurons or in precursors such as Nbs and GMCs). Is this a binary fate specification mechanism or lateral inhibition among competing neurons? What about Notch activity manipulation in single neurons? If the authors wish to draw strong conclusions about the role of Notch in segregating target space and its relation to hemilineage identity, these experiments are essential. Alternatively, drawing subtler conclusions and acknowledging these caveats would be very welcome.

      Great point about the possible role of non-canonical Notch signaling in post-mitotic neurons (PMID: 22608692). We do not have the tools to perform lineage-specific, axon-specific removal of Notch protein. In theory we could do single neuron MARCM experiments, but these are extremely difficult due to the perdurance of the Gal80 protein, which would prevent us from assaying in newly hatched larvae. We add a Discussion section addressing the unresolved issue of post-mitotic neuron Notch function: “Another point to consider is the potential role of Notch in post-mitotic neurons (Crowner et al., 2003), as our experiments generated Notch-intra misexpression in both new-born sibling neurons as well as mature post-mitotic neurons. Future work manipulating Notch levels specifically in mature post-mitotic neurons undergoing process outgrowth will be needed to identify the role of Notch in mature neurons, if any.”

      4) Figure 7; Page 7: The authors state that 75% of hemilineage neurons correlated by temporal identity are separated by 2 synapses or less, suggesting greater connectivity than expected. How are these data normalized? What is the expected connectivity between neurons that are less related along these two developmental axes?

      Thanks for the question, which helped us change the text for clarity. The quantifications in Figure 7 actually do compare connectivity between unrelated neurons. Thus, we have changed “random” to “unrelated” in the text and figure legends. Additionally, the methods for this analysis were obviously not clear enough, so we have updated them with this text below:

      Path Length Analysis:

      We computed the pairwise path length between all hemilineages as well as all sensory and motor neurons in A1 in the undirected connectivity graph. We found that neurons that are unrelated by developmental grouping had an average path length greater than that of neurons related by hemilineage. Additionally, we found that neurons related by hemilineage alone had an average path length greater than that of neurons in hemilineage-temporal cohorts. For this analysis, unrelated neurons were defined as neurons that were in the same D/V axis (i.e. dorsal to dorsal and ventral to ventral) and same hemisegment (left or right), but not in the same hemilineage. Hemilineage comparisons were neurons in the same hemilineage, but not in the same temporal cohort. Significance was determined with a two-sample KS test on the empirical distributions of pairwise path lengths.
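
      As an illustration only, the path-length comparison described above could be sketched as follows. This is not the authors' code: the function names, the toy graph, and the neuron labels are all hypothetical, and the sketch computes only the two-sample KS statistic (the largest gap between the two empirical distributions), not a p-value.

```python
from collections import deque


def build_undirected(edges):
    """Build an undirected adjacency map from (a, b) connection pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def pairwise_path_lengths(adjacency, pairs):
    """Path length for each (a, b) pair; unreachable pairs are skipped (an assumption)."""
    cache = {}
    lengths = []
    for a, b in pairs:
        if a not in cache:
            cache[a] = shortest_path_lengths(adjacency, a)
        if b in cache[a]:
            lengths.append(cache[a][b])
    return lengths


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum distance between the empirical CDFs."""
    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)

    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)


# Hypothetical toy chain of five neurons: n1 - n2 - n3 - n4 - n5.
graph = build_undirected([("n1", "n2"), ("n2", "n3"), ("n3", "n4"), ("n4", "n5")])
related = pairwise_path_lengths(graph, [("n1", "n2"), ("n2", "n3")])
unrelated = pairwise_path_lengths(graph, [("n1", "n4"), ("n1", "n5")])
print(related, unrelated, ks_statistic(related, unrelated))
```

      In practice a statistics library would normally supply the significance value; the hand-written statistic here only keeps the sketch free of dependencies.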

      Independent of path length, we also calculated connectivity similarity between related neurons in Figure 8. Similarity here was defined as the cosine of the angle between the input or output vectors of each neuron. Similarity by this metric was also found to be greater for developmentally related neurons. Finally, we added this line to Figure 7 legend to clarify normalization: “Frequency corresponds to the fraction of pairwise distances observed for each group.”
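
      The similarity metric named above, the cosine of the angle between two neurons' input or output vectors, can be written compactly. The sketch below is a hypothetical illustration: the vectors are invented synapse counts onto a shared, ordered list of partner neurons, and returning 0.0 for a neuron with no connections is an assumption, not something the response specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length connectivity vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must index the same set of partner neurons")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: a neuron with no synapses is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical output vectors: synapse counts onto partners P1..P3.
neuron_a = [2, 0, 1]
neuron_b = [4, 0, 2]   # same partner profile as neuron_a, scaled
neuron_c = [0, 5, 0]   # no shared partners with neuron_a
print(cosine_similarity(neuron_a, neuron_b))
print(cosine_similarity(neuron_a, neuron_c))
```

      Because the cosine ignores vector length, two neurons with the same partner profile score as highly similar even when one makes more synapses overall.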

      5) Figure 8; page 7 and discussion: The authors conclude that the combination between temporal identity and hemilineage identity predicts connectivity beyond what would be predicted by spatial proximity alone. This conclusion is problematic on at least two levels. First, practically what really matters for proximity is proximity during the time in development when synapses are forming between neuronal pairs, not proximity at the end in the final pattern.

      This is a good point that we need to clarify, although we note that synaptic connectivity is not a "one and done" in the embryo, but rather a continuous process that extends from the late embryo into the third larval instar ("Conserved neural circuit structure across Drosophila larval development revealed by comparative connectomics" by Gerhard, Andrade, Fetter, Cardona, and Schneider-Mizell, eLife 2017).

      Nevertheless, we now add the following additional text to the Results and to the Discussion. To the Results: “Interestingly, even neurons with the highest observed levels of overlap were not always connected (Figure 8A''). Thus, proximity alone can't explain the observed connectivity, consistent with a role for hemilineage-temporal cohorts providing increased synaptic specificity. Of course, our assays are in newly hatched larvae, and it is likely that dendritic arbors are more widely distributed during circuit establishment in the late embryo (Valdes-Aleman et al., 2021), yet only a specific region of the neuropil is targeted by larval hatching, which suggests the initial broad dendrite targeting is not sufficient to establish connectivity to many neurons contacted by these early dendrites, again arguing against a simple proximity mechanism.” To the Discussion: “Our results strongly suggest that hemilineage identity and temporal identity act combinatorially to allow small pools of neurons to target pre- and postsynapses to highly precise regions of the neuropil, thereby restricting synaptic partner choice. Yet precise neuropil targeting is not sufficient to explain connectivity, as many similarly positioned axons and dendrites fail to form connections (Figure 8C), despite active synapse addition throughout larval life (Gerhard et al., 2017).”

      Second, conceptually, opposing spatio-temporal mechanisms with proximity-based bias for connectivity makes no sense because that's exactly what spatio-temporal mechanisms achieve: getting neurons to the same space at the same time so connectivity can happen. At any rate, drawing strong conclusions about where and when neurons meet to form (or not form) synapses requires live imaging and absent that authors should refrain from making such a strong statement about what their excellent correlative dataset means.

      Yes, spatiotemporal mechanisms get axons (or dendrites) to precise neuropil domains, but that does not invariably generate connectivity. What is interesting is that hemilineage-temporal cohorts share more connectivity than predicted by proximity alone. Thus, proximity is necessary but not sufficient for proper connectivity. An additional mechanism is in play, and our data suggest that it is due to the neuron's hemilineage-temporal identity. We agree that our data are correlative – shared development correlates with shared connectivity – so we have moved any suggestion of possible mechanism from the Results to the Discussion. We agree this is an important change that will increase manuscript accuracy, and also provide a clear future direction for mechanistic experiments. Thanks for helping us focus the paper better.

    1. Author Response:

      Reviewer #1:

      In this manuscript the authors show that a designer exon containing a Fluorescent Protein insert can be used to edit vertebrate genes using an NHEJ based repair mechanism. The approach utilizes CRISPR to generate DSBs in intronic sequences of a target gene along with excision of a donor fragment from a co-transfected plasmid to initiate insertion of the exon cassette by ligation into the chromosome DSB.

      I like the idea here of inserting FP sequences (and other tags) into introns in this way. Focusing on the N- and C-termini for insertions has always seemed arbitrary to me. In practice these internal sites may even tolerate tag insertions better than the termini. However, this remains to be seen.

      My major reservation with this study is that the concepts here are not particularly novel. The approach is very similar to a concept already well established in gene-therapy circles of using introns as targets for inserting a super-exon preceded by a splice acceptor to correct inborn genetic lesions. The methodology employed is essentially HITI (https://www.nature.com/articles/nature20565).

      What is new is the finding that FP insertions are frequently expressed and at least partly functional as evidenced by their ability to localize to the expected intracellular structures. However, no actual functional data is provided in this study so it remains to be seen how frequently the insertion of FP exons is tolerated. It would help the study substantially to have functional information for a few insertions.

      The value and utility of this study hinges on whether insertions of this type frequently retain function. The authors speculate that "labeling at an internal site of a gene is feasible as long as the insertion does not disrupt the function of the encoded protein. Many introns reside at the junctions of functional domains because introns have evolved in part to facilitate functional domain exchanges (Kaessmann et al., 2002; Patthy, 1999)." Thus an analysis of how often intron tags are tolerated as homozygotes would be helpful for users who will worry that a potentially "quick and dirty" CRISPIE insertion might not accurately report on the function and localization of their protein of interest.

      We thank the reviewer for appreciating our idea. CRISPIE is indeed improved HITI, with the notable difference that the insertion takes place at the intronic region and that a designer intron/exon module is used. This design has a significant benefit in that INDELs in both labeled and unlabeled alleles will be unlikely to cause mutations at the levels of mRNA and proteins. CRISPIE is also different from the super-exon, which is now cited (Bednarski et al, 2016). CRISPIE does not involve the 3’ UTR and the poly A signal. This makes the donor template more standardized and smaller. Transcriptional controls embedded in endogenous introns after the editing sites can be retained in CRISPIE, but not when super-exons are used. We also achieve much higher efficiency in vivo than previous editing methods, which we feel is an important advance.

      We now provide three different experiments to address the function of CRISPIEd β-actin and, in one experiment, the function of CRISPIEd α-tubulin 1B. One of the key functions of the cytoskeleton is to support growth. We now show that neither CRISPIE labeling of β-actin (hACTB) at two different intronic loci nor CRISPIE labeling of α-tubulin 1B (TUBA1B) affects the growth of U2OS cells (New Experiment #1; Figure 1H, and Figure 1-figure supplement 4), suggesting that labeled β-actin and α-tubulin are functional. In addition, as suggested, we now demonstrate that cells homozygous for CRISPIE insertions are viable and able to divide (New Experiment #2; Figure 4-figure supplement 1). We also show that two important neuronal functional parameters – the mEPSC frequency and amplitude – are not altered by CRISPIE labeling of hACTB in neurons in cultured hippocampal slices (New Experiment #3; Figure 5-figure supplement 2).

      Having shown the above results, we also wish to emphasize that, although CRISPIE provides a way to perform FP tagging of endogenous protein with high efficiency and low error rates, it cannot ensure that FP-tagging itself is benign for all proteins. Numerous studies have overexpressed FP-tagged proteins, which is well documented to have side effects. The CRISPIE method empowers researchers by allowing them to tag endogenous proteins without overexpression. However, if the FP-tagging itself affects protein function, CRISPIE will not be helpful. Each FP-tagging project, whether it is based on CRISPIE or other methods, will require its own systematic characterization. We have now made this clear in the discussion (pg. 17): “… although CRISPIE enables the tagging of endogenous proteins with low error rates, it does not ensure that the tagged protein functions the same as the wild-type protein. Not all tagging is benign, and rigorous characterizations will be needed for each tagging experiment.”

      Other comments:

      1) Were homozygotes identified and were they viable in each instance?

      We now provide data showing that cells homozygous for CRISPIE insertions are viable and able to divide (New Experiment #2; Figure 4-figure supplement 1).

      2) You say: "The CRISPIE method should be broadly applicable for use with different FPs or with other functional domains, different protein targets, and different animal species." I don't know if you optimized your FP to avoid potential reverse strand splice acceptors, but some discussion of this important point should be made so that those trying to apply the approach will make sure that strong acceptors are not included accidentally in reverse oriented inserts.

      Our RT-PCR does not detect reversed inserts at the mRNA level. We now add in the Discussion that donor design needs to eliminate unintended splicing sites in the reverse orientation. We write (pg. 17): “It should also be noted that, when designing the donor template, care should be given to not create unintended splicing acceptor sites in the inverted orientation. Otherwise, inverted insertion events can cause mutations at the mRNA and protein levels.”

      3) Would your mRNA sequencing methodologies detect defective transcripts where the splice acceptor and a portion of the upstream FP exon was inserted causing a frame shifted and mispliced mRNA? Such mRNAs would be unstable due to NMD and thus not detected readily in a PCR based approach. Thus disruption of the mRNA by partial insertion of your donor (or fragments of the other co-injected DNA) might be much more widespread than is measured here. This could be tested by recovering clones that partially inserted the donor in the forward orientation and carefully monitoring for defects in mRNA splicing of the inserted allele. Were such clones detected and how frequently?

      Our method should detect defective mRNAs, if they are not degraded. However, if defective mRNAs are quickly degraded, they are not measured in our current RT-PCR and NGS experiments, as described in Figure 2. While we cannot address this question directly, we now provide evidence that the cell growth and neuronal function after CRISPIE labeling of β-actin remain normal.

      We also thank the reviewer for suggesting the cloning approach. This proposed experiment, however, may potentially be affected if potential defective mRNAs can result in decreased cell survival/growth. Although this experiment will require time beyond the three-month revision period expected by eLife due to the length of time required to clone cells, we will keep this in mind in our future efforts.

      4) You note that in the case of vinculin the coding sequence of the last exon of hVCL was included in the insertion donor sequence, and a stop codon was introduced at the end of the mEGFP coding sequence. This is essentially the strategy for super-exon insertion into targets for gene therapy, instead of a splice donor on the C-terminus you include a stop codon. You should cite these previous studies. Inclusion of a stop codon in frame would be expected to cause NMD, did you also include transcription termination signals?

      NMD will happen if the stop codon is more than approximately 50 nucleotides upstream of any exon-junction complex (Lewis et al, PNAS 2009). However, NMD won’t occur if the stop codon is within 50 nucleotides. For example, synaptophysin – a highly expressed neuronal protein – has its stop codon in its second-to-last exon, within 50 nucleotides of the exon junction. The stop codon we used for labeling hVCL is also within 50 nucleotides (~20 nt) of the exon junction.

      We now cite Bednarski et al, 2016, which describes the use of super-exons in gene therapy. At the same time, we think that our approach is still different from the super-exon concept. After the stop codon, the 3’ UTR is not included. Instead, a splicing donor is included, allowing the exon to be spliced to the subsequent endogenous exon. This allows the insert to remain small for high insertion efficiency and makes it easy to produce the template (some 3’ UTRs can be several kilobase pairs in length), while utilizing the endogenous translational controls built into the native 3’ UTR.
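
      The distance rule stated above can be expressed as a small check. This is a simplified sketch of the approximately 50-nucleotide rule only, not a predictor of NMD in real transcripts: the function name, the coordinate convention (positions along the spliced mRNA, in nucleotides), and the example positions are hypothetical. Only the roughly 20 nt spacing in the first example echoes the hVCL figure given in the response.

```python
def stop_codon_predicts_nmd(stop_position, junction_positions, threshold_nt=50):
    """Return True if any exon-exon junction lies more than `threshold_nt`
    downstream of the stop codon, following the approximate rule quoted above.

    Positions are hypothetical mRNA coordinates in nucleotides; junctions
    upstream of the stop codon never trigger the rule.
    """
    return any(junction - stop_position > threshold_nt
               for junction in junction_positions)


# Stop codon about 20 nt before the last junction: rule predicts no NMD.
print(stop_codon_predicts_nmd(1000, [400, 700, 1020]))
# Stop codon 200 nt before a downstream junction: rule predicts NMD.
print(stop_codon_predicts_nmd(1000, [400, 700, 1200]))
```

      The threshold is left as a parameter because the cited distance is approximate.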

      Reviewer #2:

      In-frame insertion of fluorescent protein tags into endogenous genes allows observation of protein localization at native expression levels, and is therefore an essential approach for quantitative cell biology. Once limited to unicellular model organisms such as yeast, endogenous gene tagging has become well-established in invertebrate model systems such as C. elegans and Drosophila since the advent of CRISPR technology in the last decade. However, a robust and widely accepted endogenous gene tagging strategy for mammalian cells has remained elusive. This is largely due to the fact that homologous recombination, the method used to create knock-ins in invertebrates, is inefficient (or sometimes doesn't work at all) in mammalian cells, especially those that do not divide rapidly.

      Several studies have attempted to bypass the need for homologous recombination by using a different method, non-homologous end joining (NHEJ) to insert GFP tags into vertebrate genomes (e.g. Auer et al. Genome Res 2014; Suzuki et al. Nature 2016; Artegiani et al. Nature Cell Biol. 2020). Such approaches can be orders of magnitude more efficient than homologous recombination, but the generated alleles require careful validation because of the error-prone nature of NHEJ.

      Here, Zhong and colleagues improve upon the existing NHEJ-based gene tagging approaches by designing synthetic exons (comprising a FP coding sequence with 5' and 3' splice sites) that can be inserted into native introns using NHEJ. The beauty of this approach is that any mutations (indels) created by the error-prone NHEJ repair mechanism are spliced out, and therefore do not affect the sequence of the encoded protein. A limitation is that tags must be inserted internally within a protein of interest and cannot be targeted to the extreme N- or C-terminus, but this limitation is clearly stated and discussed by the authors. Overall, this is a novel (to my knowledge) and powerful strategy that is likely to advance the field.

      We thank the reviewer for the very positive comments regarding our CRISPIE method.

    1. Author Response:

      Reviewer #1:

      In this manuscript Lituma et al. provide compelling evidence demonstrating the physiological role of presynaptic NMDA receptors at mossy fiber synapses. The existence of these receptors on the presynaptic site at this synapse was suggested more than 20 years ago based on morphological data, but their functional role was only shown in a single abstract since then (Alle, H., and Geiger, J. R. (2005)). The current manuscript uses a wide variety of complementary technical approaches to show how presynaptic NMDA receptors contribute to shaping neurotransmitter release at this synapse. They show that presynaptic NMDA receptors enhance short-term plasticity and contribute to presynaptic calcium rise in the terminal. The authors use immunocytochemistry, electrophysiology, two-photon calcium imaging, and uncaging to build a very solid case to show that these receptors play a role in synaptic communication at mossy fiber synapses. The authors' conclusions are supported by the experimental data provided.

      The study is built on a solid and logical experimental plan, the data is high quality. However, the authors would need to provide stronger evidence to demonstrate the physiological function of these receptors. It is hard to reconcile these experimental conditions with the authors' claim in the abstract: 'Here, we report that presynaptic NMDA receptors (preNMDARs) at hippocampal mossy fiber boutons can be activated by physiologically relevant patterns of activity'. We know that extracellular calcium can have a very significant impact on neurotransmitter release and on how short-term plasticity is shaped. For this reason, it would be important to explore how the activity of these receptors at more physiological calcium concentrations contributes to calcium entry and short-term plasticity at these synapses.

      We thank the reviewer for noting our study is “built on a solid and logical experimental plan, the data is of high quality”. We agree with the reviewer that exploring the role of preNMDARs under more physiological conditions is extremely important. In response, we have performed new experiments at 35 °C and at more physiological concentrations of Ca2+ and Mg2+ (1.2 mM each). Our new results, now included in Figure 4-figure supplement 1, demonstrate that our conclusion that preNMDARs at mossy fiber boutons can be activated by physiologically relevant patterns of activity is also true under more physiological recording conditions.

      Reviewer #2:

      Lituma et al. examined the presence and functions of preNMDARs in dentate gyrus granule cells (GCs) in the hippocampus. The authors found that GluN1+ preNMDARs are indeed present at mossy fiber (mf) terminals with electron microscopy. With pharmacological and genetic approaches, the authors showed that preNMDARs are important in low frequency facilitation (LFF), burst-induced facilitation and information transfer at the mf-CA3 synapse. The authors further demonstrated that this preNMDAR contribution is independent of the somatodendritic compartment of the GCs. With 2-photon calcium imaging, the authors found that preNMDARs contribute to presynaptic Ca2+ transients and can be activated by local glutamate uncaging. Separately, the authors showed that GluN1+ preNMDARs might also contribute to BDNF release at mossy fiber terminals during repetitive stimulation. Lastly, non-postsynaptic NMDARs specifically mediate mf transmission onto mossy cells, similar to mf-CA3 synapses, but not interneurons. The authors concluded that preNMDARs mediate synapse-specific transmission originating from the GCs/mf inputs.

      Overall, the study provides compelling evidence from a battery of techniques, ranging from EM, pharmacology, genetic deletion, electrophysiology to 2-photon imaging/uncaging. The data supports a coherent story on the presence of preNMDARs at mf terminals and that preNMDARs play important roles in LFF.

      In conclusion, this study reveals how NMDA receptors can be found in unexpected locations and how they may have unconventional functions, i.e. outside the narrow textbook view that they primarily serve as coincidence detectors in Hebbian learning. This study thus helps to change the way we think about NMDA receptor functioning, so should be of broad interest.

      We appreciate the reviewer’s comments that our study provides compelling evidence for the presence and role of preNMDARs at mossy fiber terminals. We also agree with the reviewer that our study challenges the way we think about NMDA receptor function.

      Reviewer #3:

      In this manuscript Lituma and colleagues investigate a potential role for presynaptic NMDARs at hippocampal mossy fiber (MF) synapses in regulating synaptic transmission. The combined use of electron microscopy, electrophysiology, optogenetics, calcium imaging, and genetic manipulations expertly employed by the authors yields high quality compelling evidence that presynaptic NMDARs can participate in activity dependent short term facilitation of release onto postsynaptic CA3 pyramid and mossy cell targets but not onto inhibitory interneurons. Moreover, presynaptic NMDAR activation is demonstrated to be particularly effective in promoting BDNF release from MF boutons. The investigation is well designed with a clear hypothesis, appropriate methodological considerations, and logical flow yielding results that fully support the authors' conclusions. The manuscript fills an important gap in our understanding of MF regulation by unambiguously confirming a functional role for presynaptic NMDARs that were first described anatomically at MF terminals nearly 30 years ago. Combined with a handful of other studies describing presynaptic NMDARs at various central synapses this study expands the role of NMDARs as critical players in synaptic plasticity on both sides of the cleft.

      We very much appreciate the reviewer’s positive remarks of our study as “well designed with a clear hypothesis, appropriate methodological considerations, and logical flow”. We concur that the manuscript fills an important gap in understanding MF regulation by preNMDARs and expanding the role of NMDARs in synaptic plasticity on both sides of the cleft.

    2. Reviewer #2 (Public Review):

      Lituma et al. examined the presence and functions of preNMDARs in dentate gyrus granule cells (GCs) in the hippocampus. The authors found that GluN1+ preNMDARs are indeed present at mossy fiber (mf) terminals with electron microscopy. With pharmacological and genetic approaches, the authors showed that preNMDARs are important in low frequency facilitation (LFF), burst-induced facilitation and information transfer at the mf-CA3 synapse. The authors further demonstrated that this preNMDAR contribution is independent of the somatodendritic compartment of the GCs. With 2-photon calcium imaging, the authors found that preNMDARs contribute to presynaptic Ca2+ transients and can be activated by local glutamate uncaging. Separately, the authors showed that GluN1+ preNMDARs might also contribute to BDNF release at mossy fiber terminals during repetitive stimulation. Lastly, non-postsynaptic NMDARs specifically mediate mf transmission onto mossy cells, similar to mf-CA3 synapses, but not interneurons. The authors concluded that preNMDARs mediate synapse-specific transmission originating from the GCs/mf inputs.

      Overall, the study provides compelling evidence from a battery of techniques, ranging from EM, pharmacology, genetic deletion, electrophysiology to 2-photon imaging/uncaging. The data supports a coherent story on the presence of preNMDARs at mf terminals and that preNMDARs play important roles in LFF.

      In conclusion, this study reveals how NMDA receptors can be found in unexpected locations and how they may have unconventional functions, i.e. outside the narrow textbook view that they primarily serve as coincidence detectors in Hebbian learning. This study thus helps to change the way we think about NMDA receptor functioning, so should be of broad interest.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Point-by-point response to reviewer comments


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In the current manuscript, Millarte et al report a novel role of Rabaptin5 in selectively clearing damaged endosomes via canonical autophagy. They have identified FIP200 as a novel interactor of Rabaptin5 under basal conditions using yeast two-hybrid screening and further confirmed the interaction of Rabaptin5 with FIP200 by immunoprecipitation. They next used Chloroquine and monitored colocalization of Rabaptin5 with WIPI2, ATG16L1 and LC3B to demonstrate the potential interaction of Rabaptin5 with the autophagic machinery. They have primarily used Gal-3 as a marker of membrane damage after 30 minutes of Chloroquine treatment. In order to further elucidate the role of Rabaptin5 in autophagic induction mediated by Chloroquine, they have silenced Rabaptin5, FIP200, ULK1 and ATG13 and observed a decrease in the formation of LC3- or WIPI2-positive autophagosomes. Based on these observations they tested whether Rabaptin5 interacts with ATG16L1 upon Chloroquine treatment and confirmed their interaction, together with the potential interaction sites of Rabaptin5 and ATG16L1, by IP. The authors confirmed the interaction of Rabaptin5 with ATG16L1 by complementing the KO line with the mutant form of Rabaptin5 containing alanine residues in its consensus motif. Finally, they have used Salmonella and SCV as a model to study the role of Rabaptin5 in endomembrane damage and monitored a 50% decrease in the removal of Salmonella in Rabaptin5 KO or KD cells.

      Major concerns: One of the major concerns is the membrane damage reported with chloroquine, which is known to induce lysosomal swelling and further targeting of the swollen compartments to degradation by direct conjugation of LC3 onto single membranes as a form of non-canonical autophagy. The evidence regarding membrane damage by Gal3 colocalization on the Rabaptin5 vesicles is preliminary. As suggested by the authors, the canonical autophagy pathway recognizing damaged membranes also recruits ALIX to the damaged membrane, which was not observed in Supplementary Figure 2. The link between membrane damage by chloroquine and monensin and Rabaptin5 is not convincing, as there is not sufficient evidence of membrane damage. In relation to this issue, the authors should consider using other damage markers such as Gal8, p62 or NDP52 to provide additional support for membrane damage induced by chloroquine.

      To expand on the question of CQ treatment damaging early endosomes, we also tested for Gal8 on Rabaptin5-positive enlarged endosomes and quantified the fraction of Rabaptin5-positive rings positive for Gal3 and Gal8 after 30 min of CQ treatment. We propose to include this data in Figure 2:

      We have tested the importance of Gal3 and p62 by siRNA-mediated knockdown where we found a robust inhibition of induction of WIPI2 puncta with CQ, but not with Torin1. Formation of LC3 puncta was less reduced, similar to knockdowns of FIP200, ATG13, or Rabaptin5.

      We propose to add these knockdown experiments as a supplementary figure:

      One of the main claims here is that Rabaptin5 regulates the targeting of damaged endosomes to autophagy. Clearly, these are early endosomes as stated in the abstract. However, the evidence presented here showing these are early endosomes is not convincing. Analysing Gal3 and Gal8 positive vesicles that are Rabaptin5 positive and an early endosomal marker will be important in this context. For example, there needs to be additional evidence showing that early endosomes are targeted to autophagy. Is the degradation of TfR affected by this targeting? Did the authors look at the effect of Bafilomycin A1? If this process affects exclusively early endosomes, it should be BafA1 independent. This will direct more into the cellular function of this process.

      Rabaptin5 is a bona fide marker of Rab5-positive early sorting endosomes. As a control, we confirmed colocalization of Rabaptin5 with transferrin receptor, another endosomal marker, on CQ-induced rings (Fig. 2B). We now also analyzed swollen endosomes with triple-staining for Rabaptin5, transferrin receptor, and Gal3 as shown in this gallery (30 min CQ, as in Fig. 2). All Rabaptin5-positive swollen endosomes (rings) were positive for transferrin receptor and ~80% for mCherry-Gal3.

      We further tested transferrin receptor levels with and without CQ. Since CQ inhibits autophagic flux, this assay may not be very sensitive. Nevertheless, we found a significant reduction of ~15% and ~30% after overnight incubation with CQ in parental HEK293A cells and in Rbpt5-KO cells re-expressing wild-type Rabaptin5, respectively, but no reduction in Rbpt5-KO cells expressing the Rabaptin5-AAA mutant defective in binding to ATG16L1.

      As to the effect of BafA1, see our general response on top. The osmotic effect of CQ or Mon on endosomes that leads to membrane breakage requires an acidic pH. Preincubation with BafA1 neutralizes the pH, prevents osmotic swelling by CQ/Mon, and was shown to block LC3 lipidation (Florey et al., 2015, Jacquin et al., 2017). When BafA1 was added simultaneously, CQ was found to induce LC3 despite the presence of BafA1 (Mauthe et al., 2018), and Mon was shown to still be able to break endosomal membranes and recruit LC3 to EEA1-positive endosomes (Fraser et al., 2019). However, CQ-induced LC3 recruitment to latex bead-containing phagosomes or entotic vacuoles, i.e. LAP-like autophagy, was blocked (Florey et al., 2015). Consistent with this literature, we found increased LC3B lipidation already within 30 min of CQ treatment independently of BafA1 (no preincubation).

      Upon longer incubations, LC3B lipidation is very strong already with BafA1 alone so that the effect of CQ cannot be assessed anymore, since both drugs inhibit autophagic flux.

      Furthermore, we found a CQ-dependent increase in WIPI2- and LC3B-positive puncta to be insensitive to BafA1 (panel A below). Colocalization of Rabaptin5 to LC3B and LC3B to Rabaptin5 significantly increased upon CQ treatment independently of the presence of BafA1 (no pretreatment), indicating that at least a large part of CQ-induced LC3B puncta is not due to LAP-like autophagy.

      Minor concerns: Both for Figure 2 and Supplementary Figure 7 it will be clearer to have the images in colour rather than black and white for better interpretation.

      We thought the grayscale images were clearer, but are happy to provide color images.

      The interaction of FIP200 and ATG16L1 with Rabaptin5 is well characterized with immunoprecipitation and imaging, but the interactions of Rabaptin5 with FIP200 and ATG16L1 ∆WD in the presence of chloroquine are missing, and it will be important to show whether these interactions increase in the presence of chloroquine.

      We can do co-IP experiments also upon CQ treatment.

      In order to further support the role of Rabaptin5 for LC3 lipidation upon chloroquine induced membrane damage, western blots of WT, +Rabaptin5, Rabaptin5 KO, Rabaptin5 KO +WT or +AAA cell lines were analysed. However, the lysates were collected upon 30 minutes of chloroquine treatment which does not correlate with the imaging performed in Figure 2 as the number of LC3 vesicles did not show an increase upon 30 minutes of chloroquine treatment. The authors should include the 150 minutes time point for the LC3 lipidation in these conditions.

      Because CQ inhibits autophagic flux, LC3-II accumulates after longer times in all cell lines. The differences can only be seen early.

      The experiments with Salmonella are of great quality. The relationship of Rabaptin5 with SCV and the endomembrane damage induced by Salmonella could be further elucidated with Rabaptin5 positive vesicles at early infection stages. It is not very clear from the text how the authors link the endosomal network previously described for chloroquine with infection. It would be important here to show that Salmonella mutants unable to damage endosomal membranes do not have an effect. In Figure 7 panel C, the time points on graphs are in hours but it should be in minutes. Corrected.

      Since Salmonella require T3SS for infection of HEK cells and T3SS causes the membrane damage, the proposed experiment is difficult.

      The events of targeting the damaged membranes for degradation was well characterized by the recognition of these membranes by Gal3, Gal8 and recruitment of autophagic receptors to the site of damage (Chauhan et al. 2016; Jia et al. 2019; Thurston et al. 2012; Maejima et al. 2013; Kreibich et al. 2015). This manuscript introduces a new potential platform for the formation of autophagic machinery on endosomes with the interaction of Rabaptin5 with FIP200 and ATG16L1, however more evidence is required to link this to the clearance of damaged membranes. Previously it was shown that endolysosomal compartments that were neutralized and swollen by monensin and chloroquine had been directed to degradation by direct conjugation of LC3 to single membranes via noncanonical autophagy, but here authors propose another mechanism for this event via canonical autophagy.

      As discussed in the general response above, the literature reports CQ and Mon to initiate both canonical autophagy and LAP-like autophagy, the latter particularly on phagosomes containing latex beads or entotic vacuoles. Our results – including the additional data above – concern the effects of CQ and Mon damaging early endosomes and causing recruitment of galectins and ubiquitination, triggering autophagy dependent on the ULK complex and WIPI2 as hallmarks of canonical autophagy, and Rabaptin5. The reviewer comments highlighted the possibility of LAP-like autophagy occurring in parallel, perhaps on endosomes that are not broken, which might explain the relative insensitivity of LC3 puncta induced by CQ and Mon – compared to the strong and robust reduction of WIPI2 puncta – on the knockdown of FIP200, ATG13, or Rabaptin5. In an alternative explanation, inhibition of autophagic flux causes remaining canonical autophagy to accumulate, while WIPI2 puncta are strongly inhibited. In support of the latter interpretation, ULK inhibition by MRT68921 (Fig. 4C and D) or FIP200 knockout (Fig. 6B and C) abolished CQ-induced LC3 structures, suggesting that – unlike on phagosomes or entotic vacuoles – there is little LAP-like autophagy. We propose to revise the manuscript to discuss these considerations more clearly.

      Reviewer #1 (Significance (Required)):

      Overall this work is very novel and shows some evidence of early endosomal autophagy. It could be relevant for some form of receptor-mediated signalling (although this is not discussed by the authors). My experience is in intracellular trafficking of pathogens and membrane damage.

      **Referee Cross-commenting**

      In my opinion, the only way you can distinguish between double or single membrane is by EM. For me, the important part is to show this is targeting of early endosomes to autophagy, either using other early endosomal markers, analysing by WB some early endosome receptors such as TfR or other inhibitors. If the authors are able to address some of these comments, I agree the paper will be in a better position for publication.


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Millarte et al study the role of Rabaptin-5 (Rbpt5) during early endosome damage recognition by autophagy. The authors focus on using chloroquine (CQ) as an agent to induce endosomal swelling/damage and suggest that Rbpt5 is required for the recruitment of the autophagy machinery to perturbed endosomes. They further use Salmonella infection as a model to confirm the role of Rbpt5 in this process. The authors initially show that Rbpt5 binds to FIP200 and subsequently focus on its interaction with ATG16L1 and identify a mutant that is unable to bind ATG16L1 or mediate the recognition of early endosomes by autophagy. Overall, this is an interesting study which provides molecular insights of how early endosomes can be targeted by the autophagy machinery where Rbpt5 may act as an autophagy receptor. Some specific comments are as follows:

      Fig.3A: siRbpt5 seems to induce the localization of LC3 to ring-like structures during CQ treatment. Are these LAP-like structures (e.g. sensitive to BafA1 treatment)? And were they included in the quantification in Fig.3C?

      Ring-like LC3 structures were also counted.

      As discussed in the general remarks above, it is a possibility that knockdown-resistant LC3 recruitment (particularly rings) is due to a CQ-induced LAP-like process. The alternative explanation is that there is residual canonical autophagy upon knockdown of Rabaptin5, ATG13, or FIP200: while WIPI2 puncta are strongly reduced, LC3-positive structures accumulate due to inhibition of autophagic flux. In support of the latter interpretation, ULK inhibition by MRT68921 (Fig. 4C and D) or FIP200 knockout (Fig. 6B and C) abolished CQ-induced LC3 puncta or rings.

      We can also test BafA1 treatment. Certainly, we will revise the text to discuss this point in more detail.

      Fig.4A&B: Since Rbpt5 KD has a weak effect on LC3 puncta formation (Fig.3) and to distinguish the effects of CQ in inducing LAP, the effects of ATG13 and ULK1 KD should be assessed by localising Rbpt5 with WIPI2 or ATG16L1.

      We can do that.

      Fig.4: It is not clear why ULK1 KD would affect Torin1-induced autophagy but not LC3/WIPI2 localisation during CQ induced early endosome-damage. As the ULK inhibitors can target other pathways, the authors should confirm this finding in ULK1/2 double KO or KD cells.

      We have used MRT68921, because it is frequently used in the literature for this purpose with high specificity. It was used for example by Lystad et al. (2019) together with VPS34IN1 to block all canonical autophagy to analyze exclusively noncanonical effects of monensin treatment. We could perform ULK1/2 double knockdowns, but since ULK2 cannot be detected on immunoblots in HEK293 cells, the result would be interpretable only when there is an effect.

      Fig.5: The contribution of FIP200 in the interaction between Rbpt5 and ATG16L1 is unclear. Is binding between Rbpt5 and ATG16L1 mediated by ATG16L1's interaction with FIP200? The plasmid details describing the delta-WD40 deletion plasmid used in this study are missing and could be important to confirm that the delta-WD40 still retains binding to FIP200.

      We will of course include the details on the deletion plasmid, which were missing by mistake. Our WD deletion construct of ATG16L1 consists of residues 1–319, precisely deleting just the WD40 repeats, but retaining the FIP200 interaction sequence and the second membrane binding segment (b).

      We did a co-immunoprecipitation experiment and found both wild-type ATG16L1 and the ∆WD mutant to co-immunoprecipitate with FIP200:

      Fig.5E: the authors should test Rbpt5 AAA mutant binding to FIP200. Since the mutant appears to express less, its binding to ATG16L1 should be quantified or repeated with more comparable expression levels.

      We will quantify the immunoblots and perhaps attempt getting more equal expression levels.

      Fig.6: CQ treatment can induce various endosomal damage (in addition to early endosomes) and LC3 lipidation processes (e.g. LAP-like). The authors show that Rbpt5 is specifically involved in damaged early endosome autophagy. In this figure, it would be important to distinguish CQ-induced LC3 puncta as a result of early endosome damage or other lipidation processes (e.g. canonical or non-canonical autophagy). The use of FIP200 KO cells shows that LC3 puncta is inhibited. However, here a specific readout to look at early endosome recognition by autophagy is important. The authors can localize early endosome markers (EEA1) with autophagy players (e.g. WIPI2 and LC3). This is also relevant to other figures (e.g. supplementary figure 7E).

      Rabaptin5 is a bona fide marker of Rab5-positive early sorting endosomes. As a control, we confirmed colocalization of Rabaptin5 with transferrin receptor, another endosomal marker, on CQ-induced rings (Fig. 2B). We also analyzed swollen endosomes with triple-staining for Rabaptin5/ transferrin receptor/ Gal3 as shown in this gallery (30 min CQ, as in Fig. 2). All Rabaptin5-positive swollen endosomes (rings) were positive for transferrin receptor and ~80% for mCherry-Gal3.

      Our results are in agreement with Fraser et al. (2019) where they use EEA1 as an endosomal marker upon monensin treatment.

      We also performed a colocalization analysis for Rabaptin5 and LC3B, showing enhanced colocalization upon CQ treatment: ~20% of LC3B is (still) positive for Rabaptin5 after 150 min of CQ treatment.

      Fig.6F&G: the authors should show representative images of these localization images quantified here. These can be added in the supplementary figures.

      We are happy to do this.

      **Minor comments:**

      Fig.2E: FIP200 seems to be highly overexpressed in this image. Commercial antibodies that recognise endogenous FIP200 are widely used and should be tested to confirm the colocalisation between FIP200 and Rbpt5.

      We plan to do this.

      Fig.7C image: the different settings denoted by +/-, +/+, etc. are not clearly defined.

      We will improve this.

      Reviewer #2 (Significance (Required)):

      This is an interesting study and provides important mechanistic insights underlying the recognition of perturbed early endosomes by the autophagy machinery. Researchers interested in endosomal trafficking or autophagic substrate recognition are likely to benefit from this study.

      **Referee Cross-commenting**

      In my opinion, the authors have attempted to distinguish single membrane from double membrane LC3 lipidation by looking at the ULK complex requirement. As other reviewers suggested, this can be further confirmed by using ATG16L1 mutants. It is important however that these experiments are supplemented by co-localising autophagy proteins with alternative early endosome markers when Rbpt5 is inhibited.

      I think if the authors are able to address the suggested experiments, this would help improve the manuscript and make it suitable for publication.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Millarte and colleagues find that Rabaptin5, a critical regulator of Rab5 activity, and a protein localized to early endosomes, interacts with FIP200 and ATG16L1. This interaction is confirmed and validated by a number of approaches (yeast 2 H, co-immunoprecipitation) and the binding sites of Rabaptin5 are mapped on FIP200 and ATG16L1. More precisely the binding site for ATG16L1 is nicely mapped on Rabaptin 5 by analogy with other ATG16L1 binders. The authors investigate the significance of this binding of Rabaptin5 to the autophagy proteins by proposing this interaction is required for targeting "autophagy to damaged endosomes". Endosomes are damaged with short treatments of chloroquine, a well studied compound previously shown to inhibit autophagy by disrupting fusion of autophagosomes with lysosomes. They propose the recruitment of autophagy (proteins) to the damaged endosomes may allow them to be eliminated. They use another model (phagocytosis of salmonella) to probe the role for rabaptin5 and its partners FIP200 and ATG16L1 in the well-documented role of autophagy on the elimination of salmonella in SCVs (Salmonella containing vacuole) formed from endosomes. Using short infection time points, and the Rabaptin5 mutants which prevent ATG16L1 binding they suggest Rabaptin5 binding contributes to elimination and killing of Salmonella by recruitment of ATG16L1.

      **Major comments:**

      1. The authors make an unfortunate and confusing choice of wording in the title and the text of "autophagy being recruited" to damaged early endosomes. A protein can recruit another protein but it can not recruit a process or pathway to a membrane.

      In the title we use the term "target". It is OK for us to avoid the expression "recruiting autophagy".

      The authors conclude that Rabaptin5 may have a role in autophagy directed to damaged early endosomes. The conclusion that Rabaptin5 binds FIP200 and ATG16L1 are convincing. The main issue is however in identifying what sort of process they are following. They have shown WIPI2 and LC3 can be recruited to early endosomes after 30 min chloroquine treatment but there is no data to explain the consequences of the binding of these proteins. They do not provide proof that canonical autophagosomes are formed which engulf and remove the damaged endosomes, nor do they show that the recruitment of WIPI2 is to a single membrane (presumably damaged early endosomes) which would be a non-canonical pathway. They often use the terminology "chloroquine-induced autophagy" (see Figure 4) but have virtually no proof they have induced either canonical or non-canonical pathways in their experiments. The only evidence they provide that there is some alteration in a membrane-mediated event is increase in lipidation of LC3 in Figure 6. The authors must follow either an early endosome protein or cargo to demonstrate lysosome-mediated degradation indicative of autophagy, or demonstrate the process is a variation on non-canonical autophagy.

      We analyzed transferrin receptor levels with and without CQ to test degradation of an early endosomal marker protein. Since CQ inhibits autophagic flux, this assay may not be very sensitive. Nevertheless, we found a significant reduction of ~15% and ~30% after overnight incubation with CQ in parental HEK293 cells and in Rbpt5-KO cells re-expressing wild-type Rabaptin5, respectively, but no reduction in Rbpt5-KO cells expressing the Rabaptin5-AAA mutant defective in binding to ATG16L1.

      There are concerns about the replicates done for many experiments in particular the co-immunoprecipitations which are not quantified (Figure 1 and 5).

      We will quantify these blots.

      The rescue experiments, even if done with stable cells lines made in the parental HEK293 cell line should be viewed with caution because of the very different amounts of Rabaptin5 (see Figure 6A). The overexpression of Rabaptin5 has not been well studied and comparisons with the mutants are therefore preliminary (Figure 6F and G).

      Fig 6A shows that Rabaptin5 levels are similar except for +Rbpt, where they are higher, and R-KO, which has none. Additional Rabaptin5 seems not to significantly enhance early WIPI and ATG16L1 colocalization.

      Conclusions about the role of the ULK complex, or ULK1 versus ULK2, should be expanded by studying the activity of the complex (phosphorylation of ATG13 for example) in order to make the conclusions more significant.

      We consider this to be beyond the scope of this study. Rabaptin5-dependent autophagy depends on the components of the ULK complex.

      **Minor comments:**

      1. Much of the labelling in the immunofluorescence images is not visible even on the screen version.

      We were careful to have the signals within the dynamic range of the image, but we can enhance the signals for better visibility.

      The LC3-lipidation experiment (Figure 6D) should be re-analysed by normalization to the loading control. The result may be significantly different and is open to re-interpretation. The quality of this western blot is also very poor.

      Quantitation was based on the ratio between LC3B-I and -II or the percentage of II of the total, always within the same lane and therefore largely independent of loading.
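      As a minimal sketch of why a within-lane measure is insensitive to loading, the two indices mentioned above can be written out as follows. The function name and the band intensities are hypothetical and serve only as illustration; they are not taken from the manuscript.

```python
import math


def lc3_lipidation_indices(lc3_i, lc3_ii):
    """Return (II/I ratio, percent II of total) for one lane.

    lc3_i and lc3_ii are background-subtracted band intensities
    (arbitrary units) measured in the same lane.
    """
    if lc3_i <= 0 or lc3_ii < 0:
        raise ValueError("band intensities must be positive")
    total = lc3_i + lc3_ii
    return lc3_ii / lc3_i, 100.0 * lc3_ii / total


# Hypothetical lane, and the same lane loaded with twice as much protein:
# both bands scale together, so the indices do not change.
lane = lc3_lipidation_indices(100.0, 50.0)
lane_double_load = lc3_lipidation_indices(200.0, 100.0)
same = all(math.isclose(a, b) for a, b in zip(lane, lane_double_load))
```

      The sketch assumes that both bands lie in the linear range of detection; it does not address saturation or differences in antibody affinity for the two forms.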

      Reviewer #3 (Significance (Required)):

      This manuscript topic fits into the field of study of canonical versus non-canonical autophagy. This literature is best described as "LAP" first discovered by Doug Green, (Sanjuan in 2009) but more recently as a phenomenon induced by monensin, and viral infection amongst others. Most relevant to this study are the references (in the text) by Florey (Autophagy 2015), Fletcher (EMBO J, 2018) and others. However, this manuscript fails to cite and consider the critical findings in a key study published by Lystad et al., Nature Cell Biology 2019, which examines the role of ATG16 in both canonical and non-canonical autophagy. The current study, if placed into the context of the Lystad study, would have significantly more value, and potentially make the findings more significant.

      We did not refer to Lystad et al. (2019), because they analyzed different ATG16L1 mutants on their contribution to monensin-induced processes on LC3 lipidation after completely blocking canonical autophagy with the ULK inhibitor MRT68921 and the VPS34 inhibitor VPS34IN1. The Rabaptin5-dependent CQ-induced processes are blocked by MRT68921 (Fig. 4C). We plan to refer to this study in the revision.

      Furthermore, the short chloroquine treatments used here could be of interest to the field if, using the cited study of Mauthe et al. (which very clearly defines the effect of chloroquine after long (5 h) treatment), the authors were to revisit and repeat some of the key experiments in order to demonstrate the effects of a 30-minute treatment. Does such a short treatment block fusion? Does it affect the pH of the acidic compartments? Does it inactivate the endocytic pathway? As the manuscript stands, the lack of understanding of the effect of chloroquine at short times makes the observations difficult to place into any biological context.

      This reviewer has expertise in autophagy, autophagosome formation and is familiar with the areas of endocytosis and infection.

      **Referee Cross-commenting**

      I think a major concern about the manuscript, which is present in all reviews, is the lack of clarity about what type of membrane LC3 is added to: is this the damaged endosome or a forming autophagosome? This leads to the question of what type of process is being observed here: non-canonical or canonical autophagy?

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      We thank the reviewers for their constructive and critical feedback on our original manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): In this study, the authors explored the tissue-specific regulation of DT size using both global and targeted deletion of Fgf9. They found cell hypertrophy and mineralization dynamics of the DT, as well as transcriptional signatures from skeletal muscle but not bone, were influenced by the global loss of Fgf9. Deletion of Fgf9 in skeletal muscle leads to postnatal enlargement of the DT. However, the innovation of this paper is not enough, the phenotypes of global deletion of Fgf9 were previously reported, most of the data in this paper are mainly descriptive analysis of the phenotypes, and internal cellular and molecular mechanisms were not well investigated.

      Here are the major issues:

      1. The data showed that fewer osteoclasts were present at both E16.5 and P0 in Figure 2R, V. Does FGF9 affect both osteogenesis and osteoclast formation?

      • Authors’ response to Reviewer: Thank you for your feedback. We revised this manuscript to reflect the concerns of Reviewer 1 related to the lack of cellular and molecular mechanisms as described below. Based on this question from the Reviewer, we have revised our discussion to clarify our findings as follows: “From our EdU proliferation assays, we observed a decline in cell proliferation in Fgf9null attachments, suggesting an accelerated chondrocyte maturation. Though we saw similar levels of Pthlh expression (a chondrocyte hypertrophy suppressor) in both WT and Fgf9null attachments, we also saw increased expression of Gli1 (a marker of chondrocyte hypertrophy) localized to the attachment in Fgf9null embryos compared to WT embryos. This decrease in proliferation was in parallel with increased hypertrophy of chondrocytes adjacent to the attachment cells within the Fgf9null DT, which may have led to a rapid expansion of matrix in the DT. Even though the DT was enlarged in Fgf9null mutants, we found fewer Sost+ cell clusters in their DTs compared to WT mice. Mature osteocytes express Sost (Winkler et al., 2003), and fewer Sost+ cells may indicate an impaired ability of Fgf9null osteoblasts to embed and mature into osteocytes. Overexpression of FGF9 in the perichondrium has been previously shown to suppress chondrocyte proliferation and limit bone growth in the limb (Karuppaiah et al., 2016); in our study, we found that loss of Fgf9 globally leads to an accelerated enlargement of chondrocytes in the tuberosity. This accelerated enlargement may limit the ability of these cells to deposit matrix and mineral and therefore limit osteocyte differentiation. We also found fewer osteoclasts in the Fgf9null DT, which mirrors previous reports using the same mutation to study the length and vascularity of developing limb (Hung et al., 2007). Because the DT is enlarged and resides on the surface of a shortened bone, this phenotype may elucidate a divergent role of FGF9 in patterning of an arrested (e.g., attachment) growth plate compared to a regular (e.g., long bone) growth plate. This includes unexplored roles of FGF9 in vascularity of the tendon attachment and formation of bone ridges that overlap with or deviate from its role in growth plate development that are beyond the scope of the current study.”
      2. RNA-sequencing analysis showed decreased expression of mitochondria/energy- and lipid-associated genes in Fgf9 null muscle compared to WT muscle. How does this relate to the enlargement of the DT? What are the detailed molecular mechanisms?
      • Authors’ response to Reviewer:
      • Based on this question from the Reviewer, we have revised our discussion to reflect the potential molecular mechanisms related to muscle mitochondria, fiber type, and metabolism as follows:

      “Fgf9 is expressed in muscle during embryonic stages, which we and others have observed using ISH (Colvin et al., 1999; Garofalo et al., 1999; Hung et al., 2007; Yang and Kozin, 2009). Previous work has established a connection between Fgf9 and muscle, as treatment of muscle and muscle progenitor cells with FGF9 slows maturation, enhances proliferation, and decreases expression of various myogenic genes (Huang et al., 2019). This study found supporting evidence that Fgf9 expression in muscle may be a limiting factor in tuberosity growth. However, it remains unknown how other FGFs and their receptors, FGFRs, regulate superstructure and attachment formation. In this study, we identified potential mediators of skeletal muscle metabolism in Fgf9null muscle, including downregulated mitochondrial-related genes associated with oxidative respiration and proton transport (i.e., Slc36a2 and Ucp1, amongst others). In cultured myoblasts, FGF9 can inhibit myogenic differentiation potentially via increased production of Myostatin (Huang et al., 2019), a well-established mediator of fast glycolytic muscle fibers (Girgenrath et al., 2005; Hennebry et al., 2009). While the role of FGF9 in myoblast fusion has been investigated in vitro, its effect on muscle fiber type and fiber metabolism (i.e., oxidative vs. glycolytic) has not yet been explored. Our findings from bulk RNA-seq of Fgf9null muscle point to potential mechanisms in muscle metabolism that may contribute to the enlarged phenotype that is mimetic of that found in Myostatin deficient mice and other animals (Elkasrawy and Hamrick, 2010; Hamrick et al., 2002). Additionally, further investigations are needed to investigate the potential role of Fgf9 in mitochondrial function and lipid metabolism. Recent work by Huang et al. also identified FGF9 as a potent regulator of calcium signaling and homeostasis in myoblast culture in vitro, and calcium release from the sarcoplasmic reticulum in muscle plays a critical role during embryonic skeletal myogenesis via ryanodine receptor 1 (RYR1). Although Ryr1 was not significantly different between Fgf9null and WT muscle in the present study, we did find that calmodulin-associated genes (e.g., Calm4, Calml3, Camsap3, Calm5) were all significantly upregulated in Fgf9null muscle compared to WT muscle. Calmodulin interacts with RYR1 and its activation is required for intracellular binding of calcium (Newman et al., 2014, 1). Calmodulin is a crucial component of the calcium signal transduction pathway and also plays an important role in lipid and glucose metabolism (Nishizawa et al., 1988). Taken together, our findings along with recent work by Huang et al. support more mechanistic studies to investigate the metabolic effects of loss and gain of function of Fgf9 on skeletal muscle as well as the muscle secretome.”

      Reviewer #1 (Significance (Required)):

      R1 The authors compared the phenotypes of global and muscle-specific deletion of Fgf9 in mice, and found that Fgf9 secreted by muscle may induce the enlargement of the DT. However, the detailed molecular mechanisms were not well investigated.

      **Referees cross-commenting**

      R2 I do not disagree with Rev 1, but I do not think such a task is so trivial, which is why I do not suggest it; it could take years to determine the molecular mechanisms of anything. The authors could expand the discussion and offer some possibilities. If they had some RNAseq data, they could perhaps suggest some of the key signaling pathways involved.

      **Referees cross-commenting**

      R1 We still suggest that the internal cellular and molecular mechanisms should be well investigated in this paper.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      • This paper deals with an important topic which is exact molecular mechanisms regulating the growth of bony tuberosities; because this region is essential for force transmission and movement.
      • Based on the previous information they had that in the global KO of the gene FGF9 the deltoid muscle is enlarged; and this muscle is in a very important tuberosity; they decided to look at FGF9 as a potential genetic regulator.
      • The manuscript is clear, objective, concise. Very clear. Authors used both the global and targeted deletions, very high reproducibility.

      Reviewer #2 (Significance (Required)):

      • This manuscript advances several areas since we know little about the mechanisms controlling local mechanisms of tuberosities. It also advances our knowledge of FGF9. There were several studies before mostly in vitro showing that FGF9 when added to muscle cells could arrest myogenesis, but the types of experiments in vivo had not been performed yet. The authors used an array of methods; the studies are unbiased and very rigorous and also they always show all experimental points, which is excellent. The conclusions are supported by the data.

      • The main suggestion for authors: They essentially do not discuss the nature of the potential muscle-to-bone signaling that occurs when they target the deletion of FGF9 in skeletal muscles, the muscles enlarge, and there is a series of adaptations in the tuberosity. Do the authors believe this to be entirely due to genetic changes, or potentially mediated through secreted myokines? In the paper of Huang et al., 2019, the authors document an effect of FGF9 on intracellular calcium homeostasis/signaling; could this be part of the mechanism? Perhaps the authors could propose a model?

      Authors’ response to Reviewer:

      • Future studies could investigate the secretome of muscle in Fgf9null or muscle-specific knockouts, as well as assess calcium signaling homeostasis in Fgf9 mutant muscles. We did find calcium- and ion-associated genes in the RNAseq and revised the discussion to include this information.
      • Based on this question from the Reviewer, we have revised our discussion to reflect the potential molecular mechanisms related to muscle mitochondria, fiber type, and metabolism as follows: “Fgf9 is expressed in muscle during embryonic stages, which we and others have observed using ISH (Colvin et al., 1999; Garofalo et al., 1999; Hung et al., 2007; Yang and Kozin, 2009). Previous work has established a connection between Fgf9 and muscle, as treatment of muscle and muscle progenitor cells with FGF9 slows maturation, enhances proliferation, and decreases expression of various myogenic genes (Huang et al., 2019). This study found supporting evidence that Fgf9 expression in muscle may be a limiting factor in tuberosity growth. However, it remains unknown how other FGFs and their receptors, FGFRs, regulate superstructure and attachment formation. In this study, we identified potential mediators of skeletal muscle metabolism in Fgf9null muscle, including downregulated mitochondrial-related genes associated with oxidative respiration and proton transport (i.e., Slc36a2 and Ucp1, amongst others). In cultured myoblasts, FGF9 can inhibit myogenic differentiation potentially via increased production of Myostatin (Huang et al., 2019), a well-established mediator of fast glycolytic muscle fibers (Girgenrath et al., 2005; Hennebry et al., 2009). While the role of FGF9 in myoblast fusion has been investigated in vitro, its effect on muscle fiber type and fiber metabolism (i.e., oxidative vs. glycolytic) has not yet been explored. Our findings from bulk RNA-seq of Fgf9null muscle point to potential mechanisms in muscle metabolism that may contribute to the enlarged phenotype that is mimetic of that found in Myostatin deficient mice and other animals (Elkasrawy and Hamrick, 2010; Hamrick et al., 2002). Additionally, further investigations are needed to investigate the potential role of Fgf9 in mitochondrial function and lipid metabolism. Recent work by Huang et al. also identified FGF9 as a potent regulator of calcium signaling and homeostasis in myoblast culture in vitro, and calcium release from the sarcoplasmic reticulum in muscle plays a critical role during embryonic skeletal myogenesis via ryanodine receptor 1 (RYR1). Although Ryr1 was not significantly different between Fgf9null and WT muscle in the present study, we did find that calmodulin-associated genes (e.g., Calm4, Calml3, Camsap3, Calm5) were all significantly upregulated in Fgf9null muscle compared to WT muscle. Calmodulin interacts with RYR1 and its activation is required for intracellular binding of calcium (Newman et al., 2014, 1). Calmodulin is a crucial component of the calcium signal transduction pathway and also plays an important role in lipid and glucose metabolism (Nishizawa et al., 1988). Taken together, our findings along with recent work by Huang et al. support more mechanistic studies to investigate the metabolic effects of loss and gain of function of Fgf9 on skeletal muscle as well as the muscle secretome.

      In conclusion, this work established a new role of skeletal muscle derived Fgf9 during skeletal development and tuberosity growth. Additionally, our unbiased transcriptomic approaches and rigorous analyses identified new potential mechanisms associated with muscle development, mitochondrial bioenergetics, and muscle metabolism that warrant further investigation into the role of FGF9 in muscle-bone crosstalk.”

    1. Author Response:

      Reviewer #1 (Public Review):

      The work by Wang et al. examined how task-irrelevant, high-order rhythmic context could rescue the attentional blink effect via reorganizing items into different temporal chunks, as well as the neural correlates. In a series of behavioral experiments with several controls, they demonstrated that the detection performance of T2 was higher when occurring in different chunks from T1, compared to when T1 and T2 were in the same chunk. In EEG recordings, they further revealed that the chunk-related entrainment was significantly correlated with the behavioral effect, and the alpha-band power for T2 and its coupling to the low-frequency oscillation were also related to behavioral effect. They propose that the rhythmic context implements a second-order temporal structure to the first-order regularities posited in dynamic attention theory.

      Overall, I find the results interesting and convincing, particularly the behavioral part. The manuscript is clearly written and the methods are sound. My major concerns are about the neural part, i.e., whether the work provides new scientific insights to our understanding of dynamic attention and its neural underpinnings.

      1) A general concern is whether the observed behavioral related neural index, e.g., alpha-band power, cross-frequency coupling, could be simply explained in terms of ERP response for T2. For example, when the ERP response for T2 is larger for between-chunk condition compared to within-chunk condition, the alpha-power for T2 would be also larger for between-chunk condition. Likewise, this might also explain the cross-frequency coupling results. The authors should do more control analyses to address the possibility, e.g., plotting the ERP response for the two conditions and regressing them out from the oscillatory index.

      Many thanks for the comment. In short, the enhancement in alpha power and cross-frequency coupling results in the between-cycle condition compared with those in the within-cycle condition cannot be accounted for by the ERP responses for T2.

      In general, the rhythmic stimulation in the AB paradigm prevents EEG signals from returning to the baseline. Therefore, we cannot observe typical ERP components purely related to individual items, except for the P1 and N1 components related to the stream onset, which reveal no difference between the two conditions and are trailed by steady-state responses (SSRs) resonating at the stimulus rate (Fig. R1).

      Fig. R1. ERPs aligned to stream onset. EEG signals were filtered between 1–30 Hz, baseline-corrected (-200 to 0 ms before stream onset) and averaged across the electrodes in left parieto-occipital area where 10-Hz alpha power showed attentional modulation effect.

      To further inspect the potential differences in the target-related ERP signals between the within- and between-cycle conditions, we plotted the target-aligned waveforms for these experimental conditions. As shown in Fig. R2, a drop of ERP amplitude occurred for both conditions around T2 onset, and the difference between these two conditions was not significant (paired t-test estimated on mean amplitude every 20 ms from 0 to 700 ms relative to T1 onset, p > .05, FDR-corrected).
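      The response above reports FDR-corrected p-values across consecutive 20-ms windows but does not name the correction procedure. A minimal sketch, assuming the commonly used Benjamini-Hochberg step-up procedure, is given below; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return a list of booleans: True where the null is rejected at FDR level q.

    Step-up rule: find the largest rank k (1-based, p-values sorted ascending)
    with p_(k) <= q * k / m, then reject the k smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= q * rank / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values, one per time window (not the study's data).
window_pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(window_pvals, q=0.05)
```

      In practice the per-window p-values would come from the paired t-tests described above; only the correction step is sketched here.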

      Fig. R2. ERPs aligned to T1 onset. EEG signals were filtered between 1–30 Hz, and baseline-corrected using signals -100 to 0 ms before T1 onset. The two dashed lines indicate the onset of T1 and T2, respectively.

      Since there is a trend of enhanced ERP response for the between-cycle relative to the within-cycle condition during the period of 0 to 100 ms after T2 onset (paired t-test on mean amplitude, p = .065, uncorrected), we then directly examined whether such post-T2 responses contribute to the behavioral attentional modulation effect and behavior-related neural indices. Crucially, we did not find any significant correlation of such T2-related ERP enhancement with the behavioral modulation index (BMI), or with the reported effects of alpha power and cross-frequency coupling (PAC). Furthermore, after controlling for the T2-related ERP responses, there still remains a significant correlation between the delta-alpha PAC and the BMI (rpartial = .596, p = .019), which is not surprising given that the PAC is calculated based on an 800-ms time window covering more pre-T2 than post-T2 periods (see the response to point #4 for details) rather than around the T2 onset. Taken together, these results clearly suggest that the T2-related ERP responses cannot explain the attentional modulation effect and the observed behavior-related neural indices.
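      The response does not state how the partial correlation was computed. As a sketch under that caveat, the standard first-order formula, which removes the linear contribution of a single covariate (here, the T2-related ERP response) from the correlation between two variables (here, PAC and BMI), can be written as follows. All names and numbers are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, covariate):
    """Correlation between x and y after controlling for one covariate."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, covariate)
    r_yz = pearson(y, covariate)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: x and y share a component e and depend on z in opposite directions,
# so the raw correlation is misleading while the partial correlation is 1.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
e = [1.0, -1.0, 2.0, -2.0, 0.0, 1.0]
x = [2.0 * zi + ei for zi, ei in zip(z, e)]
y = [-3.0 * zi + ei for zi, ei in zip(z, e)]
r_partial = partial_corr(x, y, z)
```

      This is equivalent to correlating the residuals of x and y after regressing each on the covariate.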

      2) The alpha-band increase for T2 is indeed contradictory to the well known inhibitory function of alpha-band in attention. How could a target that is better discriminated elicit stronger inhibitory response? Related to the above point, the observed enhancement in alpha-band power and its coupling to low-frequency oscillation might derive from an enhanced ERP response for T2 target.

      Many thanks for the comment. We have briefly discussed this point in the revised manuscript (page 18, line 477).

      A widely accepted function of alpha activity in attention is that alpha oscillations suppress irrelevant visual information during spatial selection (Kelly et al., 2006; Thut et al., 2006; Worden et al., 2000). However, it becomes a controversial issue when there exists rhythmic sensory stimulation at alpha-band, just like the situation in the current study where both the visual stream and the contextual auditory rhythm were emitted at 10 Hz. In such a case, alpha-band neural responses at the stimulation frequency can be interpreted as either passively evoked steady-state responses (SSR) or actively synchronized intrinsic brain rhythms. From the former perspective (i.e., the SSR view), an increase in the amplitude or power at the stimulus frequency may indicate an enhanced attentional allocation to the stimulus stream that may result in better target detection (Janson et al., 2014; Keil et al., 2006; Müller & Hübner, 2002). Conversely, the latter view of the inhibitory function of intrinsic alpha oscillations would produce the opposite prediction. In a previous AB study, Janson and colleagues (2014) investigated this issue by separating the stimulus-evoked activity at 12 Hz (using the same power analysis method as ours) from the endogenous alpha oscillations ranging from 10.35 to 11.25 Hz (as indexed by individual alpha frequency, IAF). Interestingly, they found a dissociation between these two alpha-band neural responses, showing that the RSVP frequency power was higher in non-AB trials (T2 detected) than in AB trials (T2 undetected) while the IAF power exhibited the opposite pattern. According to these findings, the currently observed increase in alpha power for the between-cycle condition may reflect more of the stimulus-driven processes related to attentional enhancement. However, we don’t negate the effect of intrinsic alpha oscillations in our study, as the current design is not sufficient to distinguish between these two processes. 
We have discussed this point in the revised manuscript (page 18, line 477). Also, we have to admit that “alpha power” may not be the most precise term to describe our findings of the stimulus-related results. Thus, we have specified it as “neural responses to first-order rhythms at 10 Hz” and “10-Hz alpha power” in the revised manuscript (see page 12 in the Results section and page 18 in the Discussion section).

      As for the contribution of T2-related ERP response to the observed effect of 10 Hz power and cross-frequency coupling, please refer to our response to point #1.

      References:

      Janson, J., De Vos, M., Thorne, J. D., & Kranczioch, C. (2014). Endogenous and Rapid Serial Visual Presentation-induced Alpha Band Oscillations in the Attentional Blink. Journal of Cognitive Neuroscience, 26(7), 1454–1468. https://doi.org/10.1162/jocn_a_00551

      Keil, A., Ihssen, N., & Heim, S. (2006). Early cortical facilitation for emotionally arousing targets during the attentional blink. BMC Biology, 4(1), 23. https://doi.org/10.1186/1741-7007-4-23

      Kelly, S. P., Lalor, E. C., Reilly, R. B., & Foxe, J. J. (2006). Increases in Alpha Oscillatory Power Reflect an Active Retinotopic Mechanism for Distracter Suppression During Sustained Visuospatial Attention. Journal of Neurophysiology, 95(6), 3844–3851. https://doi.org/10.1152/jn.01234.2005

      Müller, M. M., & Hübner, R. (2002). Can the Spotlight of Attention Be Shaped Like a Doughnut? Evidence From Steady-State Visual Evoked Potentials. Psychological Science, 13(2), 119–124. https://doi.org/10.1111/1467-9280.00422

      Thut, G., Nietzel, A., Brandt, S., & Pascual-Leone, A. (2006). Alpha-band electroencephalographic activity over occipital cortex indexes visuospatial attention bias and predicts visual target detection. The Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 26(37), 9494–9502. https://doi.org/10.1523/JNEUROSCI.0875-06.2006

      Worden, M. S., Foxe, J. J., Wang, N., & Simpson, G. V. (2000). Anticipatory Biasing of Visuospatial Attention Indexed by Retinotopically Specific α-Band Electroencephalography Increases over Occipital Cortex. Journal of Neuroscience, 20(6), RC63–RC63. https://doi.org/10.1523/JNEUROSCI.20-06-j0002.2000

      3) To support that it is the context-induced entrainment that leads to the modulation in AB effect, the authors could examine pre-T2 response, e.g., alpha-power, and cross-frequency coupling, as well as its relationship to behavioral performance. I think the pre-stimulus response might be more convincing to support the authors' claim.

      Many thanks for the insightful suggestion. We have conducted additional analyses.

      Following this suggestion, we have examined the 10-Hz alpha power within the time window of -100–0 ms before T2 onset and found stronger activity for the between-cycle condition than for the within-cycle condition. This pre-T2 response is similar to the post-T2 response except that it is more restricted to the left parieto-occipital cluster (CP3, CP5, P3, P5, PO3, PO5, POZ, O1, OZ, t(15) = 2.774, p = .007), which partially overlaps with the cluster that exhibits a delta-alpha coupling effect significantly correlated with the BMI. We have incorporated these findings into the main text (page 12, line 315) and the Fig. 5A of the revised manuscript.

      As for the coupling results reported in our manuscript, the coupling index (PAC) was calculated based on the activity during the second and third cycles (i.e., 400 to 1200 ms from stream onset) of the contextual rhythm, most of which covers the pre-T2 period as T2 always appeared in the third cycle for both conditions. Together, these results on pre-T2 10-Hz alpha power and cross-frequency coupling, as well as its relationship to behavioral performance, jointly suggest that the observed modulation effect is caused by the context-induced entrainment rather than being a by-product of post-T2 processing.

      4) About the entrainment to rhythmic context and its relation to behavioral modulation index. Previous studies (e.g., Ding et al) have demonstrated the hierarchical temporal structure in speech signals, e.g., emergence of word-level entrainment introduced by language experience. Therefore, it is well expected that imposing a second-order structure on a visual stream would elicit the corresponding steady-state response. I understand that the new part and main focus here are the AB effects. The authors should add more texts explaining how their findings contribute new understandings to the neural mechanism for the intriguing phenomena.

      Many thanks for the suggestion. We have provided more discussion in the revised manuscript (page 17, line 447).

      We have provided more discussion on this important issue in the revised manuscript (page 17, line 447). In brief, our study demonstrates how cortical tracking of feature-based hierarchical structure reframes the deployment of attentional resources over visual streams. This effect, distinct from the hierarchical entrainment to speech signals (Ding et al., 2016; Gross et al., 2013), does not rely on previously acquired knowledge about the structured information and can be established automatically even when the higher-order structure comes from a task-irrelevant and cross-modal contextual rhythm. On the other hand, our finding sheds fresh light on the adaptive value of the structure-based entrainment effect by expanding its role from rhythmic information (e.g., speech) perception to temporal attention deployment. To our knowledge, few studies have tackled this issue in visual or speech processing.

      References:

      Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164. https://doi.org/10.1038/nn.4186

      Gross, J., Hoogenboom, N., Thut, G., Schyns, P., Panzeri, S., Belin, P., & Garrod, S. (2013). Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain. PLoS Biol, 11(12). https://doi.org/10.1371/journal.pbio.1001752

      Reviewer #2 (Public Review):

      In cognitive neuroscience, a large number of studies proposed that neural entrainment, i.e., synchronization of neural activity and low-frequency external rhythms, is a key mechanism for temporal attention. In psychology and especially in vision, attentional blink is the most established paradigm to study temporal attention. Nevertheless, as far as I know, few studies try to link neural entrainment in the cognitive neuroscience literature with attentional blink in the psychology literature. The current study, however, bridges this gap.

      The study provides new evidence for the dynamic attending theory using the attentional blink paradigm. Furthermore, it is shown that neural entrainment to the sensory rhythm, measured by EEG, is related to the attentional blink effect. The authors also show that event/chunk boundaries are not enough to modulate the attentional blink effect, and suggest that strict rhythmicity is required to modulate attention in time.

      In general, I enjoyed reading the manuscript and only have a few relatively minor concerns.

      1) Details about EEG analysis.

      First, each epoch is from -600 ms before the stimulus onset to 1600 ms after the stimulus onset. Therefore, the epoch is 2200 ms in duration. However, zero-padding is needed to make the epoch duration 2000 ms (for 0.5-Hz resolution). This is confusing. Furthermore, for a more conservative analysis, I recommend also analyzing the response between 400 ms and 1600 ms, to avoid the onset response, and showing the results in a supplementary figure. The short duration reduces the frequency resolution but still allows seeing a 2.5-Hz response.

      Thanks for the comments. Each epoch was indeed segmented from -600 to 1600 ms relative to the stimulus onset, but in the spectrum analysis, we only used EEG signals from stream onset (i.e., time point 0) to 1600 ms (see the Materials and Methods section) to investigate the oscillatory characteristics of the neural responses purely elicited by rhythmic stimuli. The 1.6-s signals were zero-padded into a 2-s duration to achieve a frequency resolution of 0.5 Hz.
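      The relation between zero-padding and frequency spacing described above can be illustrated with a short sketch: padding a 1.6-s segment to 2 s gives FFT bins spaced 1/2 s = 0.5 Hz apart. The sampling rate of 500 Hz, the function name, and the test signal are assumptions chosen for illustration, not values from the manuscript.

```python
import numpy as np


def padded_amplitude_spectrum(signal, fs, padded_duration=2.0):
    """Amplitude spectrum of a 1-D segment zero-padded to padded_duration seconds."""
    n_fft = int(round(fs * padded_duration))
    if n_fft < len(signal):
        raise ValueError("padded duration is shorter than the signal")
    # rfft appends zeros when n is larger than the input length.
    spectrum = np.fft.rfft(signal, n=n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)  # bin spacing = fs / n_fft
    return freqs, np.abs(spectrum)


fs = 500.0                                   # hypothetical sampling rate (Hz)
n_samples = int(round(1.6 * fs))             # 1.6-s segment from stream onset
t = np.arange(n_samples) / fs
segment = np.sin(2 * np.pi * 2.5 * t)        # toy 2.5-Hz response
freqs, amplitude = padded_amplitude_spectrum(segment, fs)
bin_spacing = freqs[1] - freqs[0]
peak_frequency = freqs[np.argmax(amplitude)]
```

      Zero-padding interpolates the spectrum onto a finer grid; it does not add information beyond what the 1.6-s segment contains, so neighboring bins remain correlated.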

      According to the reviewer’s suggestion, we analyzed the EEG signals from 400 ms to 1600 ms relative to stream onset to avoid potential influence of the onset response, and showed the results in Figure 4. We can still observe spectral peaks at the stimulus frequencies of 2.5, 5 (the harmonic of 2.5 Hz), and 10 Hz in both the power and ITPC spectra. However, the peak magnitudes were much weaker than those of the 1.6-s signals, especially at 2.5 Hz, and the 2.5-Hz power did not survive the multiple comparisons correction across frequencies (FDR threshold of p < .05), which might be due to the relatively low signal-to-noise ratio of the analysis based on the 1.2-s epochs (only three cycles to estimate the activity at 2.5 Hz). Importantly, we did identify a significant cluster for 2.5-Hz ITPC in the left parieto-occipital region showing a positive correlation with the individuals’ BMI (Fig. R3; CP5, TP7, P5, P7, PO5, PO7, O1; r = .538, p = .016), which is consistent with the findings based on the longer epochs.

      Fig. R3. Neural entrainment to contextual rhythms during the period of 400–1600 ms from stream onset. (A) The spectrum for inter-trial phase coherence (ITPC) of EEG signals from 400 to 1600 ms after the stimulus onset. Shaded areas indicate standard errors of the mean. (B) The 2.5-Hz ITPC was significantly correlated with the behavioral modulation index (BMI) in a parieto-occipital cluster, as indicated by orange stars in the scalp topographic map.

      Second, "The preprocessed EEG signals were first corrected by subtracting the average activity of the entire stream for each epoch, and then averaged across trials for each condition, each participant, and each electrode." I have several concerns about this procedure.

      (A) What is the entire stream? Is it the average over time?

      Yes, as for the power spectrum analysis, EEG signals were first demeaned by subtracting the average signals of the entire stream over time from onset to offset (i.e., from 0 to 1600 ms) before further analysis. We performed this procedure following previous studies on the entrainment to visual rhythms (Spaak et al., 2014). We have clarified this point in the “Power analysis” part of the Materials and Methods section (page 25, line 677).

      References:

      Spaak, E., Lange, F. P. de, & Jensen, O. (2014). Local Entrainment of Alpha Oscillations by Visual Stimuli Causes Cyclic Modulation of Perception. The Journal of Neuroscience, 34(10), 3536–3544. https://doi.org/10.1523/JNEUROSCI.4385-13.2014

      (B) I suggest doing the Fourier transform first and averaging the spectrum over participants and electrodes. Averaging the EEG waveforms requires the assumption that all electrodes/participants have the same response phase, which is not necessarily true.

      Thanks for the suggestion. In an AB paradigm, the evoked neural responses are sufficiently time-locked to the periodic stimulation, so it is reasonable to quantify power estimate with spectral decomposition performed on trial-averaged EEG signals (i.e., evoked power). Moreover, our results of inter-trial phase coherence (ITPC), which estimated the phase-locking value across trials based on single-trial decomposed phase values, also provided supporting evidence that the EEG waveforms were temporally locked across trials to the 2.5-Hz temporal structure in the context session.

      Nevertheless, we also took the reviewer’s suggestion seriously and analyzed the power spectrum on the average of single-trial spectral transforms, i.e., the induced power, which puts emphasis on the intrinsic non-phase-locked activities. In line with the results of evoked power and ITPC, the induced power spectrum in the context session also peaked at 2.5 Hz and was significantly stronger than that in the baseline session at 2.5 Hz (t(15) = 4.186, p < .001, FDR-corrected with a p value threshold < .001). Importantly, Pearson correlation analysis also revealed a positive cluster in the left parieto-occipital region, indicating that the induced power at 2.5 Hz was also strongly related to the attentional modulation effect (P7, PO7, PO5, PO3; r = .606, p = .006). We have added these additional findings to the revised manuscript (page 11, line 288; see also Figure 4—figure supplement 1).
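
      The three quantities discussed in this exchange (power of the trial-averaged signal, average of single-trial power spectra, and ITPC) can be contrasted in a short sketch. Everything numeric here is an assumption for illustration: the sampling rate, the number of trials, and the simulated data (a phase-locked 2.5-Hz component plus Gaussian noise). It is not the authors' pipeline, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n_trials = 500, 60                  # assumed sampling rate and trial count
t = np.arange(int(1.6 * fs)) / fs       # 0-1600 ms window
n_fft = int(2.0 * fs)                   # zero-pad to 2 s (0.5-Hz bins)

# Simulated trials: phase-locked 2.5-Hz component plus noise (illustrative only)
trials = np.sin(2 * np.pi * 2.5 * t) + rng.normal(0.0, 2.0, (n_trials, t.size))
trials = trials - trials.mean(axis=1, keepdims=True)   # demean each trial

freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
spectra = np.fft.rfft(trials, n=n_fft, axis=1)          # one spectrum per trial

# Average the waveforms first, then transform ("evoked power" in the response)
evoked_power = np.abs(np.fft.rfft(trials.mean(axis=0), n=n_fft)) ** 2

# Transform each trial first, then average the power spectra
# (what the response refers to as "induced power")
single_trial_power = (np.abs(spectra) ** 2).mean(axis=0)

# ITPC: length of the mean unit phase vector across trials, per frequency
phases = np.exp(1j * np.angle(spectra))
itpc = np.abs(phases.mean(axis=0))

idx = int(np.argmin(np.abs(freqs - 2.5)))   # index of the 2.5-Hz bin
```

      In this simulation the phase-locked component dominates both spectra at 2.5 Hz. With real data the two power measures diverge to the extent that activity is not phase-locked across trials, which is the reviewer's concern.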

      2) The sequences are short, only containing 16 items and 4 cycles. Furthermore, the targets are presented in the 2nd or 3rd cycle. I suspect that a stronger effect may be observed if the sequences are longer, since attention may not entrain well to the external stimulus until a few cycles have passed. In the first trial of the experiment, the participants may not have a chance to realize that the task-irrelevant auditory/visual stimulus has a cyclic nature, and it is not likely that their attention will entrain to such cycles. As the experiment proceeds, they learn that the stimulus is cyclic and may allocate their attention rhythmically. Therefore, I feel that the participants do not just rely on the rhythmic information within a trial but also rely on the stimulus history. Please discuss why short sequences are used and whether it is possible to see buildup of the effect over trials or over cycles within a trial.

      Thanks for the comments. Typically, to induce a classic pattern of AB effect, the RSVP stream should contain 3–7 distractors before the first target (T1), with varying lengths of distractors (0–7) between two targets and at least 2 items after the second target (T2). In our study, we created the RSVP streams following these rules, which allowed us to observe the typical AB effect in which T2 performance deteriorated at Lag 2 relative to that at Lag 8. Nevertheless, we agree with the reviewer that longer streams would be better for building up the attentional entrainment effect, as we did observe that the attentional modulation effect ramped up as the stream proceeded over cycles, consistent with the reviewer’s speculation. In Experiments 1a (using auditory context) and 2a (using color-defined visual context), we adopted two sets of target positions—an early one where T2 appeared at the 6th or 8th position (in the 2nd cycle) of the visual stream, and a late one where T2 appeared at the 10th or 12th position (in the 3rd cycle) of the visual stream. In the manuscript, we reported T2 performance with all the target positions combined, as no significant interaction was found between the target positions and the experimental conditions (ps > .1). However, additional analysis demonstrated a trend toward an increase of the attentional modulation effect over cycles, from the early to the late positions. As shown in Fig. R4, the modulation effect grew stronger and reached significance for the late positions (for Experiment 1a, t(15) = 2.83, p = .013, Cohen’s d = 0.707; for Experiment 2a, t(15) = 3.656, p = .002, Cohen’s d = 0.914) but showed a weaker trend for the early positions (for Experiment 1a, t(15) = 1.049, p = .311, Cohen’s d = 0.262; for Experiment 2a, t(15) = .606, p = .553, Cohen’s d = 0.152).

      Fig. R4. Attentional modulation effect built up over cycles in Experiments 1a & 2a. Error bars represent 1 SEM; * p<0.05, ** p<0.01.

      However, we did not observe an obvious buildup effect across trials in our study. The modulation by contextual rhythms seems to arise quickly: the effect was already evident in the first quarter of trials in Experiment 1a (t(15) = 2.703, p = .016, Cohen’s d = 0.676) and in the second quarter of trials in Experiment 2a (t(15) = 2.478, p = .026, Cohen’s d = 0.620).
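
      As a reading aid for the statistics in this response: the response does not say how Cohen’s d was computed, but the reported values are numerically consistent with the paired-samples form d = t / sqrt(n), assuming n = 16 participants (df + 1). The sketch below only checks that arithmetic against the reported numbers; the function name is hypothetical.

```python
import math

def cohens_dz(t_value, n):
    """Paired-samples effect size d_z = t / sqrt(n) (assumed formula)."""
    return t_value / math.sqrt(n)

n = 16  # assumed from the reported degrees of freedom, t(15)

# (t, reported d) pairs copied from the response
reported = [(2.83, 0.707), (3.656, 0.914), (1.049, 0.262),
            (0.606, 0.152), (2.703, 0.676), (2.478, 0.620)]

# Largest gap between the recomputed and the reported effect sizes
max_abs_diff = max(abs(cohens_dz(t, n) - d) for t, d in reported)
```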

      3) The term "cycle" is used without definition in Results. Please define and mention that it's an abstract term and does not require the stimulus to have "cycles".

      Thanks for the suggestion. By its definition, the term “cycle” refers to “an interval of time during which a sequence of a recurring succession of events or phenomena is completed” or “a course or series of events or operations that recur regularly and usually lead back to the starting point” (Merriam-Webster dictionary). In the current study, we stuck to the recurrent and regular nature of “cycle” in general while defining the specific meaning of “cycle” by feature-based periodic changes of the contextual stimuli in each experiment (page 5, line 101; also refer to Procedures in the Materials and Methods section for details). For example, in Experiment 1a, the background tone sequence changed its pitch value from high to low or vice versa isochronously at a rate of 2.5 Hz, thus forming a rhythmic context with structure-based cycles of 400 ms. Note that we did not use the more general term “chunk”, because arbitrary chunks without the regularity of cycles are insufficient to trigger the attentional modulation effect in the current study. Indeed, the effect was eliminated when we replaced the rhythmic cycles with irregular chunks (Experiments 1d & 1e).

      4) Entrainment of attention is not necessarily related to neural entrainment to sensory stimulus, and there is considerable debate about whether neural entrainment to sensory stimulus should be called entrainment. Too much emphasis on terminology is of course counterproductive but a short discussion on these issues is probably necessary.

      Thanks for the comments. As commonly accepted, entrainment is defined as the alignment of intrinsic neuronal activity to the temporal structure of external rhythmic inputs (Lakatos et al., 2019; Obleser & Kayser, 2019). Here, we are interested in the functional roles of cortical entrainment to the higher-order temporal structure imposed on first-order sensory stimulation, and used the term entrainment to describe the phase-locked neural responses to such hierarchical structure, following the literature on auditory and visual perception (Brookshire et al., 2017; Doelling & Poeppel, 2015). In our study, the consistent results of power and ITPC have provided strong evidence that neural entrainment at the structure level (2.5 Hz) is significantly correlated with the observed attentional modulation effect. However, this does not mean that the entrainment of attention is necessarily associated with neural entrainment to sensory stimulus in a broader context, as attention may also be guided by predictions based on non-isochronous temporal regularity without requiring stimulus-based oscillatory entrainment (Breska & Deouell, 2017; Morillon et al., 2016).

      On the other hand, there has been a debate about whether the neural alignment to rhythmic stimulation reflects active entrainment of endogenous oscillatory processes (i.e., induced activity) or a series of passively evoked steady-state responses (Keitel et al., 2019; Notbohm et al., 2016; Zoefel et al., 2018). The latter process is also referred to as “entrainment in a broad sense” by Obleser & Kayser (2019). Given that a presented rhythm always evokes event-related potentials, a better question might be whether the observed alignment reflects the entrainment of endogenous oscillations in addition to evoked steady-state responses. Here we attempted to tackle this issue by measuring the induced power, which emphasizes the intrinsic non-phase-locked activity, in addition to the phase-locked evoked power. Specifically, we quantified these two kinds of activities with the average of single-trial EEG power spectra and the power spectra of trial-averaged EEG signals, respectively, according to Keitel et al. (2019). In addition to the observation of evoked responses to the contextual structure, we also demonstrated an attention-related neural tracking of the higher-order temporal structure based on the induced power at 2.5 Hz (see Figure 4—figure supplement 1), suggesting that the observed attentional modulation effect is at least partially derived from the entrainment of intrinsic oscillatory brain activity. We have briefly discussed this point in the revised manuscript (page 17, line 460).

      References:

      Breska, A., & Deouell, L. Y. (2017). Neural mechanisms of rhythm-based temporal prediction: Delta phase-locking reflects temporal predictability but not rhythmic entrainment. PLOS Biology, 15(2), e2001665. https://doi.org/10.1371/journal.pbio.2001665

      Brookshire, G., Lu, J., Nusbaum, H. C., Goldin-Meadow, S., & Casasanto, D. (2017). Visual cortex entrains to sign language. Proceedings of the National Academy of Sciences, 114(24), 6352–6357. https://doi.org/10.1073/pnas.1620350114

      Doelling, K. B., & Poeppel, D. (2015). Cortical entrainment to music and its modulation by expertise. Proceedings of the National Academy of Sciences, 112(45), E6233–E6242. https://doi.org/10.1073/pnas.1508431112

      Henry, M. J., Herrmann, B., & Obleser, J. (2014). Entrained neural oscillations in multiple frequency bands comodulate behavior. Proceedings of the National Academy of Sciences, 111(41), 14935–14940. https://doi.org/10.1073/pnas.1408741111

      Keitel, C., Keitel, A., Benwell, C. S. Y., Daube, C., Thut, G., & Gross, J. (2019). Stimulus-Driven Brain Rhythms within the Alpha Band: The Attentional-Modulation Conundrum. The Journal of Neuroscience, 39(16), 3119–3129. https://doi.org/10.1523/JNEUROSCI.1633-18.2019

      Lakatos, P., Gross, J., & Thut, G. (2019). A New Unifying Account of the Roles of Neuronal Entrainment. Current Biology, 29(18), R890–R905. https://doi.org/10.1016/j.cub.2019.07.075

      Morillon, B., Schroeder, C. E., Wyart, V., & Arnal, L. H. (2016). Temporal Prediction in lieu of Periodic Stimulation. Journal of Neuroscience, 36(8), 2342–2347. https://doi.org/10.1523/JNEUROSCI.0836-15.2016

      Notbohm, A., Kurths, J., & Herrmann, C. S. (2016). Modification of Brain Oscillations via Rhythmic Light Stimulation Provides Evidence for Entrainment but Not for Superposition of Event-Related Responses. Frontiers in Human Neuroscience, 10. https://doi.org/10.3389/fnhum.2016.00010

      Obleser, J., & Kayser, C. (2019). Neural Entrainment and Attentional Selection in the Listening Brain. Trends in Cognitive Sciences, 23(11), 913–926. https://doi.org/10.1016/j.tics.2019.08.004

      Zoefel, B., ten Oever, S., & Sack, A. T. (2018). The Involvement of Endogenous Neural Oscillations in the Processing of Rhythmic Input: More Than a Regular Repetition of Evoked Neural Responses. Frontiers in Neuroscience, 12. https://doi.org/10.3389/fnins.2018.00095

      Reviewer #3 (Public Review):

      The current experiment tests whether the attentional blink is affected by higher-order regularity based on rhythmic organization of contextual features (pitch, color, or motion). The results show that this is indeed the case: the AB effect is smaller when two targets appeared in two adjacent cycles (between-cycle condition) than within the same cycle defined by the background sounds. Experiment 2 shows that this also holds for temporal regularities in the visual domain and Experiment 3 for motion. Additional EEG analysis indicated that the findings obtained can be explained by cortical entrainment to the higher-order contextual structure. Critically feature-based structure of contextual rhythms at 2.5 Hz was correlated with the strength of the attentional modulation effect.

      This is an intriguing and exciting finding. It is a clever and innovative approach to reduce the attention blink by presenting a rhythmic higher-order regularity. It is convincing that this pulling out of the AB is driven by cortical entrainment. Overall, the paper is clear, well written and provides adequate control conditions. There is a lot to like about this paper. Yet, there are particular concerns that need to be addressed. Below I outline these concerns:

      1) The most pressing concern is the behavioral data. We have to ensure that we are dealing here with an attentional blink. The way the data are presented is not the typical way this is done. Typically in AB designs one sees the T2 performance when T1 is ignored relative to when T1 has to be detected. These data are not provided. I am not sure whether these data were collected, but if so the reader should see them.

      Many thanks for the suggestion. We appreciate the reviewer for his/her thoughtful comments. To demonstrate the AB effect, we did include two T2 lag conditions in our study (Experiments 1a, 1b, 2a, and 2b)—a short-SOA condition where T2 was located at the second lag of T1 (i.e., SOA = 200 ms), and a long-SOA condition where T2 appeared at the 8th lag of T1 (i.e., SOA = 800 ms). In a typical AB effect, T2 performance at short lags is remarkably impaired compared with that at long lags. In our study, we consistently replicated this effect across the experiments, as reported in the Results section of Experiment 1 (page 5, line 106). Overall, the T2 detection accuracy conditioned on correct T1 response was significantly impaired in the short-SOA condition relative to that in the long-SOA condition (mean accuracy > 0.9 for all experiments), during both the context session and the baseline session. More crucially, when looking into the magnitude of the AB effect as measured by (ACC_long-SOA - ACC_short-SOA) / ACC_long-SOA, we still obtained a significant attentional modulation effect (for Experiment 1a, t(15) = -2.729, p = .016, Cohen’s d = 0.682; for Experiment 2a, t(15) = -4.143, p < .001, Cohen’s d = 1.036) similar to that reflected by the short-SOA condition alone, further confirming that cortical entrainment effectively influences the AB effect.
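
      The AB magnitude index used above normalizes the short-lag deficit by long-lag accuracy. A minimal sketch follows; the function name and the example accuracies are hypothetical and are not data from the study.

```python
def ab_magnitude(acc_long_soa, acc_short_soa):
    """Proportional drop in T2 accuracy at the short SOA relative to the long SOA."""
    if acc_long_soa <= 0:
        raise ValueError("long-SOA accuracy must be positive")
    return (acc_long_soa - acc_short_soa) / acc_long_soa

# Hypothetical accuracies for one participant, for illustration only
example = ab_magnitude(acc_long_soa=0.95, acc_short_soa=0.76)
```

      A value of 0 means no blink (equal accuracy at both lags), and larger values indicate a deeper blink.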

      Although we included both the long- and short-SOA conditions in the current study, we focused on T2 performance in the short-SOA condition rather than along the whole AB curve for the following reasons. Firstly, for the long-SOA conditions, the T2 performance is at ceiling level, making it an inappropriate baseline to probe the attentional modulation effect. We focused on Lag 2 because previous research has identified a robust AB effect around the second lag (Raymond et al., 1992), which provides a reasonable and sensitive baseline to probe the potential modulation effect of the contextual auditory and visual rhythms. Note that instead of using multiple lags, we varied the length of the rhythmic cycles (i.e., a cycle of 300 ms, 400 ms, and 500 ms corresponding to a rhythm frequency of 3.3 Hz, 2.5 Hz, and 2 Hz, respectively, all within the delta band), and showed that the attentional modulation effect could be generalized to these different delta-band rhythmic contexts, regardless of the absolute positions of the targets within the rhythmic cycles.

      As to the T1 performance, the overall accuracy was very high, ranging from 0.907 to 0.972, in all of our experiments. The corresponding results have been added to the Results section of the revised manuscript (page 5, line 103). Notably, we did not find T1-T2 trade-offs in most of our experiments, except in Experiment 2a where T1 performance showed a moderate decrease in the between-cycle condition relative to that in the within-cycle condition (mean ± SE: 0.888 ± 0.026 vs. 0.933 ± 0.016, respectively; t(15) = -2.217, p = .043). However, by examining the relationship between the modulation effects (i.e., the difference between the two experimental conditions) on T1 and T2, we did not find any significant correlation (p = .403), suggesting that the better performance for T2 was not simply due to the worse performance in detecting T1.

      Finally, previous studies have shown that ignoring T1 would lead to ceiling-level T2 performance (Raymond et al., 1992). Therefore, we did not include such manipulation in the current study, as in that case, it would be almost impossible for us to detect any contextual modulation effect.

      References:

      Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18(3), 849–860. https://doi.org/10.1037/0096-1523.18.3.849

      2) Also, there is only one lag tested. To ensure that we are dealing here with a true AB, I would like to see that more than one lag is tested. In the ideal situation a full AB curve should be presented that includes several lags. This should be done for at least one of the experiments. It would be informative as we can see how cortical entrainment affects the whole AB curve.

      Many thanks for the suggestion. Please refer to our response to the point #1 for “Reviewer #3 (Public Review)”. In short, we did include two T2 lag conditions in our study (Experiments 1a, 1b, 2a and 2b), and the results replicated the typical AB effect. We have clarified this point in the revised manuscript (page 5, line 106).

      3) Also, there are no data regarding T1 performance. It is important to show that the better performance for T2 is not due to worse performance in detecting T1. So please also provide these data.

      Many thanks for the suggestion. Please refer to our response to the point #1 for “Reviewer #3 (Public Review)”. We have reported the T1 performance in the revised manuscript (page 5, line 103), and the results didn’t show obvious T1-T2 trade-offs.

      4) The authors identify the oscillatory characteristics of EEG signals in response to stimulus rhythms by examining the FFT spectral peaks after subtracting the mean power of the two nearest neighboring frequencies from the power at the stimulus frequency. I am not familiar with this procedure and would like to see some justification for using this technique.

      According to previous studies (Nozaradan et al., 2011; Lenc et al., 2018), subtracting the average amplitude of neighboring frequency bins can remove unrelated background noise, such as muscle activity or eye movements. If there were no EEG oscillatory responses characteristic of stimulus rhythms, the amplitude at a given frequency bin should be similar to the average of its neighbors, and thus no significant peaks could be observed in the subtracted spectrum.
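
      The neighbor-subtraction logic can be sketched as follows. This is a minimal illustration, assuming one neighboring bin on each side (the "two nearest neighboring frequencies" mentioned by the reviewer); the function name and the toy spectra are hypothetical.

```python
import numpy as np

def subtract_neighbors(spectrum, n_neighbors=1):
    """Subtract from each bin the mean of its n_neighbors bins on either side.

    Edge bins without a full set of neighbours are returned as NaN.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    corrected = np.full_like(spectrum, np.nan)
    for k in range(n_neighbors, spectrum.size - n_neighbors):
        left = spectrum[k - n_neighbors:k]
        right = spectrum[k + 1:k + 1 + n_neighbors]
        corrected[k] = spectrum[k] - np.mean(np.concatenate([left, right]))
    return corrected

# Toy spectra: flat background noise, and the same background with one peak
flat = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
peaked = np.array([1.0, 1.0, 4.0, 1.0, 1.0])

flat_corrected = subtract_neighbors(flat)      # interior bins become 0
peaked_corrected = subtract_neighbors(peaked)  # centre bin keeps its excess of 3
```

      A flat background is mapped to zero, so only bins exceeding their local surroundings remain positive, which is the property the response relies on.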

      References:

      Lenc, T., Keller, P. E., Varlet, M., & Nozaradan, S. (2018). Neural tracking of the musical beat is enhanced by low-frequency sounds. Proceedings of the National Academy of Sciences, 115(32), 8221–8226. https://doi.org/10.1073/pnas.1801421115

      Nozaradan, S., Peretz, I., Missal, M., & Mouraux, A. (2011). Tagging the Neuronal Entrainment to Beat and Meter. The Journal of Neuroscience, 31(28), 10234–10240. https://doi.org/10.1523/JNEUROSCI.0411-11.2011

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1

      Summary:

      In this work the authors present a simple mathematical model for the distribution of morphogen molecules that travel via cytonemes through a 1- dimensional system. This model is used as a basis for a software package called Cytomorph that takes as an input a set of experimentally measured distributions of cytoneme dynamics as well as experimenter determined parameters such as contact probability and method of cytoneme growth and retraction. The Cytomorph package then outputs spatial and temporal information on the distribution of morphogen as well as cytonemes and their contacts with cells and other cytonemes, all obtained over thousands of simulation runs. A number of in silico experiments are then performed to show that these outputs agree with experimentally measured morphogen distributions of Hedgehog in the imaginal wing disc and abdominal histoblast nest. Further in silico experimentation is done to study how this distribution is affected by a wide array of parameters such as producer row number, cytoneme connection method, and connection probability function. Comparisons to the traditional diffusion based model are also made. The authors find a suite of results based on these experiments and accordingly present the Cytomorph software package as a useful and adaptable tool for the community.

      Major comments:

      While the various in silico experiments present an expansive and exhaustive study of the different ways in which Cytomorph can be used to examine a cytoneme based distribution system, the machinery behind the software is left notably underdescribed. The authors do not sufficiently make clear what exactly happens within each iteration of the simulations run by Cytomorph, leaving the results irreproducible without the reader going into and deciphering the software code itself.

      In order to improve the description of the mathematical and computational steps behind the software, we have created a visual organigram (new Supplementary Figure S.1) with a detailed depiction of the steps. We have also included a short description in the main text and an extended explanation in the Supplementary Material section.

      Some of the specific details left undiscussed are how it is determined when and where a cytoneme will spawn or what its maximum length will be, the dynamics of morphogen transport within the cytonemes, the effects of one cytoneme making multiple connections on how much morphogen is delivered through each connection, and where exactly stochasticity is introduced so as to allow for variations between simulation runs; amongst others.

      In the new description of the software steps, we have tried to address the Referee’s comments about the dynamics and stochasticity in more detail. In order to help the understanding of the variables, we have also tried to improve their description in the main text.

      Additionally, when the authors investigate the diffusion model their stated boundary conditions do not match those presented at the end of the Materials and Methods section. The initial condition u(x,0)=0 and boundary condition du(L,t)/dt=0 represent a perfectly absorbing molecule sink at the x=L end of the system, not the reflecting boundary condition du(L,t)/dx=0 that would correspond to a zero morphogen flux.

      We thank the Referee for noticing this notation mistake: the derivative in the boundary condition should indeed be taken with respect to x, not t. We have corrected this error and included in the Supplementary Materials the exact lines of code used in Matlab pdepe to certify the conditions used in the resolution of the diffusion equation (new Supplementary Figure S.10).
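
      For clarity, the distinction raised by the Referee can be written out explicitly. The symbols D (diffusion coefficient) and J (flux, via Fick's first law) are notation assumed here for illustration and may differ from the manuscript's.

```latex
\begin{aligned}
&\text{As originally written:} &&
  u(x,0)=0,\quad \frac{\partial u}{\partial t}(L,t)=0
  \;\Longrightarrow\; u(L,t)=u(L,0)=0 \ \text{for all } t
  \quad \text{(absorbing end)} \\
&\text{As intended (zero flux):} &&
  \frac{\partial u}{\partial x}(L,t)=0
  \;\Longrightarrow\; J(L,t)=-D\,\frac{\partial u}{\partial x}(L,t)=0
  \quad \text{(reflecting end)}
\end{aligned}
```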

      Finally, while the authors spend a great deal of effort analyzing signal variability between simulation runs, there is no effort made to account for the inherently stochastic nature of molecular production, movement, and degradation. Particularly if molecule numbers are small, fluctuations in these processes could greatly increase signal variability. The authors should either address why these fluctuations are negligible or include them in the modelling.

      This work is mainly focused on the transport of the morphogen; other terms, such as degradation, were introduced directly using published experimental data. Regarding the main concern about the negligibility of the fluctuations for cytoneme transport, we agree with the Referee on the importance of this point. Therefore, we have included a detailed description of the variability and fluctuations in a new section of the Supplementary Material. To aid understanding, we have also included a new Supplementary Figure (Supplementary Figure S.11).

      The largest fluctuations were found at the tail of the morphogen gradient (last rows of receiving cells). Since this corresponds to the region where the amount of morphogen is low, the absolute fluctuations do not change the activation of the low-threshold target. We then conclude that those fluctuations are biologically negligible for our study.

      Minor comments:

      The authors should double check all equation and figure references as I noted several instances in which it appeared that the wrong equation or figure was being referred back to. Similarly, the authors should double check the equations themselves, particularly those in the supplemental material.

      We thank the Referee for noticing these mistakes. We have reviewed those references in order to fix the wrongly linked ones.

      Eqs. SM1.1 and SM1.2 have a plethora of parameters with a wide array of different sub- and superscripts that are left unexplained and possibly incorrectly labelled in some cases,

      Equations SM1.1 and SM1.2 described a general form of Triangular and Trapezoidal dynamics and the different sub- and superscripts come from the published experimental data. Nevertheless, in order to make them more intuitive we have simplified the expressions and included a more detailed description of those parameters and their scripts in the revised version.

      while the second line of Eq. SM2.2 is nonsensical unless r_I*p=0 and p_i<=1.

      We thank the Referee for noticing the uncertainty in this equation, since it was written in an iterative syntax as it is coded in the software. Therefore, in the code we did not have this nonsensical range of data, but we agree that it should be specified with a mathematical syntax as the rest of the equations in the manuscript. Therefore, we redefined the notation and specified better the numerical domains of those variables.

      Additionally, the notation used in Figs. 5 and 6 as well as the bottom part of Fig. 7 is confusing. The caption should more explicitly state what the various expressions in the second row of each column represent.

      The second row represents the statistical analysis between cases coded in a color matrix, as described in the footnote. We thank the Referee for this recommendation because this is not the usual representation. Therefore, we have changed the previous explanation to one that is hopefully clearer and more intuitive; we have also included a specific label in the figures.

      In Fig. 5A specifically it is unclear what exactly the variable phi represents.

      Phi is a widely used notation in biology to define cell size diameter and cell position. We didn't realize it could be unclear. For a better understanding within a multidisciplinary field, we have changed this symbol.

      Does it have anything to do with the phi that is used as a position variable for the cells, and if it is a ratio of cytoneme length to cell diameter then why does it have units of microns?

      We agree that this phi notation is confusing. It has been used to indicate distance position as well as cell diameter. Although these variables are biologically related, in the new version of the manuscript we have changed the notation to separate both concepts and avoid misunderstandings.

      Significance:

      As the Cytomorph model and software can be applied to a wide variety of systems involving morphogen transport via cytonemes, it provides a technical advance in our ability to analyze and discuss the results of measurements on cytonemes in a more homogeneous way. This work and the resulting software are particularly applicable to, and build on, studies done by other groups that study the dynamics of cytonemes such as the Kornberg lab (works from which are cited by the authors) and the Scholpp lab (such as Stanganello E, Scholpp S. Role of cytonemes in Wnt transport. J Cell Sci. 2016; 129(4):665-672), and as such it is experimental labs such as these that will be the most interested in this manuscript and its findings.

      My field of expertise lies primarily in stochastic modeling and linear response theory. As such, I feel I do not have sufficient expertise to evaluate the experimental methods outlined in this manuscript and determine their level of scientific rigor.

      Reviewer #2

      The manuscript "Improving the understanding of cytoneme-mediated morphogen gradients by in silico modelling" addresses the role of in silico modelling in understanding pattern formation via cytonemes: filopodia that transport signalling molecules to and from cells. Investigating the role of cytonemes and, in particular, their dynamics, during development is an important and emerging field in developmental biology, and there is great potential for mathematical modelling to aid in understanding these processes.

      The present manuscript attempts to derive a general set of equations describing pattern formation in the context of cytonemes, akin to that of the classic Turing model of morphogenesis. The authors replace the standard diffusion term in the PDE with a non-local term, intended to represent transport via cytonemes. This model is then posed over a one-dimensional domain with a source at one end and no flux boundary conditions at the other and is shown to be able to generate a morphogen gradient profile that could pre-pattern a biological tissue. The model is tested against a key experimental system, namely, Hh signalling in the Drosophila wing imaginal disc and is shown to reproduce some experimental results. Finally, the authors have developed a Matlab-based software package that they claim will be applicable to a wide range of systems. This GUI-based software allows users to input experimentally measured averages of cytoneme properties and explore the effect of these properties on tissue patterning.

      My primary concern is that the paper presents itself as a mathematical model of cytoneme formation in general. The authors themselves state in their introduction that the mechanisms for cytoneme generation and maintenance are presently unknown. In fact, it is not even known if they are consistent across biological systems (and in fact, are probably not in general). As such, any present instantiation that connects cytoneme dynamics to tissue patterning can only hope to be specific to a particular system (in this case, the Drosophila wing imaginal disc).

      As mentioned in the introduction, the connection of cytonemes with patterning has been described in several works. We had included a list of publications describing the implication of cytoneme-mediated signaling for several morphogens (FGF, EGF, Hh, Dpp, Wnt or Notch) and in many vertebrate and invertebrate systems (Drosophila, chicken, Xenopus, zebrafish, mouse and human tissue culture cells).

      Whilst one may use general models (like the heat equation) to study pattern formation since it requires only specification of parameters, the model here requires specification of families of functions, that are likely to differ from context to context and so the model is not general.

      Our model inputs are parameters determined experimentally rather than families of functions. This misunderstanding might derive from the use of triangular and trapezoidal dynamics, which are equations included in the software code but not input functions. To avoid this confusion, we have specified the input data in tables S.1 and S.2 and clarified in the main text that the triangular or trapezoidal family of functions are just the names for the basic dynamics of cytonemes (triangular for elongation and retraction, and trapezoidal when there is a stationary phase in between).

      Ultimately, the model is a statistical modelling framework masquerading as a mechanistic one.

      In this work, we have not specified the mathematical area to which the model belongs. Furthermore, we always explicitly described the different variables and functions modeled. Therefore, we do not understand what the supposed masquerade is.

      As further evidence of the lack of generality of the model, the studied domain is only one dimensional and has signalling sources at one end. This scenario is perfectly adequate for theoretical explorations of pattern-forming systems but is highly unlikely to capture the geometrical intricacies of real-world systems (and I note that even in the diffusive case, boundary conditions are critical for understanding what patterns ultimately arise for a given system).

      We agree with the Referee that there are cases in biological systems in which it is required to work in 2D or even 3D to have a full comprehension of the process. Nevertheless, those are mainly related to biological patterns rather than to biological signaling gradients, which usually are studied (experimentally and theoretically) in 1D. Therefore, we have limited our model to this case and compared our in silico results with the published experimental data. In any case, we have emphasized in the text that our model is limited to signaling gradients with the source at one end, which is the case of the best studied morphogens: Hh (Sonic-hh), Dpp (BMP) or Wg (Wnt).

      Actually, as proof of the generality of the model, we have predicted different properties of Dpp and Wg gradients using our model. We then validated the simulated results using the experimental data obtained from independent publications.

      To simulate their model, the authors need to specify triangular and trapezoidal functions, which are unlikely to be generalisable to all contexts. As such, the model is not general and, in particular, there is no way to change the software to make it so.

      Cytonemes are filopodial structures based on actin filaments that polymerize and depolymerize to elongate and retract. This is a general process for all filopodial structures, and it is why cytoneme dynamics were classified in a previously published work as triangular or, if the dynamics include a stationary phase, as trapezoidal (González-Méndez et al., 2017). Therefore, these functions are just a categorization introduced to better describe the intrinsic dynamics of cytonemes, which could be applied to most of the experimental cases. To address this Referee’s concern, we have included in the introduction a more detailed description of these behaviors, as well as references to publications describing the dynamic behaviors of cytonemes for different morphogens and in different organisms.
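
      As an illustration of the two categories described above, the following is a minimal sketch of a cytoneme length profile over time. It is not taken from the Cytomorph code: the function name, the parameter names (`v_elong`, `t_elong`, `t_static`, `v_retract`) and the piecewise-linear form are assumptions made here for illustration, with units left arbitrary.

```python
def cytoneme_length(t, v_elong, t_elong, t_static, v_retract):
    """Length of a single cytoneme at time t after it starts growing.

    Hypothetical piecewise-linear profile (all names are illustrative):
      - elongation at speed v_elong for a duration t_elong,
      - optional stationary phase of duration t_static at maximum length,
      - retraction at speed v_retract until the length reaches zero.
    t_static == 0 gives a triangular profile, t_static > 0 a trapezoidal one.
    """
    if t < 0:
        return 0.0
    max_length = v_elong * t_elong
    if t <= t_elong:
        # Elongation phase: length grows linearly.
        return v_elong * t
    if t <= t_elong + t_static:
        # Stationary phase (only present in the trapezoidal case).
        return max_length
    # Retraction phase: length shrinks linearly and is clipped at zero.
    retracted = v_retract * (t - t_elong - t_static)
    return max(max_length - retracted, 0.0)


# Triangular: retraction starts immediately after elongation.
print(cytoneme_length(12, v_elong=2.0, t_elong=10, t_static=0, v_retract=4.0))
# Trapezoidal: the cytoneme is still at maximum length at the same time.
print(cytoneme_length(12, v_elong=2.0, t_elong=10, t_static=5, v_retract=4.0))
```

      With a single extra duration parameter, the same function covers both categories, which is the sense in which the two behaviors can be treated as one family in this sketch.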

      Trying to make a generalization for all cases, we included in the model those situations in which the cytonemes were static rather than dynamic (detailed simulations comparing dynamic and static cases can be found in the old Supplementary Figure S.5 A (now S.7 A)).

      We have concluded that the model can be considered generalizable since it includes the simplest and most general cases in terms of cytoneme dynamics.

      Whilst the development of a GUI for this scenario is a nice contribution, I feel that the lack of generalisability will, at best, mean that the software enjoys little use, and at worst, may lead researchers unfamiliar with the modelling context to misuse it in error.

      Once we knew the model could be generalized, we were concerned about the misuse of the mathematical model, and that was the reason why we decided to develop a GUI as simple as possible.

      Furthermore, in the online repository there is, together with the open software, a user guide of Cytomorph with a full description of parameters, variables and outputs and how to use them properly.

      In my opinion, this work would be better suited as a presentation of specific mathematical modelling of tissue patterning in the Drosophila wing imaginal disc. In this case, many of the above concerns would be addressed.

      We have rewritten part of the text to indicate the limits of the model and make clear that it has been tested experimentally for the Hh pathway and in two different developing systems: wing imaginal discs and abdominal histoblast nests.

      As evidence of a more general use of Cytomorph, we have added in the revised version of the manuscript a new section focused on data prediction for the gradients of Dpp and Wg. We have also included supplementary figures that validate the predictions of our model using published experimental data.

      That said, there are still a number of issues with the presentation of the model and results. I shall detail these in the bullet point list below:

      1. The domain for Eq. 1 needs to be made explicit. Later, it appears that the domain is a closed one-dimensional interval, but the use of arrows here implies that x is a vector and hence x ∈ D ⊂ ℝⁿ with n > 1.

      We initially described the general equation for morphogens with x ∈ ℝⁿ and later we limited it to 1D. This is why at the beginning x, as a vector, contained an arrow, although later it was a scalar variable. Since we were interested in 1D in this work, to avoid this kind of misunderstanding we have rewritten the equations in 1D from the outset and clearly specified the x domain used: the set of natural numbers, x ∈ ℕ₀.

      2. It is unclear over what the sum in Eq. 2 is being taken.

      The sum in Eq. 2 is over the number of producing cell rows. We have changed the notation to clarify this point.

      3. The statement "we used the discrete cell position x = φ as spatial coordinate" is vague and does not help the reader understand the discretization.

      The number of cell diameters is a widely used discrete unit for position in Developmental Biology. As we expect the readers of this publication to be multidisciplinary, we have changed the notation to avoid misunderstandings and clarify this discretization.

      4. p is used both as a probability and as an index for producer cells. This is confusing.

      We have changed the notation to avoid misunderstandings.

      5. As previously stated, the choice of trapezoidal/triangular cytoneme dynamics is not general. More work needs to be done to showcase how the authors came to the conclusion that this is the best choice, and how the functions (and their associated parameters) describing them were selected.

      The names triangular and trapezoidal stand for the published dynamics for the elongation and retraction of cytonemes, and we have already argued for their generality. As we specified in the manuscript, these types of behaviors have been experimentally observed and, therefore, we considered that the experimental observation was reason enough to include them in the model. If more details are required, the Materials and Methods section and the Supplementary Table S.3 show that the times measured for triangular and trapezoidal dynamics are statistically different and, consequently, both behaviors have to be considered.

      As mentioned in the manuscript, the associated parameters represent the times and velocities for the elongation or retraction that have already been thoroughly analyzed and published (González-Méndez et al., 2017). The question of the Referee about how these functions affect the gradient is answered in the text and in Figure 7 F.

      6. I can see how Type 1 and Type 2 cytonemes could be expanded naturally to a higher dimensional case, but it is not clear how Type 3 cytonemes could be, since the probability of any two cytonemes occupying the same space in higher dimensions is likely to be small (if they are imbued with independent dynamics).

      We agree with the Referee on this point. It is something that should be considered for future improvements of the model in higher dimensions. For instance, a complex 2D scenario will require a model of cytoneme guidance. Nevertheless, since the present study is limited to 1D, this concern is not applicable to the current model.

      7. The statement: "the distance between cells must be smaller than, or equal to, the maximum length of the cytonemes" seems inconsistent with the equations below since λ(t) does not appear to be a maximum length.

      The length of the cytonemes is controlled as a dynamic function described by λ(t). Our statement referred to the maximum length for each time step, which is given by λ(t). We agree that the initial statement could lead to misunderstanding, so we have removed the word “maximum”.

      8. I think the authors are confusing probabilities and rates in their discussion of the model. Eq. 1 is a density model and so calling events probabilities here is slightly misleading. As a more general statement, I am currently interpreting contact function C as one defined as a rate, rather than as a set of probabilistic terms. If the latter is true, then Eq. 1 is invalid since it mixes processes at different levels of description.

      We thank the Referee for this comment. We have studied this observation in depth, but we could not determine exactly why the Referee considers that the model works at different levels. Even though we could not find where in the text we referred to the events of Eq. 1 as “probabilities”, we rewrote the text to make clear what we consider a probability and what a rate. In addition, in the Supplementary Material section we clarify how the model works and at what levels of modeling we are working.

      Significance

      In general, the paper is well written, however, the focus of the findings should be on patterning within an epithelium such as the Drosophila wing imaginal disk.

      The work will be interesting for the developmental biology community as well as for the upcoming biomathematical modelling community.

      Expertise: Developmental biologist with experience in tissue patterning and morphogen gradients

      Referees cross-commenting

      I agree with Reviewer 3 that the importance of cytoneme-mediated signalling has been described in several systems - invertebrates and vertebrates. However, I think the focus of this work in particular should be on cytoneme signalling in the wing imaginal disc. IMO, this would not limit the conclusion but rather focus it and make it then applicable to epithelial tissues in general. I agree with the other point.

      Reviewer #3

      There is much to like in this thoughtful and worthwhile study that develops mathematics to describe how cytonemes might generate experimentally observed Hh gradients. Two suggestions:

      1. I am not equipped to evaluate the mathematics and as a non-expert would find it helpful if the authors explicitly stated at the outset what assumptions they took, the specific contexts they sought to model, and the parameters that they explored.

      We agree with the Referee on the excessively mathematical focus of our interpretation of the results in the old version of the manuscript. We have rewritten part of the text to clarify the biological implications of the variables and simulations explored.

      Am I correct that they assume that the Hh gradient correlates with a cytoneme gradient, that all cytoneme contacts have the same duration and exchange equivalent amounts of Hh, and that the variables that were characterized are cytoneme length distributions, cytoneme extension rate, contact duration, and cytoneme density?

      Since the mechanism of morphogen exchange is not fully identified, we assumed the simplest case in which all the contacts have the same duration and exchange the same amount of morphogen. Using this approach, we were able to reproduce the gradient and concluded that it is not strictly necessary to propose a more complex mechanism to establish a graded distribution of morphogens. We therefore worked under this assumption.

      The variables characterized were the ones pointed out by the Referee, mainly cytoneme features such as the cytoneme length distributions and the different parameters of the temporal dynamics. We have tried to define these variables better in the new version of the manuscript.

      2. One of the unusual features of the Hh gradient in the wing disc is that the size of the posterior compartment field of Hh-producing cells is large relative to the size and extent of the Hh gradient in the adjacent anterior compartment. Wing discs with large hh mutant clones, wing discs with large smo mutant clones, and wing discs with ttv mutant clones that block Hh uptake provide evidence that the Hh gradient is constituted with Hh that is produced by many cells, some that are far from the compartment border as well as some that are close. Has this been factored into the authors' model?

      Indeed. Being aware of the importance of the size of the signal source, we simulated how changing the size of the posterior compartment affects the gradient (altering the number of producing cell rows involved, figure 5B). In the old version of the manuscript we had focused on the theoretical approach, so we thank the reviewer for noticing that we should introduce a more biological point of view. Therefore, we included in the revised version of the manuscript a biological interpretation of how our simulations can help to understand the question posed by the reviewer.
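
      To make the role of the number of producing cell rows concrete, the following toy sketch counts, for each receiving row, how many producing rows lie within cytoneme reach. It is not the Cytomorph implementation and makes no claim about the published simulations: the function name, the static fixed `reach` (in cell diameters) and the "one unit of signal per producing row within reach" rule are all assumptions made here for illustration, following the simplest case of equal contacts.

```python
def toy_gradient(n_producer_rows, n_receiver_rows, reach):
    """Toy 1D signal profile from equal, static cytoneme contacts.

    Illustrative assumptions (not from the manuscript):
      - rows are indexed in cell diameters; producing rows sit at
        -n_producer_rows .. -1 and receiving rows at 0 .. n_receiver_rows - 1,
      - a producing row contributes one unit of signal to a receiving row
        when their distance is at most `reach`.
    Returns the list of signal values, one per receiving row.
    """
    signal = []
    for r in range(n_receiver_rows):
        count = sum(1 for p in range(-n_producer_rows, 0) if r - p <= reach)
        signal.append(count)
    return signal


# Three producing rows versus ten, same reach of four cell diameters.
print(toy_gradient(3, 6, reach=4))   # [3, 3, 2, 1, 0, 0]
print(toy_gradient(10, 6, reach=4))  # [4, 3, 2, 1, 0, 0]
```

      In this toy, producing rows farther from the border than the reach cannot contribute, so enlarging the source beyond that point leaves the profile unchanged. That saturation is a consequence of the static, fixed-reach assumption and should not be read as a property of the real tissue.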

      Does the fact that the relative size of the posterior compartments and Hh gradients in the histoblasts is not as extreme as it is in the wing disc influence their model?

      Following the Referee’s question, we decided to simulate the influence of the relative size of the posterior compartment in the abdominal histoblast nests. We found that in both wing discs and histoblasts, the size of the posterior compartment affects the gradient, but by a different scale factor. We have included these data in the revised version of the manuscript (new supplementary figure S.5).

      Interestingly, this feature of the Hh gradient in the wing disc is not shared with other gradients in the wing disc such as the Wg, Dpp, and Bnl gradients. I would be interested to know if the authors' model can be queried to suggest what properties might contribute to this difference.

      To answer the reviewer's question, we have used our model to tentatively simulate Wg and Dpp gradients. Our preliminary results suggest that, considering only cell position and cytoneme length, the Wg and Dpp gradient lengths can be predicted in the wing imaginal disc. Nevertheless, each morphogen has its own particularities and further studies are required for a precise simulation of these gradients. We included these results in a new section of the manuscript and in the new Supplementary Figure S.9.

      Significance

      This is an important contribution to gaining a basic understanding of the role of various properties of dynamic cytonemes to gradient formation.

      Referees cross-commenting

      I discount the apparently strongly held opinion of Reviewer #2 that "it is not even known if they [cytonemes] are consistent across biological systems (and in fact, are probably not in general)". I do not know where this comes from and do not think that such opinions are appropriate for anonymous reviews.

      Cytoneme-mediated signaling has in fact been observed and characterized in many diverse biological systems. I submit that in contrast, mechanisms of dispersion based on diffusion are inferred and lack direct experimental evidence. I do agree that it is fair to ask the authors to carefully describe their work in the context of epithelial signaling, but it is not correct to ask them to limit their conclusions to the wing disc as the authors analyze both wing disc and histoblast signaling. They clearly state that their work is limited to 1D and so we understand that it is inadequate to model 3D morphologies. I do not criticize them for this.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript "Improving the understanding of cytoneme-mediated morphogen gradients by in silico modelling" addresses the role of in silico modelling in understanding pattern formation via cytonemes: filopodia that transport signalling molecules to and from cells. Investigating the role of cytonemes and, in particular, their dynamics, during development is an important and emerging field in developmental biology, and there is great potential for mathematical modelling to aid in understanding these processes.

      The present manuscript attempts to derive a general set of equations describing pattern formation in the context of cytonemes, akin to that of the classic Turing model of morphogenesis. The authors replace the standard diffusion term in the PDE with a non-local term, intended to represent transport via cytonemes. This model is then posed over a one-dimensional domain with a source at one end and no flux boundary conditions at the other and is shown to be able to generate a morphogen gradient profile that could pre-pattern a biological tissue. The model is tested against a key experimental system, namely, Hh signalling in the Drosophila wing imaginal disc and is shown to reproduce some experimental results. Finally, the authors have developed a Matlab-based software package that they claim will be applicable to a wide range of systems. This GUI-based software allows users to input experimentally measured averages of cytoneme properties and explore the effect of these properties on tissue patterning.

      My primary concern is that the paper presents itself as a mathematical model of cytoneme formation in general. The authors themselves state in their introduction that the mechanisms for cytoneme generation and maintenance are presently unknown. In fact, it is not even known if they are consistent across biological systems (and in fact, are probably not in general). As such, any present instantiation that connects cytoneme dynamics to tissue patterning can only hope to be specific to a particular system (in this case, the Drosophila wing imaginal disc). Whilst one may use general models (like the heat equation) to study pattern formation since it requires only specification of parameters, the model here requires specification of families of functions, that are likely to differ from context to context and so the model is not general. Ultimately, the model is a statistical modelling framework masquerading as a mechanistic one.

      As further evidence of the lack of generality of the model, the studied domain is only one dimensional and has signalling sources at one end. This scenario is perfectly adequate for theoretical explorations of pattern-forming systems but is highly unlikely to capture the geometrical intricacies of real-world systems (and I note that even in the diffusive case, boundary conditions are critical for understanding what patterns ultimately arise for a given system). To simulate their model, the authors need to specify triangular and trapezoidal functions, which are unlikely to be generalisable to all contexts. As such, the model is not general and, in particular, there is no way to change the software to make it so. Whilst the development of a GUI for this scenario is a nice contribution, I feel that the lack of generalisability will, at best, mean that the software enjoys little use, and at worst, may lead researchers unfamiliar with the modelling context to misuse it in error.

      In my opinion, this work would be better suited as a presentation of specific mathematical modelling of tissue patterning in the Drosophila wing imaginal disc. In this case, many of the above concerns would be addressed. That said, there are still a number of issues with the presentation of the model and results. I shall detail these in the bullet point list below:

      1. The domain for Eq. 1 needs to be made explicit. Later, it appears that the domain is a closed one-dimensional interval, but the use of arrows here implies that x is a vector and hence x ∈ D ⊂ ℝⁿ with n > 1.
      2. It is unclear over what the sum in Eq. 2 is being taken.
      3. The statement "we used the discrete cell position x = φ as spatial coordinate" is vague and does not help the reader understand the discretization.
      4. p is used both as a probability and as an index for producer cells. This is confusing.
      5. As previously stated, the choice of trapezoidal/triangular cytoneme dynamics is not general. More work needs to be done to showcase how the authors came to the conclusion that this is the best choice, and how the functions (and their associated parameters) describing them were selected.
      6. I can see how Type 1 and Type 2 cytonemes could be expanded naturally to a higher dimensional case, but it is not clear how Type 3 cytonemes could be, since the probability of any two cytonemes occupying the same space in higher dimensions is likely to be small (if they are imbued with independent dynamics).
      7. The statement: "the distance between cells must be smaller than, or equal to, the maximum length of the cytonemes" seems inconsistent with the equations below since λ(t) does not appear to be a maximum length.
      8. I think the authors are confusing probabilities and rates in their discussion of the model. Eq. 1 is a density model and so calling events probabilities here is slightly misleading. As a more general statement, I am currently interpreting contact function C as one defined as a rate, rather than as a set of probabilistic terms. If the latter is true, then Eq. 1 is invalid since it mixes processes at different levels of description.

      Significance

      In general, the paper is well written, however, the focus of the findings should be on patterning within an epithelium such as the Drosophila wing imaginal disk.

      The work will be interesting for the developmental biology community as well as for the upcoming biomathematical modelling community.

      Expertise: Developmental biologist with experience in tissue patterning and morphogen gradients

      Referees cross-commenting

      I agree with Reviewer 3 that the importance of cytoneme-mediated signalling has been described in several systems - invertebrates and vertebrates. However, I think the focus of this work in particular should be on cytoneme signalling in the wing imaginal disc. IMO, this would not limit the conclusion but rather focus it and make it then applicable to epithelial tissues in general. I agree with the other point.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Response to Reviewers "Cell-cell communication through FGF4 generates and maintains robust proportions of differentiated cell types in embryonic stem cells"

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript Raina et al. use an in vitro model of PE specification based on the transient overexpression of GATA4 in ESCs to show that the acquisition of primitive endoderm (PE) identity is governed at the population level by cell-cell interactions mediated by FGF signaling. The authors further argue that the specification of a defined proportion of "PE" and "Epiblast" cells in a differentiating population of ESCs is an emergent property of a system where paracrine signaling shifts the balance between two alternative stable states. Overall, the work does not reach radically new conclusions: broadly similar models are outlined in several other publications, including from the authors. Yet this study makes use of elegant genetic models and is particularly well executed. In addition, it includes a very accurate characterisation of the spatial range of FGF signaling activity that is original and adds to the existing knowledge. Moreover, the authors show novel evidence suggesting that GATA factors inhibit Fgf4 transcription and the activity of the FGF signaling pathway in ESCs.

      We thank the Reviewer for commending the execution of the experiments, and for highlighting the novel insights that they bring. The Reviewer acknowledges that the specification of a defined proportion of PrE-like and Epiblast-like cells in a differentiating population of ESCs is an emergent property which is mediated by paracrine FGF4 signaling. This has not been experimentally demonstrated before. In contrast to the Reviewer’s assertion, we therefore think that our work does reach a conclusion that is radically different from previous experimental studies, a view that is also shared by Reviewer #3 below. In a revised version of the manuscript we will further emphasize the conceptual differences between published models that focus on single cell dynamics, and our experimental and theoretical demonstration of qualitatively different dynamics that emerge at the population level as a consequence of cell fate coupling.

      **Two major points deserve further clarification:**

      In this manuscript the authors claim that the proportion of cells acquiring PE fate is, at least in the experimental setup adopted, largely independent from the levels of GATA4 induction, and therefore of the initial state of the gene regulatory network regulating this cell fate transition. However, the authors should discuss how the current findings relate to their previous results, showing that the duration/levels of Gata4 induction, in a similar experimental setting, play an important role in determining the final proportion of cells acquiring "PE" fate. Absolute expression levels may be crucial for this distinction, but the authors seem to exclude this possibility (see figure S3).

      The different roles of GATA4-mCherry induction levels in determining the final proportion of cells acquiring a PrE-like fate reported in our previous (PMID: 26511924) and current work are due to important differences in the experimental settings between the two studies. In PMID: 26511924, we assayed PrE-like differentiation in medium supplemented with serum and LIF, which provides exogenous signals that promote PrE-like differentiation. These conditions reveal the function of the cell-autonomous circuit, in which GATA4-mCherry levels do control the probability of PrE-like differentiation. In the current work, we likewise observe that cell type proportions depend on GATA4-mCherry induction levels when we supply exogenous FGF4 during the differentiation of wild type cells (Figures S2C and S3D, lower panel). Differentiation in the absence of exogenous factors in contrast reveals the behavior of the coupled system, in which cell type proportions are independent from GATA4-mCherry induction levels.

      Furthermore, in the present manuscript, we use new inducible cell lines in which the majority of cells can be induced above the critical GATA4-mCherry threshold required for PrE-like differentiation, in contrast to our previous study where the distribution of GATA4-mCherry induction levels was straddling this threshold.

      In a revised version of the manuscript, we will more explicitly emphasize these important differences in the experimental design between the two studies, and discuss how the specific conditions in the present study lead to new conclusions.

      Most importantly, the authors incorporate in their model the notion that GATA6 inhibits FGF signaling. It would be interesting to understand how such inhibition is mechanistically mediated. For instance GATA6 has been shown to bind in proximity of the Fgfr2 gene (Wamaitha et al., Genes and Dev., 2015). Alternatively, the authors show a direct effect on Fgf4 expression. The short time window of the reported repressive transcriptional effects (8h, Fig 2 middle), might suggest a direct regulation. The authors should test this possibility, and discuss what alternative modes of regulation could be envisaged (for instance, indirect effects mediated by Nanog). This is a key result that deserves a more detailed mechanistic characterisation.

      The regulation of FGF signaling by GATA factors has been pointed out as a central new result of our study by all three reviewers that we will be happy to further expand on in a revised manuscript. Regulation of Fgfr2 expression by GATA6 as suggested by the ChIP-seq data in Wamaitha et al., 2015 (PMID: 26109048) is one possible mechanistic explanation that we will of course discuss.

      Most importantly, we will test possible direct effects of GATA factors on Fgf4 expression that are indicated by the short timescales of the transcriptional effects shown in Fig. 2, as noted by the Reviewer. We have already mined the ChIP-seq data from Wamaitha et al., 2015 (PMID: 26109048) and found a GATA6-binding peak approximately 10 kb upstream of the Fgf4 start codon in a region that is highly enriched for GATA6 consensus binding sites. To test the functional role of this binding region, we propose to delete it by CRISPR-mediated mutagenesis in the inducible lines, and to test its ability to regulate reporter gene expression in heterologous assays.

      To address the question of alternative modes of regulation of Fgf signaling through NANOG, we have already performed in situ mRNA stainings for Fgf4 expression in cells grown for 40 h in N2B27 medium. While Nanog expression is much reduced under these conditions, Fgf4 mRNA continues to be expressed, indicating that positive regulation through NANOG is not essential for Fgf4 mRNA expression in ESCs. We will add this data to a revised manuscript, and discuss its implications for the regulation of Fgf4 transcription (see also our response to Reviewer #3 below). As a complementary approach to further test the role of indirect effects mediated through NANOG, we will dissect more closely the timing of Fgf4 downregulation reported in Fig. 2B relative to the upregulation of the inducible GATA4-mCherry protein and the downregulation of NANOG protein.

      **Minor points:**

      Fig S1: The authors should show quantifications of Nanog and GATA6 levels before the beginning of the differentiation protocol.

      We will be happy to add this data in a revised version, as part of a more extensive analysis of GATA4-mCherry and GATA6 expression at early stages of the differentiation protocol. See also our response to the next point.

      Line 106: The authors write "the initially large proportion of GATA6+; NANOG+ double positive cells". It appears that at 16h of differentiation ESCs have already partitioned between Gata6 or Nanog expressing cells. The authors should rephrase the sentence to reflect what seems to be an almost total absence of truly double positive cells. Possibly, an analysis conducted at earlier time points could clarify these dynamics.

      The Reviewer rightly points out that at 16 h of differentiation, most cells are already associated with one of two clusters in the NANOG/GATA6 expression space. The misleading classification of a large number of cells as double positive at 16 h was caused by applying a single gating strategy to the entire experiment, even though the mean expression levels of NANOG and GATA6 in the two clusters change significantly over time. We will update our gating strategy and rephrase this section to more appropriately describe cell clustering and gene expression dynamics over the time course. We will also extend Figure S1 with analysis of GATA6 and NANOG expression levels at earlier time points of the differentiation protocol, to test whether this allows detecting a truly double positive population.

      Line 124: The authors write "... concentration dependent downregulation of NANOG expression". The effects may rather depend on the time of doxycycline stimulation.

      We agree with the Reviewer that in isolation, the data shown in Fig. 1 and Fig. S2 leave open the possibility that the stronger downregulation of NANOG at higher GATA4-mCherry expression levels is caused by the extended time of doxycycline stimulation rather than GATA4-mCherry concentration. However, in our opinion, this concern is already addressed by the experiments performed in the four clonal lines with independent integrations shown in Figure S3. Here, the time of doxycycline induction is held constant, and a similar relationship between GATA4-mCherry and NANOG expression levels is observed as in the experiments where we modulate induction time in a single clonal line (compare Fig. S2A to Fig. S3B). In a revised version of the manuscript we will describe more clearly how the experiments shown in Figure S3 control for time-dependent effects of doxycycline stimulation.

      Line 192: The authors write "...and confined to cells with low GATA4-mCherry expression levels". It would be helpful to have an indication of the cell boundaries, possibly showing localisation of a membrane bound protein.

      We agree that more firmly establishing a correlation between GATA4-mCherry expression levels and Fgf4 mRNA expression in single cells would greatly benefit from co-staining with a plasma membrane marker. However, the protocol for mRNA in situ hybridization involves incubation steps with ethanol and formamide and is thus incompatible with staining for commonly used membrane markers. There is one commercially available membrane stain (CellBrite by Biotium) that promises to survive the treatments necessary for in situ hybridization and that we will try to use in our stainings. Should this not be successful, we will resort to identifying a subset of the cytoplasm corresponding to each nucleus by dilating nuclear masks that we will segment based on the DNA stain.
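
      The fallback described above (approximating each cell's cytoplasm by dilating its segmented nuclear mask) can be sketched as follows. This is a minimal illustration in plain Python, not the pipeline used in the study: the function names `dilate_labels` and `cytoplasm_ring`, the 4-connected growth rule, and the pixel radius are all assumptions made for the example.

```python
from collections import deque


def dilate_labels(labels, radius):
    """Grow each labelled nucleus by up to `radius` pixels (4-connected steps).

    `labels` is a 2D list of ints: 0 is background, k > 0 marks nucleus k.
    Background pixels are claimed by breadth-first search, so each pixel goes
    to the nucleus that reaches it first and labels never overwrite each other.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    # Seed the search with every nuclear pixel at distance 0.
    queue = deque(
        (r, c, 0) for r in range(rows) for c in range(cols) if labels[r][c] > 0
    )
    while queue:
        r, c, dist = queue.popleft()
        if dist == radius:
            continue  # this front has reached the maximum allowed distance
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc] == 0:
                out[nr][nc] = out[r][c]
                queue.append((nr, nc, dist + 1))
    return out


def cytoplasm_ring(labels, dilated):
    """Keep only the pixels added by dilation (the approximate cytoplasm)."""
    return [
        [d if n == 0 else 0 for n, d in zip(nuc_row, dil_row)]
        for nuc_row, dil_row in zip(labels, dilated)
    ]
```

      In situ hybridization signal could then be summed per label over the ring pixels and compared with the nuclear GATA4-mCherry intensity of the same label. Image-analysis libraries offer equivalent label-expansion routines; the hand-written version here only serves to make the idea explicit.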

      It would be interesting for the authors to discuss how the spatial range of FGF activity measured in culture could affect PE specification in the embryo.

      During lineage specification in the embryo, Epi and PrE cells are initially arranged in a salt-and-pepper pattern (PMID: 16678776; PMID: 18725515; PMID: 30514631). In Fig. 4 and Fig. S9 of our manuscript, we show experimentally and theoretically how similar patterns in ESC colonies arise from the short range of FGF activity. In a revised version of the manuscript, we will discuss how the spatial range of FGF activity measured in culture provides a possible mechanistic explanation for the spatial arrangement of cell types in the embryo.

      Reviewer #1 (Significance (Required)):

      See above.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their manuscript entitled "Cell-cell communication through FGF4 generates and maintains robust proportions of differentiated cell types in embryonic stem cells" Raina et al. study the effect of Fgf-signalling based local cell-cell communication for the establishment of PrE-like and Epi-like cells. The authors use an elegant, albeit artificial, system to analyse the effect of Fgf signalling on establishing 'normal' lineage proportions after transient induction of Gata4 expression. The main conclusions of the manuscript are: i) Gata6 positive cells emerge through short range Fgf4 based cell-cell communication. ii) Fgf4 signalling can compensate for a wide range of initial levels of Gata6 expression and produce properly proportioned cell identities. The authors also state that this mechanism could operate in a range of developing tissues.

      **Major points:**

      1. Fgf4 KO ESCs are deficient in initiating epiblast lineage differentiation (Kunath 2007). Therefore, the effect studied by the authors might be multifactorial and the general inability of Fgf4 deficient cells to enter differentiation might contribute to the observed differentiation defects and defects of cell fate proportioning. Specifically, it could be expected that Nanog regulation is affected in Fgf4 mutants, although, to my knowledge, the specific phenotype of Fgf4 depletion has not been evaluated in Gata4 induced cell programming towards PrE. What steps have the authors taken to exclude an impact of general cell fate change defects in Fgf4 KO ESCs?

      While it is true that Fgf4 mutant cells have a general deficiency in initiating epiblast lineage differentiation, it was already shown in the original publication by Kunath et al. (PMID: 17660198) that general differentiation of Fgf4 mutant cells is restored to wild type levels by supplementing the culture medium with 5 ng/ml recombinant FGF4. This is a concentration that is well within the range of concentrations applied in our study. In initial experiments to characterize our Fgf4 mutant lines, we have measured NANOG expression to test the effectiveness of recombinant FGF4 to restore epiblast lineage differentiation. We found that FGF4 treatment of Fgf4 mutant cells in the absence of doxycycline induction leads to a downregulation of NANOG expression, to levels comparable to those seen in wild type cells grown in N2B27. These data indicate that treatment with recombinant FGF4 rescues defects of general cell fate change in Fgf4 KO ESCs. We will add these data to Figure S4 of a revised manuscript, and explicitly mention the function of recombinant FGF4 to rescue lineage differentiation potential more generally.

      Increasing the time of Gata4 expression results in increasing Gata4 levels (Fig 1C). This is shown at the overall mean fluorescence level. However, it is important to also quantify how many cells actually show some increase in Gata4 levels. Fig 1D suggests that the number of Gata4 expressing cells is quite similar between 4h and 8h induction, but this needs to be quantified. An explanation for the apparent dosage independence of Gata4 could then be simple threshold effects, such that there is no additional effect of increased Gata4 levels in WT cells without any further requirement of feedback regulation after a certain threshold level of Gata4 is reached. Have the authors considered such a simple model?

      The current version of the manuscript already contains quantifications of GATA4-mCherry expression levels in single cells - see Fig. S2A for the experiments where we vary doxycycline induction time, and Fig. S3B for experiments with independent clonal lines. This analysis confirms the Reviewer’s visual impression of Fig. 1D - the number of GATA4-mCherry expressing cells is similar for different induction times and clonal lines, such that the increase in overall mean fluorescence levels is mainly due to an increase in GATA4-mCherry expression levels in single cells. This analysis therefore rules out the simple model based on threshold effects proposed by the Reviewer. In a revised version of the manuscript, we will more explicitly discuss the quantifications in Fig. S2A and Fig. S3B.

      An important point is that in the current setup dosage effects and effects of extended presence of Gata4 cannot be distinguished. Wouldn't titrating the amount of doxycycline used for induction be a more direct way to achieve different initial levels of Gata4 expression?

      This concern has also been raised by Reviewer #1, and is addressed in detail in our response to their comment above. Briefly, in our opinion this concern is addressed in the current manuscript by the experiments performed in the four clonal lines with independent integrations (Figure S3). Here, the duration of doxycycline induction and hence time of GATA4-mCherry exposure is held constant, such that the only difference between the conditions is GATA4-mCherry dosage. We will discuss this important function of Fig. S3 in a revised version of the manuscript.

      Unfortunately titrating doxycycline does not allow titrating transgene induction levels in a meaningful way, as sub-saturating doses of doxycycline lead to an increased heterogeneity in transgene expression with many non-expressing cells, rather than to reduced expression levels across all cells. See PMID: 17048983 for a possible explanation of this observation.

      Another point the authors should appropriately discuss and consider is that a lack of effect of different doses/durations of Gata4 expression could be due to the fact that by the time Gata6 is induced, the levels of Gata4 in cells previously treated for different periods of time are no longer detectably different. Such a regulation would equally result in indistinguishable cell fate proportioning. Can the authors exclude such a regulation? This is an important point at the heart of the authors' conclusion.

      The Reviewer seems to suggest that by separating the initiation of GATA6 expression from the GATA4-mCherry pulse in time, the decision to initiate PrE-like differentiation could be independent from GATA4-mCherry concentration, thus explaining the robust cell type proportions. The data shown in Figs. S2C, S3D and Fig. 3 A - C clearly exclude such a regulation: In conditions where we supply recombinant FGF4, the proportions of the different cell types scale with GATA4-mCherry expression levels, indicating that GATA4-mCherry dose does indeed affect Gata6 expression. In a revised version of the manuscript we will discuss and consider how these observations argue against a model where the decision to initiate PrE-like differentiation occurs independently from GATA4-mCherry levels.

      The authors make some general statements on cell differentiation (e.g. l205). They also claim that the Fgf4-based mechanism of lineage proportioning could act in a range of tissues during development. However, the use of the term differentiation for the induction of PrE-identity (or Gata-factor expression to be exact, see comment below) after Gata4 overexpression is problematic. The system chosen by the authors is entirely artificial. ES cells normally do not differentiate into extraembryonic cell types. It needs to be made clear in the manuscript that they do not study a differentiation process that normally occurs in the embryo or in differentiating ESC cultures. The system the authors are using would, in my opinion, rather qualify as cell programming or transdifferentiation than as differentiation. I suggest presenting the system using clearer unambiguous language and to try to avoid any generalisations based on an artificial transgene-overexpression based system. The results have to be presented with this limitation in mind.

      To address the Reviewer’s concerns regarding terminology, we will expand on the relationship of our system to normal ESC differentiation and lineage specification in the embryo, and discuss its possible limitations. We disagree however with the Reviewer’s assertion that using a transgene-based overexpression system precludes drawing any general conclusions. Rather, the system allows mimicking Epi- and PrE-like differentiation in a uniquely accessible context, and thereby to exploit the molecularly simple regulation of this cell fate decision for studying basic principles of cell differentiation. This view is supported by Reviewer #3 in the referees cross-commenting section below, who emphasizes the value of such models and notes that they are very common in developmental biology.

      It is unclear how 'PrE-like' (as stated e.g. in the abstract) the cells really are after a short pulse of Gata4 expression. No proper characterisation has been performed but needs to be included, if the authors want to term these cells PrE-like.

      A recent study by Amadei et al. (PMID: 33378662) supports the notion that a short pulse of GATA4 expression can trigger bona fide PrE-like differentiation. In this study, the authors induced a similar doxycycline-inducible GATA4 expression system for 6 hours, and observed subsequent differentiation into several PrE derivatives, including the anterior visceral endoderm. In a revised version, we will cite this study to support our claim that the GATA6-positive cells are indeed PrE-like. Additionally, we offer to perform immunostainings with an extended panel of known PrE marker proteins to substantiate the PrE-like character of the GATA6-expressing cells.

      How is the statement in l112 that "The clear separation between the two populations suggests that the increase in the proportion of double negative cells at the expense of GATA6+; NANOG- PrE-like cells beyond 40 h is mostly fueled by the downregulation of NANOG expression in the GATA6-negative cell population, combined with a slower proliferation of the GATA6-positive population, rather than by the reversion of PrE-like into double negative cells." supported by the data?

      We realize from the comments of all three reviewers that this section was confusing and potentially misleading in the original version of the manuscript. In a revision, we will reword this paragraph to better bring out the major conclusions from the GATA6 and NANOG expression patterns shown in Fig. S1A. These data show that the majority of cells belong to one of two discrete clusters from 16 h onwards. The clear separation of the two clusters furthermore indicates that cells rarely switch their gene expression patterns. Given these observations, the changes of cell type proportions reported in Figure S1B can be explained as a consequence of slower proliferation of cells in the GATA6-positive relative to the GATA6-negative cluster. In addition, NANOG expression in the GATA6-negative cluster declines over time, such that progressively more cells are classified as double negative.

      Would the data and modelling performed by the authors be in line with a model in which the decision to express Gata6 is a stochastic choice (with a certain probability based on the levels of Gata4 induction) that is then stabilized and reinforced by Fgf signalling rather than Fgf signalling having an instructive role?

      The simulations shown for the Fgf4 mutant case in Fig. 3 D - G, right column, are based on a model in which the decision to express Gata is a stochastic choice with a probability based on the initial levels of GATA expression, and reinforced by FGF signaling. Thus, our data from the Fgf4 mutant, but not the wild type, are perfectly in line with such a model.

      We realize from the Reviewer’s comment that we have not made sufficiently clear the conceptual differences between the models for the mutant and the wild type case. We suspect that this lack of clarity stems from the fact that the two models rely on the same circuitry, except for the regulatory link between GATA and FGF. This link however makes a crucial difference: It transforms the simple single cell input-output model of the mutant case, which is common to many previous publications, into a population level model with cell-cell feedback which shows new emergent behavior. And only this population level model, but not the single cell model for the Fgf4 mutant, can recapitulate the experimental data observed in the wild type. In a revised version of the manuscript we will expand on these crucial differences when describing the model and data in Fig. 3.
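
      The conceptual difference between the two cases can be illustrated with a deliberately simplified toy calculation. This is not the model shown in Fig. 3: the additive threshold rule, the well-mixed (non-spatial) FGF pool, the `gain` value and all function names are assumptions chosen only to show why closing the loop between GATA and FGF changes the behavior from a single-cell input-output relationship to a population-level one.

```python
def population(mean_gata, n=1000):
    """Hypothetical cell population: GATA levels spread evenly by +/-0.5 around the mean."""
    return [mean_gata + (i + 0.5) / n - 0.5 for i in range(n)]


def pre_fraction(gata_levels, fgf, threshold=1.0):
    """Fraction of cells whose GATA level plus FGF input exceeds the threshold."""
    return sum(1 for g in gata_levels if g + fgf > threshold) / len(gata_levels)


def mutant_fraction(gata_levels, exogenous_fgf):
    """Mutant-like case: FGF is a fixed external input, each cell decides on its own."""
    return pre_fraction(gata_levels, exogenous_fgf)


def wild_type_fraction(gata_levels, gain=4.0, steps=50):
    """Wild-type-like case: only non-PrE cells supply FGF, so FGF = gain * (1 - p).

    The self-consistent PrE-like fraction p solves pre_fraction(gain * (1 - p)) = p.
    The left-hand side falls as p rises, so bisection on p locates the crossing.
    """
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if pre_fraction(gata_levels, gain * (1 - mid)) > mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

      In this sketch, shifting the mean initial GATA level from 0.2 to 0.8 changes the mutant-like output fraction far more than the wild-type-like one, because in the closed loop any excess of PrE-like cells reduces the FGF available to the remaining cells. Spatial effects such as the short signaling range are ignored here.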

      The statement in line 187 "This indicates that GATA4-mCherry expression negatively regulates FGF4 signaling during cell type specification." is not supported by the data. The authors show only a correlation and actually correctly say so in line 195.

      Prompted by the comments of both Reviewer #1 and #3, we will carry out experiments to mechanistically explore the regulation of Fgf4 expression by GATA factors (see our response to Reviewer #1 above for a detailed description). Depending on the outcome of these experiments we will reword this statement.

      In Fig 2F statistical analysis between the re-seeded conditions is required for the conclusion that "the proportion of PrE-like cells systematically increased with cell density". Replating itself appears to quite drastically impact lineage distribution. Do the authors have an explanation for this?

      The p-value in line 221 of the original manuscript refers to a test for a linear trend between the three conditions following a one-way ANOVA in GraphPad Prism. We apologize that this has not been made clear and will add this information in a revised version.
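
      For readers unfamiliar with this type of post test, the textbook form of a linear-trend contrast following a one-way ANOVA is sketched below. It assumes equally spaced, ordered groups and is meant as an illustration only; whether GraphPad Prism computes exactly this statistic is an assumption, and the function name `linear_trend_f` is hypothetical.

```python
def linear_trend_f(groups):
    """F statistic for a linear trend across ordered, equally spaced groups.

    `groups` is a list of lists of measurements, ordered by condition
    (for example increasing cell density). Returns (F, df1, df2).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    # Centered linear contrast coefficients, e.g. -1, 0, 1 for three groups.
    coeffs = [j - (k - 1) / 2 for j in range(k)]
    contrast = sum(c * m for c, m in zip(coeffs, means))
    ss_contrast = contrast ** 2 / sum(
        c ** 2 / len(g) for c, g in zip(coeffs, groups)
    )
    # Pooled within-group variance, as in the one-way ANOVA itself.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_within = n_total - k
    ms_within = ss_within / df_within
    return ss_contrast / ms_within, 1, df_within
```

      The p-value follows from comparing the returned statistic with an F distribution on the returned degrees of freedom; that step is left out because it requires a statistics library.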

      The observation that replating drastically impacts lineage distribution is perfectly in line with the overall conclusion from this section, namely that FGF signaling is enhanced by cell-cell contacts. Replating strongly reduces the number of direct cell-cell contacts by disrupting the colony structure of the culture. Thus it is expected that the proportion of the PrE-like cells - which require exposure to FGF ligands - is reduced under these conditions compared to the condition that has not been replated. We will discuss this explanation in a revision.

      Fig 2G shows a key experiment illustrating the local effect of Fgf4 expression on first and second neighbours. The authors have investigated this effect using a Fgf-signalling reporter. Why did they not assay Gata6 expression in this assay instead of a Spry reporter? This would be the experiment to show that also Gata6 expressing cells (after transient Gata4 induction) are clustered around Fgf4 producing cells and be a strong piece of evidence to show that local Fgf4 signalling and cell-cell communication is indeed involved in cell identity proportioning. The cell lines required for this experiment (including Fgf4 mutant Gata4 inducible ESCs) appear to be available.

      We decided to measure the FGF4 signaling range with a Spry4:H2B-Venus reporter because its response time is faster than that of GATA6 expression during differentiation. Furthermore, the Spry4:H2B-Venus reporter provides a quantitative readout for FGF4 signaling, in contrast to a binary read-out that would be expected for GATA6 expression. We will be happy to discuss these considerations in a revised manuscript.

      We agree that measuring FGF4 signaling range with Fgf4 mutant Gata4-mCherry inducible cells as suggested by the Reviewer constitutes a complementary approach to further corroborate the role of local FGF4 signaling in cell differentiation. However, we would like to stress that our demonstration of local FGF4 signaling is already supported by two fully orthogonal quantitative experiments, one relying on cell replating and the other one relying on the signalling reporter. The concept of local signaling is further supported by our quantitative analysis of the spatial arrangement of cell types in Fig. 4. The additional experiment suggested by the Reviewer is therefore unlikely to substantially change the paper’s conclusions, as also pointed out by Reviewer #3 in the referees cross-commenting section. Therefore, we offer to perform this experiment for a revision, but would like to seek the editor’s opinion if this is deemed necessary to make the paper acceptable for publication.

      The authors conclude from data in Fig 3A that proper cell type proportioning depends on initial Gata4 levels in Fgf4 mutants, in contrast to WT cells where the initial levels appear more irrelevant. Is 10ng/ml too high a dose? Would using a lower concentration (such as ~2ng/ml suggested by Fig 2D to give WT-like distribution) result in a complete rescue of cell lineage proportioning in this assay? Formally, a control of adding additional Fgf4 to WT cells will also be needed to control for a potential effect of exogenous Fgf4 addition.

      In our initial characterization of the Fgf4 mutant cell lines, we have performed experiments where we examined cell type proportions upon culture in the presence of different doses of FGF4 following doxycycline induction times between 1 h and 8 h. These experiments confirm the suspicion of the Reviewer that cell type proportions similar to the wild type can be obtained with a lower dose of 2.5 ng/ml FGF4 after 8 h of induction. For shorter induction times followed by differentiation in the presence of 2.5 ng/ml FGF4 however, cell type proportions were strongly skewed towards Epiblast-like cells. These data thus further support the major conclusion from Fig. 3A quoted by the Reviewer: Proper cell type proportioning in Fgf4 mutants depends on GATA4 levels, and this behavior is independent from the FGF4 concentration applied. We offer to add this data to a revised manuscript.

      The effects of adding FGF4 to wild type cells are shown in Fig. S2C and S3D in the current version of the manuscript. This control has been performed in all experiments shown in Fig. 3A - C, but we decided to omit it for clarity. We are happy to add this information back in as requested by the Reviewer.

      Does the model in Fig 3E consider potentially varying doses of exogenous Fgf4? Can the model also predict what happens if Fgf4 is added to WT cells, as suggested above as control? In general, the value of this model is unclear. Figure 3E is near impossible to understand, no quantitative information is given.

      The model in Fig. 3E can of course be simulated with different doses of exogenous FGF4. These simulations recapitulate the experimental results described under point 10 above: Cell type proportions for the Fgf4 mutant case are skewed towards NANOG-positive cells at lower FGF4 doses, and vary with initial conditions irrespective of FGF4 dose. We offer to show the results of these simulations in a revised manuscript alongside the experimental data discussed above.

      It is also possible to incorporate into the model addition of exogenous FGF4 to the wild type. Simulations of this condition confirm the experimentally observed increase in PrE-like cells shown in Fig. S2C and S3D of the current manuscript.

      To help the reader digest Fig. 3E, we will add separating lines similar to the gates of the flow cytometry data in panel A, and indicate the proportion of cells in the respective quadrants.
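
      Computing the quadrant proportions for such an annotation is straightforward once the separating lines are fixed. The sketch below assumes each cell is represented as a (NANOG, GATA6) pair of values and that one threshold per axis defines the gates; the function name and the data layout are illustrative assumptions, not a description of the analysis code used for the figure.

```python
def quadrant_fractions(cells, nanog_gate, gata6_gate):
    """Return the fraction of cells in each NANOG/GATA6 quadrant.

    `cells` is a list of (nanog, gata6) values, one pair per cell.
    A cell counts as positive for a marker if its value exceeds the gate.
    """
    if not cells:
        raise ValueError("no cells to classify")
    counts = {
        "GATA6+;NANOG-": 0,
        "GATA6+;NANOG+": 0,
        "GATA6-;NANOG+": 0,
        "GATA6-;NANOG-": 0,
    }
    for nanog, gata6 in cells:
        gata6_state = "GATA6+" if gata6 > gata6_gate else "GATA6-"
        nanog_state = "NANOG+" if nanog > nanog_gate else "NANOG-"
        counts[gata6_state + ";" + nanog_state] += 1
    return {name: count / len(cells) for name, count in counts.items()}
```

      Applying the same function to simulated and measured cells would make the percentages in panels A and E directly comparable.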

      The Reviewer’s comment that the value of the model is unclear indicates to us that we have not explained in sufficient detail the conceptual differences between the behavior of the model of the wild type and the mutant case. As detailed in our response to Reviewer’s comment 6. above, we will rewrite the text to bring out more clearly the insight that the model brings.

      Fig 4A: Why were WT and Fgf4 mutant cells treated differently in this assay (8h vs 4h, respectively)?

      The spatial arrangement of cell types in Fgf4 mutant cells has been assayed in two conditions that give similar cell type proportions as seen in the wild type, as motivated in lines 366 - 370 of the current manuscript. We decided to show the condition with 4 h induction followed by differentiation in the presence of 10 ng/ml FGF4 in the main Figure 4 because it is most similar to the condition that gives wild-type like cell type proportions in the Fgf4 mutant shown in the immediately preceding main Figure 3, while the condition that uses 8 h induction followed by differentiation in the presence of 2.5 ng/ml FGF4 refers back to the main Figure 2. We show both primary data and the complete analysis for the latter condition in Figures S8D and S10. Fig. S10 provides a direct comparison between the two conditions and clearly demonstrates that they show similar dynamics. We do not think that exchanging the two datasets between main and supplementary Figures will add value to the manuscript.

      Does the interpretation that at 24h there is a difference in Fig 4C survive statistical scrutiny? Only few datapoints are shown and any apparent differences seem due to outliers rather than a shift in cluster radii. How often were these experiments independently repeated? This information is missing. In Fig 4B, I cannot appreciate any difference between cell lines.

      We will perform statistical testing to assess whether the spatial arrangement of cell types is significantly different between the time points, and mention the results in the text.

      To evaluate the spatial arrangement of cell types, we have performed two independent experiments in the wild type, and analyzed two conditions for the mutant case. In each experiment, we have analyzed at least eight positions per condition and control. Spatial clustering of wild type cells at 40 h is also observed in earlier Figures in the manuscript (e.g. Fig. 1D, S2B, S3C).

      The similarities between wild type and Fgf4 mutant cells shown in Fig. 4B are not surprising and fully in line with the data shown in panel C, which shows that differences between time points are much more pronounced compared to the differences between genotypes. However, we realize that the micrographs and analysis plots in Fig. 4A and B were perhaps not fully representative for the aggregate behavior shown in panel C. In a revision, we will therefore show data from more representative colonies in panels A and B.

      **Minor points:**

      a) More information on statistics should be given in the Figures and legends.

      To address this concern we will perform statistical tests for differences in proportions of the main cell types in Figures 1D and 3C. In addition, we will perform statistical testing on Fig. 4C as detailed in point 13 above.

      b) Percentages should be indicated in the quadrants of the FACS plots of Fig 3A and E.

      This is a good suggestion; we will add this information. See also our response to point 11 above.

      c) What is the underlying evidence for the statement: "The specification of Epi- and PrE-like cells in ESCs shows both molecular and functional parallels to the patterning of the ICM of the mouse preimplantation embryo."

      In the current manuscript, this statement is further substantiated in the subsequent paragraph (lines 483 - 503). We realize that this order is potentially confusing and will change it. We will further modify this section as part of our response to major point 3. above.

      d) Fig 5C is difficult to interpret without a comprehensive decoding of colour information.

      To facilitate interpretation of this panel, we will add a legend to decode the colour information of the traces (purple: VNPhigh, cyan: VNPlow).

      Reviewer #2 (Significance (Required)):

      This manuscript provides novel insights into the role of Fgf-mediated cell-cell communication to establish proper ratios of cell identities in a PrE-induction system. The authors provide some interesting data and interpretation. Overall, the significance is slightly impaired by the highly artificial nature of the studied cell fate specification event.

      This manuscript will be of interest to readers working on early embryonic cell fate decision as well as researchers working on modelling of cellular processes.

      My expertise lies in the field of cell fate decision and pluripotency.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      It is well established that FGF signalling plays a role in the partitioning of the Primitive Endoderm and Epiblast fates during preimplantation mammalian development. Recent work has shown that this fate decision is associated with a mechanism that is able to maintain the proportions of the two fates stable in the face of perturbations. Here, the authors address this mechanism and show that it is dependent on FGF signalling and associated with the fate decision. In the process they suggest and test a novel mechanism based on short range FGF signalling. A series of carefully designed and executed experiments refine and provide evidence for the model. This is an original and important piece of work that will influence the field of pattern formation.

      Overall the manuscript is well written but, at least from the perspective of this reviewer, there are places in which clarity can be improved.

      Lines 104 and ff: the description of the dynamics of the different populations after the GATA4 pulse can be clarified. The reference to the double negative population emerging from the PrEnd population is not clear. It is stated that the proportion of these cells increased continuously, and this is said to be at the expense of the PrEnd population, whose variation is referred to as "slightly declined". How can a slight decline fuel a steady increase in the double negative population?

      Also, what are these double negative? Could they be cells differentiating into embryonic lineages?

      We realize from the comments of all three Reviewers on this paragraph that it was confusing and potentially misleading in the original manuscript. In a revised version we will rewrite this section to clarify our interpretation of the data in Fig. S1. First, the clear separation of the two clusters observed in NANOG-GATA6 expression space indicates that cells rarely switch between the two clusters. Then, a likely explanation for the slow decline in the fraction of GATA6-positive cells is a slower proliferation compared to the GATA6-negative cells. Third, the increase in the proportion of double negative cells is caused by a progressive downregulation of NANOG expression in the GATA6-negative cluster. These NANOG expression dynamics are consistent with NANOG expression dynamics in epiblast cells of the embryo, and could indeed indicate differentiation towards embryonic lineages. We will mention this possibility in a revised manuscript.

      See also our response to Reviewer #1 and Reviewer #2, point 5.

      In Figure 1 and its discussion, it would be good to see a representation of the stability of the final proportions relative to the different initial conditions, a variation on 1E.

      This is a good suggestion. In a revised version, we plan to add a panel to Fig. 1 in which we plot the final proportions of the different lineages versus the GATA4-mCherry expression levels for the different induction times. This will illustrate more clearly that the final proportions of cell types are largely independent from the initial conditions.

      Paragraph lines 182 and ff: the report that GATA4 expression is able to suppress FGF4 signalling, autonomously is, at least for this reviewer, a novel and important result and one that impinges on the understanding of the process. The authors should emphasize this.

      We agree with the Reviewer that the direct regulation of Fgf4 expression through GATA factors is a new regulatory link suggested by our data that has not been described before and that is crucial for the functioning of the system. Prompted by a similar comment of Reviewer #1 above, we offer to further explore the mechanistic basis of this link through an analysis of published ChIPseq data, functional studies of a GATA binding site upstream of the Fgf4 start codon, or a more detailed temporal dissection of NANOG, GATA and Fgf4 expression dynamics following doxycycline induction (see our response to Reviewer #1 above for more details). These new experiments and analyses will allow us to emphasize this novel result, and thereby significantly strengthen our paper.

      Paragraph lines 274 and ff (section on the involvement of FGF4 in the robustness of the process) needs some explanation. The derivation of the conclusion that 'recursive communication via FGF4 underlies a population-level phenotype ...characterized by the differentiation of robust proportions of cell types...' from the experiments requires some unwrapping. It would be helpful if the authors could explain how the conclusion follows from the experiments.

      We realize from this Reviewer’s comment and the comments of Reviewer #2 above that we have not explained well enough how the results shown in Fig. 3 A-C (lines 274 - 283) lead to our conclusion of emergent behavior, which is then further substantiated by the modelling in panels D - G. The central conclusion of this paragraph rests on the observation that cell type proportions are dependent on initial conditions in the Fgf4 mutant, but not in wild type cells. As we had supplied FGF4 externally to the Fgf4 mutant cells, the only difference between these two conditions is that the FGF4 dose in wild type cells is regulated by the cell population, i.e. cells can communicate via FGF4, whereas mutant cells cannot. We will expand on this line of reasoning, and also explain in more detail the differences in the models for the mutant case and the wild type, which we believe will help to conceptualize the experimental results. See also our response to Reviewer #2, points 6 and 11.

      Their model does not seem to include the commonly agreed regulatory interaction between Nanog and FGF4, at least not directly, and it would be helpful if a reasoning could be provided for this decision.

      A discussion of the regulatory interaction between NANOG and Fgf4 has also been requested by Reviewer #1. In our response to their point above, we provide a reasoning why we have omitted it in the current manuscript. Briefly, our decision not to include a direct positive link between NANOG and Fgf4 expression rests on our observation that Fgf4 mRNA continues to be expressed 2 days after switching cells from 2i + LIF medium to N2B27, a time at which NANOG already starts to be downregulated as a consequence of differentiation along embryonic lineages. We will add this data to a revised manuscript. In addition, we propose above to dissect in more detail the temporal sequence of GATA4-mCherry, Fgf4 and NANOG expression upon doxycycline induction. This analysis will provide further information about the role of NANOG for Fgf4 mRNA expression in ESCs.

      Reviewer #3 (Significance (Required)):

      In this manuscript, Raina and colleagues use an Embryonic Stem (ES) cell based experimental system to address a central problem in developmental biology, namely the emergence of stable scaled populations of different cell fates. The experiments are elegant in design, carefully executed and the effort provides a solution to the problem: a novel mechanism based on short range FGF signalling that provides homeostatic control of relative cell populations. This is an important piece of work with sound conclusions that establishes a new paradigm in pattern formation whose implications are likely to lead to a reassessment of the role of FGF in different patterning paradigms. The experiments are quantitative and supported by a modelling effort based on a theoretical piece of work (Stanoev et al. 2021) which underpins the conclusion.

      This manuscript will appeal to a wide audience including developmental and stem cell biologists as well as modellers.

      My expertise covers the areas addressed in the manuscript.

      **Referees cross-commenting**

      It looks as if, with some nuances, we all agree on the value of the work. I do not have any issues with the comments of Reviewer 1, though I disagree that the model tested and improved here is similar to existing ones. While it is true that this work is related to a theory paper by some of the authors, the experimental test and resulting conclusions are very important. On the other hand, I am very surprised by the comments of Reviewer 2 who, after conceding the value and potential significance of the work, raises a list of queries, largely small details and opinions rather than points of substantial concern, hinting at a need for the authors to perform extra work and analysis that will not change the conclusions of the manuscript. Some of this, e.g. #9, would be a nice piece of additional evidence, but more an adornment than a necessity. The main problem of this reviewer is the lack of appreciation of what they define as the 'highly artificial nature' of the study, without providing any reason why such experiments (very common in developmental biology) can lead to misleading conclusions. It seems to me that most, if not all, of their significant concerns can be dealt with in a rebuttal or by altering the text, either to discuss the issues raised, to clarify the points or to qualify the conclusions.

  6. Apr 2021
    1. SciScore for 10.1101/2021.04.26.21255801:

      Please note, not all rigor criteria are appropriate for all manuscripts.

      Table 1: Rigor

      NIH rigor criteria are not applicable to paper type.

      Table 2: Resources

      <table><tr><th style="min-width:100px;text-align:center; padding-top:4px;" colspan="2">Software and Algorithms</th></tr><tr><td style="min-width:100px;text-align:center">Sentences</td><td style="min-width:100px;text-align:center">Resources</td></tr><tr><td style="min-width:100px;vertical-align:top;border-bottom:1px solid lightgray">We operationalize the intensity of this intervention by means of the mobility index produced by Google (xm), specifically the average of the transit and workplaces indexes as previously explained (16).</td><td style="min-width:100px;border-bottom:1px solid lightgray"><div style="margin-bottom:8px"><div>Google</div><div>suggested: (Google, RRID:SCR_017097)</div></div></td></tr></table>

      Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


      Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
      However, we think that these cases are out of the mainstream of the epidemic, and they pose no serious limitation to the model. On the other hand, the proposed model is highly flexible, allowing for both completely asymptomatic and mild symptomatic cases, that are thought to play a significant role in the SARS-CoV2 epidemic. Among the limitations of our research, we have to mention the data quality. Changes in real-time data due to corrections, poor data-quality and slow reporting may affect therefore the assumptions of our model. Under-reporting due to slow data processing, restrictive testing policies and lack of testing availability impacted on the cumulative number of cases acknowledged by official data sources. While we use Russell’s method to correct for this, we are introducing potential limitations of Russell’s method into our model. Also regarding data quality, imported cases also introduce uncertainty in the model, and data is not as granular as it is required to account for that. Another limitation is that in our model, 81% of the infections are asymptomatic. While as mentioned previously, some local data shows that for every PCR-diagnosed case there were 9 IgG SARS CoV-2 positive individuals that had not been diagnosed during the outbreak in a very poor neighborhood in the City of Buenos Aires (30) reaching 50-60% seroprevalence, studies conducted in Europe after local outbreaks show 15 to 20% seroprevalence (31). Further research is required to understand if this...

      Results from TrialIdentifier: No clinical trial numbers were referenced.


      Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


      Results from JetFighter: We did not find any issues relating to colormaps.


      Results from rtransparent:
      • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
      • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
      • No protocol registration statement was detected.

      Results from scite Reference Check: We found no unreliable references.


      <footer>

      About SciScore

      SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

      </footer>

    1. Author Response:

      Reviewer #1:

      Guo et al. describe interesting experiments recording from various sites along a cortico-cerebellar loop involved in limb control. Using neuropixels recordings in motor cortex, pontine nuclei, cerebellar cortex and nuclei, the authors amass a large physiological dataset during a cued reach-to-grasp task in mice. In addition to these data, the authors 'ping' the system with optogenetic activation of pontocerebellar neurons, asking how activity introduced at this node of the loop propagates through the cerebellum to cortex and influences reaching. From these experiments they conclude the following: the cerebellum transforms activity originating in the pontine nuclei, this activity is not sufficient to initiate reaches, and the data support the long-standing view that the cerebellum 'fine tunes' movement, since reaches are dysmetric in response to pontine stimulation. Overall these data are novel, of high quality, and will be of interest to a variety of neuroscientists. As detailed below, however, I think these data could provide much more insight than they currently do. Thus below I provide some suggestions on improving the manuscript.

      1) Since the loop is the focus of this study, it would be nice if the authors better characterized latencies of responsivity to pontine stimulation through the loop, to address how cortically derived information routed to the cerebellum may loop back to influence cortical function. In the data provided, we know that pontine stimulation modulates Purkinje and deep nuclear firing (but response latencies are not transparently provided in the main text, if anywhere), while motor cortical responses peak at 120 ms (after stimulus onset? This is unclear), and that this responsivity is preferentially observed in neurons engaged early in the reaching movement. Is the idea, then, that cortical activity early in the reach is further modulated by cerebellar processing to (re)influence that same cortical population? Does this interpretation align with the duration of reaches, the duration of early responsive activity during reach, and the latency of responsivity; or is the idea that independent information from other modalities entering the pontine nuclei modulates early cells? Latency to respond at the different nodes might aid in thinking through what these data mean for the function of the loop.

      We thank the reviewer for this important suggestion, and we have now added measurements of the latency from the onset of sinusoidal PN stimulation to neural responses in Purkinje cells, DCN neurons, and motor cortex (Supplemental Fig. 7), and observe a progressive recruitment of laser-evoked spiking along this pathway. There is a tradeoff between temporal resolution (which increases with decreasing bin width) and statistical power (which decreases with decreasing bin width), and we have opted to use 10 ms bins in a sliding window, which provides a reasonable compromise between these criteria. Although we potentially detect fewer tagged neurons at shorter latencies than we would with larger bins, this approach enables us to detect the timing of the earliest responses (defined as the earliest time point at which 5% of the neurons eventually recruited are responsive). Note that the sinusoidal stimulation used in these experiments is not ideal for latency measurements, as it takes 6.25 ms for the laser to reach peak power. We have also added a similar analysis for the response latency of PN neurons to pulse train stimulation of motor cortex (Supplemental Fig. 1). Based on these analyses, our estimate of the delay for signals to propagate across the entire loop is 26 ms: PN to motor cortex (21 ms) + motor cortex to PN (5 ms). Given that the movement duration (lift-to-grab) is approximately 110 ms on average, this would allow ~4 full feedback cycles throughout the reach. Thus, these delays are consistent with the possibility that cortical activity during planning or early in the reach is further modulated by cerebellar processing to influence that same cortical population later in the reach. Regarding the earliest motor cortical responses that we observe in PN-tagged units, it's possible that they may result from ponto-cerebellar input driven by other cortical regions. Alternatively, the responses of motor cortical neurons early in the movement may be driven more directly by other cortical areas or the basal ganglia, but these early-responding neurons may also receive strong ponto-cerebellar input due to plasticity during development or learning.
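
      The timing argument in this response can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' analysis code: the function names are hypothetical, and `earliest_response_ms` assumes that each recruited neuron is summarized by a single onset latency, which is a simplification of the sliding-window procedure described above.

```python
def quarter_period_ms(freq_hz):
    """Time for a sinusoid that starts at zero to reach its first peak."""
    return 1000.0 / freq_hz / 4.0


def loop_delay_ms(pn_to_cortex_ms, cortex_to_pn_ms):
    """Total delay for one pass around the loop."""
    return pn_to_cortex_ms + cortex_to_pn_ms


def full_cycles(movement_ms, loop_ms):
    """Number of complete loop traversals that fit into one movement."""
    return int(movement_ms // loop_ms)


def earliest_response_ms(onsets_ms, fraction=0.05):
    """Earliest time at which `fraction` of the eventually recruited
    neurons are responsive (assumption: one onset latency per neuron)."""
    ordered = sorted(onsets_ms)
    n = len(ordered)
    for count, t in enumerate(ordered, start=1):
        if count / n >= fraction:
            return t
    raise ValueError("no onsets supplied")


if __name__ == "__main__":
    print(quarter_period_ms(40.0))       # rise time of the 40 Hz sinusoid
    loop = loop_delay_ms(21.0, 5.0)      # values quoted in the response
    print(loop, full_cycles(110.0, loop))
```

      With the values quoted in the response (21 ms, 5 ms, 110 ms, 40 Hz), the helpers reproduce the 6.25 ms rise time, the 26 ms loop delay, and the four full cycles per reach.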

      2) Many of the figures need work to aid interpretation. Axis labels are often missing (eg 2F); color keys are often unlabeled (2F); color gradients often used but significance thresholds are hard to evaluate (using same colors for z scores and control / laser is confusing 6, 8); and within-figure keys would be useful (5D-h). These issues occur throughout the manuscript.

      We have added the axis and color labels in Fig. 2F, and have added additional annotation throughout the main and supplemental figures. For firing rate z-score heatmaps, we have kept the gray color scale for control and laser to facilitate direct comparison between the panels, but have added orange and blue boxes around the heatmaps in Fig. 6, 7, S8, and S9 to emphasize that they reflect different experimental conditions.

      3) Relatedly, but also conceptually, Figure 3B has particular issues, such as identifying where the neuropixel multiunit activity is coming from. I assume that in the gray boxes illustrating the spatio-temporal profile of spiking band activity that the lower part of the box is the ventral direction, upper, dorsal. This is not spelled out. From the two examples it would seem that the spiking band is in different places in the cerebellum, undermining, I think, the objective of the figure. It would be sensible to revisit this entire figure to identify the key takeaways and design figures around those ideas. As it stands, these examples appear anecdotal. Consider moving this to a supplement. Powerband density strength is missing an axis. More importantly, it would be nice to corroborate the interpretation of the MUA with the single unit recordings, since the idea is that many neurons are entraining to the PN activity. Yet, the examples don't seem particularly entrained. Is the activity being picked up on just axonal firing of the PN axons? Fourier analysis of spiking of isolated neurons in cerebellum should be used to corroborate the idea that cerebellar neurons are entraining, rather than the neuropixel picking up entrained PN axons.

      To examine spike entrainment to the 40 Hz PN stimulation for Purkinje cells and DCN neurons, we computed the phase of sinusoidal stimulation coinciding with each individual spike. If a neuron is entrained to the stimulation, the phase distribution for its spikes will differ from the uniform distribution on the circle; this can be assessed for each cell using a Rayleigh test. Furthermore, we can calculate the strength of entrainment and preferred phase by calculating the magnitude and angle of the mean resultant for each cell. If a neuron’s spikes are completely unrelated to the stimulation phase, the mean resultant length will tend to 0 as the number of spikes observed goes to infinity. If, on the other hand, a neuron is completely entrained (with every spike occurring at exactly the same phase), the mean resultant length will be 1. This approach is illustrated schematically in Supplemental Fig. 6A.
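
      The entrainment measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, stimulation is assumed to start at phase zero at `stim_onset_s`, and the p-value uses only the leading-order large-sample approximation to the Rayleigh test, p ≈ exp(-n R²), where R is the mean resultant length.

```python
import cmath
import math


def spike_phases(spike_times_s, freq_hz=40.0, stim_onset_s=0.0):
    """Phase (radians, in [0, 2*pi)) of the sinusoid at each spike time."""
    two_pi = 2.0 * math.pi
    return [(two_pi * freq_hz * (t - stim_onset_s)) % two_pi
            for t in spike_times_s]


def mean_resultant(phases):
    """Return (length, angle) of the mean resultant vector.

    Length is 1 for perfect entrainment and tends to 0 for spikes
    unrelated to the stimulation phase; angle is the preferred phase.
    """
    z = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(z), cmath.phase(z) % (2.0 * math.pi)


def rayleigh_p(phases):
    """Leading-order approximation to the Rayleigh test p-value."""
    n = len(phases)
    r, _ = mean_resultant(phases)
    return min(1.0, math.exp(-n * r * r))


if __name__ == "__main__":
    # One spike per 40 Hz cycle, always at the same phase.
    locked = [k / 40.0 for k in range(50)]
    print(mean_resultant(spike_phases(locked))[0])
    print(rayleigh_p(spike_phases(locked)))
```

      For small spike counts, a finite-sample correction to the p-value would be preferable to the approximation used here.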

      This new analysis revealed two key features of the data we had not previously appreciated. First, it revealed PN-stimulation-induced changes in neural activity that were not apparent from the mean firing rate profiles: most Purkinje cells and DCN neurons were significantly entrained to the 40 Hz stimulation. Second, the entrainment strength was higher in the DCN than Purkinje cells (Supplemental Fig. 6B-D), suggesting the corticonuclear pathway amplifies the rhythmic input. This result is strikingly similar to published observations obtained from slice electrophysiology and anesthetized mice (Person & Raman, 2012), which we now discuss in the text. It is also possible that direct excitation from PN collaterals contributes to the DCN entrainment.

      We agree that the original analysis of multiunit activity is difficult to interpret, for two reasons: (1) the signal likely reflects the combined contribution of multiple cell types, including pontine mossy fiber terminals, and (2) the depth profile will differ for different electrode penetrations, due to the geometry of the cerebellar cortex. Furthermore, this analysis is largely redundant, since we have recorded from individual Purkinje cells and added new analyses demonstrating their entrainment to the 40 Hz stimulation (Supplemental Fig. 6). We have now moved this figure to the supplement and added labels to all axes (Supplemental Fig. 3).

      4) The use of the GLM is puzzling. In addressing the question of how cerebellum and motor cortex interact (from the Abstract, "how and why" do these regions interact) it is unclear why these regions are treated separately. I would have expected some kind of joint GLM where DCN activity is used to predict M1 variance (5 co-recordings are reported but nothing to analyze?); or where DCN + M1 activity is used to decode kinematics to see if it is better than one or the other alone. As it stands, we learn that there is more kinematic information in the motor cortex than in DCN. This is not necessarily surprising given previous literature on cerebellar contributions to reaching movements. In principle, the idea that 'PN stimulation might perturb reaching kinematics through descending projections to the spinal cord, or by altering activity in motor cortex' treats these as mutually exclusive outcomes, though they are highly unlikely to be so. Analyzing M1+DCN together could address whether DCN activity adds nothing to decoding kinematics that isn't there in M1 or adds something that M1 does not have access to. The main point here is that the physiological datasets could be better leveraged with these fits to derive insight into the interactions of the loop. R2 should be provided in the GLMs (Fig 8) to assess statistically how well they perform relative to one another, not just correlations between the two.

      We have added two additional analyses to address these questions. First, in addition to motor cortex-based and DCN-based decoders for all sessions (Fig. 8 and Supp. Fig. 12A-D, G-H; all the R2 values are reported in Supp. Fig. 12C-D, G-H) we now also train a decoder using both motor cortical and DCN multiunit activity in sessions with simultaneous recordings (Supp. Fig. 12E-F, I-J). When we train only on control trials, the decoder performs about equally well with or without the DCN multi-units for control trials (Supplemental Fig. 12E), but performs slightly worse on laser trials in comparison to using only cortical data (Supplemental Fig. 12F). When we train on both control and laser trials, adding DCN multi-units slightly degrades decoding performance on both control and laser trials in 3 out of 5 sessions (Supplemental Fig. 12I-J). Based on this comparison, it does not appear that DCN contributes kinematic information that is not already present in cortex. However, there are several cautionary notes to consider in interpreting these results. (1) This dataset consists of only 5 sessions, in all of which the recording yield in DCN was not as high as in cortex, so it is possible that dimensions of activity unique to DCN may not have been sampled enough in these experiments. (2) Our task involves only a single reaching target (in comparison to, e.g., center-out reaching tasks with eight targets which are possible in primates) so we cannot assess whether DCN contains direction-specific kinematic information not present in cortex. Thus, in light of these factors, it is difficult to draw strong conclusions from our experiments about differences in kinematic information between motor cortex and DCN. A more rigorous comparison requires carefully controlled experiments with many reaching targets, as in Fortier, Smith, & Kalaska (1993).

      Second, we have added an additional analysis to determine how predictive cortical activity is of DCN activity at the single-trial level, and vice versa. We considered several possible statistical approaches to this issue. Computing pairwise correlations of neurons in the cortex and DCN would be one possible method, but the outcome of this analysis would be difficult to interpret, as the sign and timing of firing rate peaks will vary across neurons. Another approach would be to regress principal component scores in one region - or their derivatives, as in Sauerbrei et al., 2020 - on the scores in another region. However, because cortex and DCN are bidirectionally connected, the choice of which region’s scores should be considered as the dependent variables is ambiguous, and this approach will merely “align” activity in one region (as a projection onto regression coefficients) with activity in the other. Ideally, we would like to find simultaneous linear transformations of both cortical and DCN activity that would maximally “align” them with one another, and to compute the correlations of the aligned neural trajectories. This is precisely what canonical correlation analysis (CCA) does, and CCA has been used increasingly in recent years to align population activity from different brain regions or samples - e.g., Lara et al., Nat. Comm. (2018), Perich et al., Neuron (2020), and Gallego et al., Nat. Neuro (2020). We took this approach with our simultaneous recordings of multiunit activity in the motor cortex and DCN, and found that:

      (a) In each of the 5 sessions, CCA found two pairs of canonical variates that were strongly correlated (Supplemental Fig. 11A, first two columns; Supplemental Fig. 11B, correlations in the range 0.58-0.88 for the first two canonical variates), and two pairs of canonical variates weakly correlated (Supplemental Fig. 11B, correlations <0.27 for the last two canonical variates)

      (b) The first two canonical variates accounted for half or more of the variance in each region (49%-64% in cortex, 51%-70% in DCN; Supplemental Fig. 11C, left column)

      (c) Between a quarter and a half of the variance in each region was accounted for by canonical variates in the other region (25%-50% of variance in DCN explained by cortex, 26%-47% in cortex explained by DCN; Supplemental Fig. 11C, right column)

      From these results we conclude that, within the constraints of our behavioral task, some but not all of the dominant dimensions of cortical and cerebellar activity are strongly correlated. We also performed additional CCA analyses using only laser trials or only control trials, to assess whether PN perturbation strongly affected the similarity in population activity between the two regions, but found limited differences between the results of the two analyses (Supplemental Fig. 11D).
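
      To make the CCA step concrete, the following sketch computes only the leading canonical correlation between two sets of signals. It is an illustration under stated assumptions, not the authors' pipeline: all names are hypothetical, each region is passed as a list of columns (one list of samples per unit or component), the columns are assumed to be linearly independent, and the largest singular value is found by simple power iteration rather than a full decomposition. The responses above report several canonical variates; this sketch returns just the first.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(cols):
    """Subtract the mean from each column."""
    return [[v - sum(c) / len(c) for v in c] for c in cols]


def orthonormalize(cols):
    """Gram-Schmidt: orthonormal basis for the span of the columns."""
    basis = []
    for c in cols:
        v = list(c)
        for q in basis:
            proj = dot(q, v)
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        basis.append([vi / norm for vi in v])
    return basis


def first_canonical_correlation(x_cols, y_cols, iters=200):
    """Largest canonical correlation between two column sets.

    The canonical correlations are the singular values of Qx^T Qy, where
    Qx and Qy are orthonormal bases of the centered column spaces.
    """
    qx = orthonormalize(center(x_cols))
    qy = orthonormalize(center(y_cols))
    m = [[dot(a, b) for b in qy] for a in qx]  # Qx^T Qy
    v = [1.0] * len(qy)
    for _ in range(iters):  # power iteration on M^T M
        u = [dot(row, v) for row in m]
        v = [sum(m[i][j] * u[i] for i in range(len(qx)))
             for j in range(len(qy))]
        norm = math.sqrt(dot(v, v))
        v = [vj / norm for vj in v]
    u = [dot(row, v) for row in m]
    return math.sqrt(dot(u, u))
```

      Because the two regions enter symmetrically, no choice of dependent variable is needed, which is the property the response cites as the reason for preferring CCA over regression.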

      Reviewer #2:

      Guo et al examine the cortico-cerebellar loop during skilled forelimb movements in mice. The authors use optogenetic stimulation of the pontine nuclei (PN) and recordings in PN, cerebellar cortex, cerebellar nuclei (DCN), and motor cortex to show that PN output is transformed into a variety of activity patterns at different stages of the cortico-cerebellar loop. Stimulation only slightly alters movement-related activity in these structures and degrades movement accuracy. The authors conclude that the cortico-cerebellar loop fine tunes dexterous movement. The study is technically impressive, employing recordings in 4 brain regions, and recordings during optogenetic manipulations and behavior. The experiments are well done and the analyses are appropriate. The comparison across brain regions is comprehensive. The results that PN perturbation alters skilled movement and the perturbed activity could predict perturbed movement are important. The study adds to a long line of work supporting the view that the cortico-cerebellar pathway is required for fine motor control. I have a few comments on the interpretation and analysis which I believe could be addressed with changes to the text and additional analysis.

      1) The authors conclude that the cortico-cerebellar loop "does not drive movement" but "fine tunes" the movement. While I generally agree with this interpretation, I wonder if the authors could flesh out the concepts of "driving movement execution" vs. "fine-tuning movement" more clearly. Do the authors consider them separate processes? How can they be disentangled? I also feel the data on its own has some limitations that should be considered or discussed. First, the data shows that PN stimulation degrades movement accuracy. However, this does not yet reveal the function of the cerebellar loop in fine motor control. Certain places in the text make stronger assertions (for example, "cortico-cerebellar loop fine-tunes movement parameters") that I feel the data does not support. It is not clear from the data how the loop tunes movement parameters. Second, Fig. 5F shows that stimulating PN blocked movement initiation in some sessions (this is also mentioned in the Methods). Could the authors consider the possibility that stimulating PN at a higher intensity might block movement? This is related to the distinction between "driving" vs. "fine-tuning" movement. At the very least, the authors should discuss these limitations and possibilities.

      In our view, the claim that a brain area drives reaching means that it is necessary for generating the large changes in muscle activity that set the limb in motion towards the target. The claim that a brain area fine-tunes reaching means that it is necessary for generating smaller changes in muscle activity that subtly adjust the limb trajectory and enable precise and accurate behavior. Previous work has demonstrated that motor cortex drives reaching: if it is transiently silenced, the initiation of reaching is robustly blocked (see Guo et al. 2015, Sauerbrei et al. 2020, and Galinanes et al. 2018). In the present manuscript, we show that perturbation of the PN has a very different effect: mice are usually able to initiate reaching, but they are less skillful (the success rate drops), slower (movement duration increases), and less precise (endpoint standard deviation increases). Our interpretation of these results is that while the total output of cortex drives movement (likely through corticospinal and cortico-reticulospinal routes), the cortico-cerebellar loop makes more subtle adjustments to the ongoing movement; that is, it fine-tunes. We have updated the text (in particular, the Abstract, Introduction par. 1, and Discussion par. 1-2) to clarify the distinction between driving and fine-tuning.

      We agree that several interpretive statements in the previous version (especially concluding sentences at the end of some Results paragraphs) were not clearly connected with the data, and we have removed or modified these statements. We now lay out our interpretation of the data as evidence for a cortico-cerebellar contribution to fine-tuning, rather than driving, in the first two paragraphs of the Discussion, but emphasize that this is an interpretation, rather than a direct description of the data. We have also changed the title to more directly state our experimental observations.

      We now mention the possibility that stronger stimulation or inactivation of PN neurons might have robustly blocked movement, and also mention several experimental variables which might have contributed to animal-to-animal variability in behavioral effects: “It is possible that the variability of behavioral effects ...” (Discussion).

      2) Related to point 1, in Fig. 5F, for stimulation trials in which mice failed to initiate movement, did mice fail to move altogether, or did they move in an abnormal fashion?

      We have added a new video documenting the behavior of the animal with the largest blocking effect from PN stimulation (supplemental video 2). This animal does not struggle through a partial reach, but fails to initiate movement. Small movements of the arm occurred (this also occurred in control trials), but these were not tightly synchronized with the onset of the laser across trials.

      3) In the abstract, the authors state that PN stimulation is "reduced to transient excitation in motor cortex". Also in the results (page 5) and discussion (page 8), "pontine stimulation only led to increases in cortical firing rates". These statements are based on the comparison between Fig 3D, 3F, and 4B. But I think the current presentation is somewhat misleading. First, Fig 3D, 3F, and 4B use different neuron selections that make direct comparison difficult. Fig 3 shows all neurons from Purkinje cell and DCN recordings. Fig 4B shows only PN-tagged motor cortex neurons. Furthermore, based on the methods description, it appears that PN-tagged neurons were defined using a one-sided sign-rank test. Since the test is one-tailed, does that mean neurons shown in Fig 4B are, by definition, neurons significantly excited by photostimulation? Looking at Fig 4B and 4C closely, there appear to be neurons suppressed by PN stimulation. Could the authors organize the rows in Fig 4 in the same way as Fig 3, where neurons that show suppression are grouped together?

      We now display the PN stimulation-aligned firing rates in the same format for Purkinje cells (Fig. 3B), DCN neurons (Fig. 3D), and motor cortical cells (Fig. 4A, lower), with all neurons in a single panel, sorted by response magnitude, for each area. The dominant response pattern in the cortical population is a transient firing rate increase, and this is more readily apparent with the new panel in Fig. 4A (lower). We also use a two-tailed test (which has slightly less statistical power, but allows us to test for both firing rate increases and decreases) for the identification of PN-tagged cortical neurons, and display neurons with stimulation-locked increases (n = 94) and decreases (n = 13) separately (Fig. 4B). In Fig. 4B-C, we still sort the neurons by their reach-related responses, as this reveals a difference in lift-aligned patterns between tagged and non-tagged neurons, which would be masked if we ordered according to stimulation-aligned responses. In Fig. 4D-E, we pool neurons with PN-stimulation-aligned increases and decreases into a “PN-tagged” group, as the small number of stimulation-aligned decreasing neurons (n = 13) does not allow adequate statistical power for a 3x3 contingency table test or for within-group averaging of lift-aligned firing rates.

      4) Fig 7 shows that PN stimulation has only subtle effects on movement-related activity in motor cortex. However, only a small portion (1/8) of the motor cortex neurons show modulation to PN stimulation. Fig 7 shows all neurons. Would the results look similar for PN-tagged neurons?

      We have added a new analysis to address this question, shown in Supplemental Fig. 10. The laser - control difference in lift-aligned activity is indeed larger for PN-tagged neurons; however, the largest peak in this difference occurs before lift, when the laser has been turned on, but the animal hasn’t started to move (Supplemental Fig. 10C).

      5) Page 3 "Our observation that the activity of some motor cortex-recipient PN neurons is aligned both to the cue and movement suggests that these neurons might integrate signals of multiple modalities." Presumably, motor cortex neurons also have cue and movement-related activity and PN simply inherits this activity from the motor cortex.

      As described in our response to the first reviewer’s seventh comment, we cannot conclude that the cue-related responses in the PN are inherited entirely from motor cortex. Briefly, (1) it has been difficult for us to reliably disassociate cue and movement responses for individual motor cortical cells (for instance, the GLM approach we took with PN neurons resulted in very poor model fits when applied to cortical cells), though our previous work has suggested that at the population level, the dominant signal in motor cortex is aligned to movement onset. To reliably disentangle cue and movement responses in cortex, we would need to train mice to wait for a relatively long and variable delay period before reaching. (2) The PN receive convergent input from many cortical areas, and there is likely a convergence of multiple inputs onto the motor-cortex-tagged PN units (c.f. the convergence of inputs from visual and somatosensory cortex onto individual PN neurons in rats reported in Potter, Ruegg, & Wiesendanger, 1978). Hence it is possible (if not likely) that the multi-modal activity we observe in PN neurons results from the integration of inputs from different cortical areas, rather than being entirely inherited from motor cortex.

      6) Do Purkinje cells follow the 40 Hz PN stimulation like in the multi-unit recordings? The PSTHs in Fig 3 are too smoothed out to see this.

      As described in the response to reviewer 1.3 above, we have added a new analysis to the manuscript to address this question (Supplemental Fig. 6). Most Purkinje cells and DCN neurons are entrained to the 40 Hz stimulation, and the entrainment is much stronger in the DCN, consistent with previous work (Person & Raman, 2012).
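      One common way to quantify how strongly a single unit is entrained to periodic stimulation is the vector strength of its spike phases. The sketch below is illustrative only: the response does not state which measure was used in Supplemental Fig. 6, and the function name and the toy spike trains are assumptions, not the authors' data or code.

```python
import math


def vector_strength(spike_times_s, freq_hz):
    """Vector strength of spikes relative to a periodic stimulus.

    Returns a value between 0 (spike phases cancel out, no locking)
    and 1 (every spike falls at the same stimulus phase).
    """
    if not spike_times_s:
        raise ValueError("need at least one spike")
    # Convert each spike time to a phase of the stimulus cycle.
    phases = [2.0 * math.pi * freq_hz * t for t in spike_times_s]
    mean_cos = sum(math.cos(p) for p in phases) / len(phases)
    mean_sin = sum(math.sin(p) for p in phases) / len(phases)
    # Length of the mean resultant vector.
    return math.hypot(mean_cos, mean_sin)


# Hypothetical spike trains (seconds) for a 40 Hz stimulus (25 ms period).
locked = [k * 0.025 for k in range(20)]      # one spike per cycle, same phase
spread = [k * 0.025 / 8 for k in range(8)]   # spikes evenly spread over one cycle

print(vector_strength(locked, 40.0))  # close to 1
print(vector_strength(spread, 40.0))  # close to 0
```

      Applied per unit, a measure of this kind would separate entrainment of isolated cerebellar neurons from entrainment that is visible only in multi-unit activity.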

      7) For the correlation analysis in Fig 6C top and 7C top, is the correlation computed from z-scored firing rates rather than on raw firing rates? This is not clear from the text. If computed on raw firing rates, one would expect the correlation to be above 0 even before photostimulation, since different neurons exhibit different baseline firing rates that presumably will be the same across control and stim trials.

      The correlations were indeed computed on z-scores, rather than raw firing rates, for this reason. We have clarified this in the Methods section. This analysis was designed to capture correlations in movement-related modulation between control and laser trials, and we z-scored the firing rates to avoid the confound that would have been introduced by baseline differences.
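      The confound can be illustrated with a toy example. The sketch below assumes four hypothetical neurons with different baseline rates, and z-scores each neuron against an assumed baseline mean and standard deviation; all numbers and function names are made up for illustration and are not taken from the manuscript.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def zscore(rates, baseline_mean, baseline_sd):
    """Z-score each neuron's rate against that neuron's own baseline."""
    return [(r - m) / s for r, m, s in zip(rates, baseline_mean, baseline_sd)]


# Hypothetical per-neuron baselines (spikes/s) and baseline variability.
baseline_mean = [5.0, 20.0, 50.0, 80.0]
baseline_sd = [2.0, 2.0, 2.0, 2.0]

# Rates at one time point; the modulations around baseline are uncorrelated
# between conditions (+1, -1, +1, -1 versus -1, +1, +1, -1).
control = [6.0, 19.0, 51.0, 79.0]
laser = [4.0, 21.0, 51.0, 79.0]

raw_r = pearson(control, laser)
z_r = pearson(zscore(control, baseline_mean, baseline_sd),
              zscore(laser, baseline_mean, baseline_sd))
print(raw_r, z_r)
```

      In this toy case the raw-rate correlation is close to 1 purely because the neurons differ in baseline rate, while the z-scored correlation reflects only the modulation and is zero.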

      Reviewer #3:

      It is generally thought that the cerebellum is primarily involved in the short-timescale control of movements, while motor cortex is involved in motor planning. The present paper follows classic studies in primates and a recent study in mouse that investigated the role of cortico-cerebellar loops in motor control. To date, studies in both species applied perturbations to the cerebellum to then study changes in cortical activity. For example, it has been long known that cooling deep cerebellar nucleus produces changes in the responses of motor cortex neurons in primate (e.g., Meyer-Lohmann et al., 1975). Further, Gao and colleagues' recent paper (Nature 2018) used optogenetics to perturb responses in the deep cerebellar nucleus before licking movements. The authors of this 2018 Nature paper conclude that persistent neural dynamics are maintained during voluntary movements by connectivity within this cortico-cerebellar loop.

      The experiments are well performed, and the results are logically organized and presented. However, a main concern is that the authors have not well justified that these experiments prove a conceptual advance. The conclusions appear to be largely consistent with those of prior work, both regarding changes in the responses of motor cortex neurons, and resultant (subtle) changes in behavior (i.e., altered arm kinematics). The impact of the paper would be improved if the authors adapted a more precise style of reporting the novelty of their results throughout.

      Major concerns:

      1) The experiments are well performed, and the results are logically organized and presented. However, a main concern is that the authors have not well justified that these experiments prove a conceptual advance. As noted above, prior studies have probed the role of cortico-cerebellar loops by applying perturbations to cerebellar activity (cerebellar cortex and/or deep cerebellar nuclei) and quantifying changes in cortical activity prior to and during movement. The main novelty of the present study is that the authors perturbed the loop at a different locus, namely in the pontine nuclei (PN) which send projections to the cerebellum rather than directly to the cerebellum. The rationale for why this specific perturbation provides a conceptual advance to the field was not adequately motivated.

      The authors do clearly review prior literature showing that perturbation of cortico-cerebellar projections impacts the rest of the loop and behavior, and they also explain well the application of their exciting new tool to specifically target PN neurons with their optogenetic stimulation. Yet, the authors do not motivate why it is important to specifically perturb the pontine nuclei (PN) to gain new insights into the role of "cortico-cerebellar loops" nor do they provide any reason to expect a difference in changes in loop dynamics for perturbations applied to the PN versus to the DCN. Indeed, the conclusions appear to be largely consistent with those of prior work, both regarding changes in the responses of motor cortex neurons, and resultant (subtle) changes in behavior (i.e., altered arm kinematics). Generally, these results are similar to those previously reported in primate DCN cooling experiments characterizing changes in hand movement in a voluntary tracking task (e.g., Brooks et al., 1973; Conrad and Brooks 1974).

      We agree that the rationale and conceptual advance require clarification. Previous work has established that silencing motor cortex blocks reaching (Guo et al. 2015, Sauerbrei et al. 2020, Galinanes et al. 2018), but the perturbations used in these studies were not selective to specific output channels (e.g., corticospinal, corticoreticulospinal, or corticocerebellar), and simultaneously influenced many projection targets of motor cortex. Other work from the Brooks, Prut, Person, and Svoboda groups has shown that altering cerebellar output impairs movement planning or execution, but their methodology did not test the effects of disrupting specific cerebellar inputs (e.g., from cortex). Thus, we would argue that previous studies have not provided direct evidence of the behavioral and neural effects of disrupting cortico-cerebellar signals. The central goal of the present manuscript is to test how selective impairment of cortico-cerebellar communication - not the simultaneous impairment of corticospinal, corticoreticulospinal, and cortico-cerebellar communication, and not a nonselective disruption of cerebellar output - disrupts behavior and neural dynamics across the cortico-cerebellar loop. Our conceptual advance, then, is to show that impairment of cortico-cerebellar communication does not typically block movement execution (as simultaneous perturbation of all motor cortical outputs does), but disrupts the fine kinematic details, similar to a direct manipulation downstream in the cerebellum. We have updated the text, particularly the Abstract, Introduction par. 1, and Discussion par. 1-2, to clarify this rationale and conclusion.

      2) The description of the connectivity of the loop illustrated in Figure 1 is straightforward. Motor cortex recipient PN neurons project to PN neurons, which then project directly to the cerebellar cortex and deep cerebellar nuclei, etc. Thus, the effect of any perturbation to PN neurons should be realized rapidly within neurons in the cerebellar cortex and deep cerebellar nuclei if they are part of this direct loop. However, onset latencies for the effect of the perturbations are not documented for these experiments (Figs 3&6 in the test/reaching conditions, and associated text). Similarly, latencies are not reported for the onset of changes in motor cortex neuron responses to PN perturbations in either condition (Figs 4&7 in the test/reaching conditions, and associated text). The only reference I could find to latencies specified the time required to reach the peak firing rate, not the latency of the change. Specifically: "these were stereotypical, mostly consisting of transient excitation (Fig. 4B, left; median time of firing rate peak 120 ms)" - 120 ms seems very long for the loop in Fig 1. It would be useful to know the latency between optogenetic stimulation in PN and changes in PN firing rate. And then the question is at what latency are the neurons in subsequent nodes altered? Quantification of latencies of the effects that are observed in the different nodes of the cortico-cerebellar loops would strengthen the authors' conclusion that they are actually studying the direct loop in Figure 1, which would then make the study's conclusions more compelling.

      We agree that it is important to characterize the latencies of neural responses to PN stimulation, and now provide these numbers for Purkinje cells, DCN neurons, and motor cortical neurons in the text and Supplemental Fig. 7. On stimulation of the PN, activity propagates first to Purkinje cells, then the DCN, and finally to motor cortex. We also quantify the latency of PN responses to motor cortical stimulation in Supplemental Fig. 1. (For a discussion of the rationale and limitations of our method, see also our response above to reviewer 1’s first comment.) Unfortunately, we have not been able to measure the delay from stimulation onset to the earliest spikes induced by ChR2 currents in PN neurons, as this would require simultaneous insertion of a stimulation fiber and recording probe to a deep target in the PN. Furthermore, we note that the earliest measurable response in Purkinje cells occurs 10 ms after stimulation onset, and this is likely an overestimate of the minimum latency, as it takes 6.25 ms for the laser to reach peak power under sinusoidal stimulation.
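      The 6.25 ms figure is consistent with one quarter of a 40 Hz cycle. The short calculation below makes that arithmetic explicit; it assumes the sinusoidal drive starts at a zero crossing and rises to its first peak a quarter period later, which is an assumption about the waveform rather than something stated in the response. The function name is hypothetical.

```python
def time_to_first_peak_ms(freq_hz):
    """Time from a zero crossing of a sinusoid to its first peak, in ms.

    Assumes the drive signal is sin(2*pi*f*t) starting at t = 0,
    so the first peak occurs after one quarter of the period.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    period_ms = 1000.0 / freq_hz   # 25 ms for a 40 Hz stimulus
    return period_ms / 4.0


print(time_to_first_peak_ms(40.0))  # 6.25
```

      Under that assumption, the laser does not reach full power until 6.25 ms after onset, so a 10 ms onset latency measured from stimulation onset would overestimate the minimum latency.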

      3) Overall, there was often a sharp incongruity between the complexity of many of the findings described in results and accompanying figures and the short summary conclusion provided for the Results. Here is one of many examples (bottom of page 5), where the authors conclude "These results demonstrate that the cortico-cerebellar loop does not drive reaching, but fine-tunes the behavior to enable precise and accurate movement." Yet, what the results above describe is considerable heterogeneity and variability across animals and cases. These conclusions should be more aligned with/justified by the authors' description of their actual results.

      Throughout the Results section, we have now tied the interpretations more closely to the data. For example, in the instance the reviewer mentions, we now state: “These results demonstrate that PN stimulation impairs reaching performance, typically by disrupting precision, accuracy, duration or success rate of the movement.” In the first two paragraphs of the Discussion, we lay out our interpretation of the data as evidence that the cortico-cerebellar loop contributes to fine-tuning the movement, rather than driving it, but emphasize that this is an interpretation rather than a description of experimental results. Furthermore, we now address possible factors that could underlie the diversity of behavioral effects in the fourth paragraph of the Discussion (“It is possible that the variability of behavioral effects ...”).

      4) A related issue is the disconnection between description and summary in the description of Figures 6-8. The emphasis is on correlation, yet the authors' main point here seems to be that changes in the activity in cortex and DCN induced by the PN stimulation during movement explain the changes in hand trajectory. For example, Figure 6D and its implications are not effectively described in the text.

      The main conclusion of figures 6 and 7 is that PN stimulation during movement alters movement-aligned cortical and DCN activity, but this modulation is typically subtle; that is, activity on control and laser trials is highly correlated for most neurons and time points. This is in contrast with more dramatic effects observed for perturbations delivered to other nodes in the loop; for instance, thalamic perturbations can robustly prevent the generation of the cortical pattern that drives movement (Sauerbrei et al. 2020). Supplemental Fig. 8D-E and Supplemental Fig. 9D-E suggest that these subtle stimulation-induced changes during movement are largely consistent with the changes that would be expected based on neural responses to laser alone, outside engagement with the task. Finally, the decoding analysis in Fig. 8 allows us to interpret these subtle neural changes: they do not appear to be random, but are consistent with the effects of stimulation on the hand. That is, the difference in hand velocity between laser and control trials decoded from neural activity is correlated with the observed hand velocity difference. We have added a video (supplemental video 3) to better visualize this result in all three spatial dimensions simultaneously, and have edited the text in the Results section to clarify these findings.

      5) Finally, the authors conclude that changes in the activity in cortex and DCN induced by the PN stimulation during movement explain the subtle deviations in hand trajectory and conclude that the cortico-cerebellar loop is responsible for fine-tuning movement parameters (bottom of page 5 and top of page 8). However, i) the statement that this pathway fine-tunes motion is not justified by the analysis, and ii) the novelty is not made clear relative to prior work that has investigated the cortico-cerebellar loop (beyond the experimental difference in perturbation site).

      Regarding (i), we agree that the fine-tuning is an interpretation rather than a direct reflection of the data presented in the paragraph, and have altered the statement accordingly: “Overall, these results show that the subtle changes in the activity in cortex and DCN induced by the PN stimulation during movement are consistent with the changes in hand trajectory for individual mice.” We now explain our interpretation of the data as supporting a fine-tuning role in the Discussion, rather than the Results. Regarding (ii), we have now clarified in the Abstract, Introduction, and Discussion that perturbation of the PN enables us to test the effects of a selective disruption of cortico-cerebellar communication, in contrast with direct manipulations of motor cortex or cerebellum (see also our response to comment 3.1 above).

      Overall, the text that follows in the discussion presented the findings in a far more clear and compelling way than much of the text in the Abstract, Introduction and Results: "perturbing cortico-cerebellar communication did not block movement execution: animals were typically able to generate the basic motor pattern during optogenetic stimulation of the PN, and neural activity in cortex and cerebellum largely recapitulated the firing patterns observed during normal movement. Instead, PN perturbation altered arm kinematics, decreasing the precision and accuracy of the reach, and perturbation-induced shifts in neural activity explained these behavioral effects." The paper would be improved if the authors adapted this more precise style of reporting throughout.

      We have edited the main text throughout to improve clarity and precision.

    2. Reviewer #1 (Public Review):

      Guo et al. describe interesting experiments recording from various sites along a cortico-cerebellar loop involved in limb control. Using neuropixels recordings in motor cortex, pontine nuclei, cerebellar cortex and nuclei, the authors amass a large physiological dataset during a cued reach-to-grasp task in mice. In addition to these data, the authors 'ping' the system with optogenetic activation of pontocerebellar neurons, asking how activity introduced at this node of the loop propagates through the cerebellum to cortex and influences reaching. From these experiments they conclude the following: the cerebellum transforms activity originating in the pontine nuclei, this activity is not sufficient to initiate reaches, and the data support the long-standing view that the cerebellum 'fine tunes' movement, since reaches are dysmetric in response to pontine stimulation. Overall these data are novel, of high quality, and will be of interest to a variety of neuroscientists. As detailed below however, I think these data could provide much more insight than they currently do. Thus below I provide some suggestions on improving the manuscript.

      1) Since the loop is the focus of this study, it would be nice if the authors better characterized latencies of responsivity to pontine stimulation through the loop, to address how cortically derived information routed to the cerebellum may loop back to influence cortical function. In the data provided, we know that pontine stimulation modulates Purkinje and deep nuclear firing (but latencies to response are not transparently provided in the main text, if anywhere), while motor cortical responses peak at 120 ms (after stimulus onset? This is unclear), and that this responsivity is preferentially observed in neurons engaged early in the reaching movement. Is the idea, then, that cortical activity early in the reach is further modulated by cerebellar processing to (re)influence that same cortical population? Does this interpretation align with the duration of reaches, the duration of early responsive activity during reach, and the latency of responsivity; or is the idea that independent information from other modalities entering the pontine nuclei modulates early cells? Latency to respond at the different nodes might aid in thinking through what these data mean for the function of the loop.

      2) Many of the figures need work to aid interpretation. Axis labels are often missing (eg 2F); color keys are often unlabeled (2F); color gradients often used but significance thresholds are hard to evaluate (using same colors for z scores and control / laser is confusing 6, 8); and within-figure keys would be useful (5D-h). These issues occur throughout the manuscript.

      3) Relatedly, but also conceptually, Figure 3B has particular issues, such as identifying where the neuropixel multiunit activity is coming from. I assume that in the gray boxes illustrating the spatio-temporal profile of spiking band activity that the lower part of the box is the ventral direction, upper, dorsal. This is not spelled out. From the two examples it would seem that the spiking band is in different places in the cerebellum, undermining, I think, the objective of the figure. It would be sensible to revisit this entire figure to identify the key takeaways and design figures around those ideas. As it stands, these examples appear anecdotal. Consider moving this to a supplement. Powerband density strength is missing an axis. More importantly, it would be nice to corroborate the interpretation of the MUA with the single unit recordings, since the idea is that many neurons are entraining to the PN activity. Yet, the examples don't seem particularly entrained. Is the activity being picked up on just axonal firing of the PN axons? Fourier analysis of spiking of isolated neurons in cerebellum should be used to corroborate the idea that cerebellar neurons are entraining, rather than the neuropixel picking up entrained PN axons.

      4) The use of the GLM is puzzling. In addressing the question of how cerebellum and motor cortex interact (from the Abstract, "how and why" do these regions interact) it is unclear why these regions are treated separately. I would have expected some kind of joint GLM where DCN activity is used to predict M1 variance (5 co-recordings are reported but nothing to analyze?); or where DCN + M1 activity is used to decode kinematics to see if it is better than one or the other alone. As it stands, we learn that there is more kinematic information in the motor cortex than in DCN. This is not necessarily surprising given previous literature on cerebellar contributions to reaching movements. In principle, the idea that 'PN stimulation might perturb reaching kinematics through descending projections to the spinal cord, or by altering activity in motor cortex' treats these as mutually exclusive outcomes, though they are highly unlikely to be so. Analyzing M1+DCN together could address whether DCN activity adds nothing to decoding kinematics that isn't there in M1 or adds something that M1 does not have access to. The main point here is that the physiological datasets could be better leveraged with these fits to derive insight into the interactions of the loop. R2 should be provided in the GLMs (Fig 8) to assess statistically how well they perform relative to one another, not just correlations between the two.
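      The distinction between correlation and R2 raised at the end of this comment can be shown with a toy example. The sketch below uses made-up observed and decoded values (not data from the manuscript) and hypothetical helper names; it shows that a prediction can correlate perfectly with the observation while explaining none of its variance.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 undefined for constant observations")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values and a decoder output that is twice too large.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]

print(pearson(observed, predicted))    # perfectly correlated
print(r_squared(observed, predicted))  # negative: worse than predicting the mean
```

      Correlation is insensitive to the scale and offset of the prediction, whereas R2 penalizes both, which is why reporting both gives a fuller picture of model fit.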

    1. Nonetheless, as we noted, (1) entails (3) but not (3'). That (3') is false says nothing about (1). But if (3) is false, (1) is also false. Put differently (3) is entailed by orthodox theism, while (3') is certainly not. Thus while use of (3) in showing that (1) and (2) are logically compatible is perfectly legitimate, the theist is committed to (3) in a stronger sense than that in which (3) is one of various propositions he may adopt for legitimate logical manoeuvres, and I think this is worth emphasizing.

      (3) is consistent and entailed by (1), (2).


    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Review of "Co-chaperone involvement in knob biogenesis implicates host-derived chaperones in malaria virulence." by Diehl et al for Review Commons.


      **Major Comments.**

      1. In this paper the function of the Plasmodium falciparum exported protein PFA66 is investigated by replacing its functionally important dnaJ region with GFP. These modified parasites grew fine but produced elongated knob-like structures, called mentulae, at the surface of the parasite-infected RBCs. Knobs are elevated platforms formed by exported parasite proteins at the surface of the infected RBC that are used to display PfEMP1 cytoadherence proteins which help the parasites avoid host immunity. The mentulae still display some PfEMP1 and contain exported proteins such as KAHRP but can no longer facilitate cytoadherence. Complementation of the truncated PFA66 with full length protein restored normal knob morphology; however, complementation with a non-functional HPD to QPD mutant did not restore normal morphology, implying interaction of the PFA66 with a HSP70 possibly of host origin is important for function. While a circumstantial case is made for PFA66 interacting with human HSP70 rather than parasite HSP70-x, is there any direct evidence for this, e.g., protein binding evidence? I feel that without some additional evidence for a direct interaction between PFA66 and human HSP70, the paper's title is a little misleading.

        We thank the reviewer for their kind words. They are correct that we do not show direct evidence of such an interaction, but would like to note that we, and others, despite concerted efforts to produce direct evidence, have always been hindered by the nature of the experimental system. As noted also in our reply to Reviewer 3, the inability to genetically modify the host cell leads us to suggest that indirect evidence is the best that can conceivably be provided at this time. Our evidence, although indirect, is the first experimental evidence for the importance of such an interaction, all other suggestions having been based on “guilt by association” i.e. protein localisation or co-IP analyses.

      Was CSA binding restored upon complementation of ∆PFA with the full-size copy of PFA66?

      As this project grew organically and was driven by the results already obtained, we decided to use knob morphology via SEM as a “proof-of-principle” to show that we could reverse the phenotype. Thus, while we cannot comment on whether ALL functions of PFA66 are complemented, we suspect that if the knobs revert to their WT morphology, this is likely to be true for the other tested phenotypes. We do not feel that revisiting all of our assays (which would basically entail repeating almost every experiment so far carried out) would really be much more informative. We have added a note in the discussion stating “We wish to note that we cannot unequivocally state that our complementation construct allows reversion of all the aberrant phenotypes herein investigated, however we feel it likely that all abnormal phenotypes are linked and thus our “proof-of-principle” investigation of knob/eKnob phenotypes is likely to be reflected in other facets of host cell modification and can thus be seen as a proxy for such.”.

      **Minor Comments**

      Line 36, NPP should be NPPs if referring to the plural.


      Changed


      Line 37, MC should be MCs if referring to the plural. By the way this acronym is never used in the text, it's always written 'Maurer's clefts'.

      Changed

      Abstract, Line 52-53, could be changed to "uncover a new KAHRP-independent..." as it currently implies (albeit weakly) that this is the first observation of a KAHRP-independent mechanism for correct knob biogenesis. Maier et al 2008 have previously shown that knock out of PF3D7_1039100 (J-domain exported protein) greatly reduced knob size and knock out of PHISTb protein PF3D7_0424600 resulted in knobless parasites.

      Correct. In line with the suggestions of another reviewer, this section has been changed.

      In the Abstract it is mentioned that "Our observations open up exciting new avenues for the development of new anti-malarials." This is never really expanded upon in the rest of the paper and so seems like a bit of a throwaway line and could be left out.

      Good point, changed

      Line 59, WHO world malaria report should be cited here since these numbers are from the report not a paper from 2002.

      Done

      Line 67, Marti et al 2004 should be cited here as its published at the same time as Hiller et al 2004.

      Our mistake. Done

      Line 76, I suggest using either 'erythrocyte' or 'red blood cell' throughout the text not both.

      We now use erythrocyte throughout

      Line 80, Maier et al 2008 should be referenced here.

      Done

      Line 87, the authors should cite Birnbaum et al 2017 for the technique used. This is cited immediately after (line 98) in the results section but could be addressed at both points in the text.

      Done

      Line 123, IFAs and live cell imaging failed to detect the PFA-GFP protein and the author proposes this is due to low expression levels. However, PFA66 is expressed at ~350 FPKM in the ring stage and previous studies from your own group have visualised it using GFP before. Is there another explanation for this such as disruption of the locus here has served to greatly reduce the expression level of the fusion protein?

      The truncated protein is now distributed throughout the whole erythrocyte cytosol, not concentrated into J-dots, likely making detection difficult. We wish to note that our original GFP tagged PFA66 lines (Külzer et al, 2010) did not really show a strong signal in comparison to other lines we are used to analysing. We further believe that the sub-cellular fractionation (Figure S1) demonstrates the erythrocyte cytosolic localization of the truncated PFA66. We have no evidence that truncation causes lower expression, but any future revision will include a comparison of expression levels of endogenously GFP tagged dPFA and PFA66.

      Line 147, for consistency it would be best to introduce infected red blood cell (iRBC) at the beginning of the main text and use throughout the text instead of switching between 'infected human erythrocyte' and iRBC.

      We agree, and have changed accordingly

      Line 153, Fig S2A does not exist.

      We apologise, this has been changed

      Lines 156-158: Different knob morphologies are described with repeated reference to Fig2 and FigS2. Since multiple whole-cell SEM images are displayed in these figures it would be worth adding lettering and/or zoomed-in regions of interest highlighting examples of each aberrant knob type.


      This has now been added to Figure S2.

      Line 178-179, "Although not highly abundant in either sample, the morphology of Maurer's clefts appeared comparable in both samples (data not shown)." Why is the data not shown? Representative images of Maurer's clefts from each line should be included in the supplementary figures or this in-text statement should be more clearly justified.

      Figure S3 has been adjusted to also show Maurer's clefts in more detail. An Excel table of data can be provided if necessary.

      Line 196, indirect immunofluorescence assay (IFA).


      Changed

      Line 201, how was the 'non-significant difference' measured? PHISTc looks quite different by eye. Rephrase the term "significant difference" as localisation of these exported proteins was compared visually rather than quantified. Otherwise, a measure of mean fluorescence intensity could be taken for each protein as a basic comparison between the two lines. In the Figure legend of S4, the term "no drastic difference", is used suggesting this was not quantified. By the way, PHISTc appears different by the represented figure.

      We apologise for our use of a specific term for non-statistically verified observations. The PHISTc image the reviewer comments on was presented incorrectly (too much brightness was introduced during processing) and has now been corrected. We mean to say that we could not (in a blinded check) tell the difference between WT and KO IFA images. Only KAHRP (in our opinion) demonstrated a different fluorescence pattern. As KAHRP has previously been implicated in knob formation, we then analysed this phenotype in more detail. A detailed analysis of the fluorescence pattern in the other IFAs does, in our eyes, not add to the story or add any real value to our observations.

      Line 213, you now have 3 versions for the word wild type, 'wild type', 'wild-type' and 'WT', best to choose one for consistency.

      Changed

      Line 232, 'tubelike' to 'tube-like'.

      Changed

      Line 279, just use 'IFA', the acronym has already been explained earlier in the text.

      Changed

      Line 319, 'permeation' should be 'permeability'.

      Changed

      Line 353, 'The action of host actin is known' to 'Host actin is known'.

      Changed

      Line 373, 'through their role as regulators'.

      Changed

      Line 402, either use 'HSP70-x' or 'HSP70-X' throughout the text.

      Changed

      Line 540, the speed used to pellet the samples for sorbitol lysis assay, 1600g is quite high and could reflect RBC fragility rather than direct sorbitol induced lysis. The parasitemia is also very low, and previous published methods have used ~90% parasitemia rather than the 2% used here. We are not saying the method is wrong but please check it is accurate.

      We used the method of our former colleague Stefan Baumeister (University of Marburg), who is an expert in analysis of NPP, thus we are sure the method is correct. We are in fact tempted to remove the NPP data as they distract from the main narrative of the manuscript, this being the reason we include them only as supplementary data.

      Line 479, 10µm should be 10 µM.

      Changed

      In Fig 1A, the primers A, B, C etc are not explained anywhere that I can see.

      This information has now been included in the 1A Figure legend and table 2A.

      Figure 1B, I do not see any clear band for the 3' integration indicated with the *. Can a better image be shown?

      We apologise. Integration PCRs are notoriously challenging. Any revised manuscript will include better quality images.

      It seems from Fig 3G,H,I that the KAHRP puncta are bigger in ∆PFA but are as abundant as CS2. Given that KAHRP is associated with knobs how do you reconcile this with there being fewer knobs per unit area in ∆PFA compared to CS2 as in Fig 2B? The numbers of knobs/KAHRP spots/Objects per µm² seem to vary between Fig 2 and 3. Please provide some commentary about this.

      We are not sure if all KAHRP spots actually label eKnobs, and it is possible that there are KAHRP “foci” that are not associated with eKnobs. We also wish to note that the data in figure 2 and 3 were produced using very different techniques. Sample preparation may lead to membrane shrinkage or stretching, and the different microscopy techniques have very different levels of resolution. For this reason we do not believe that the data from these very different independent experiments can be compared, however a comparison within a data set is possible and good practice.

      In the bottom panels of Fig 4, KAHRP::mCherry appears to extend beyond the glycocalyx beyond the cell. Is this an artifact?

      We checked assembly of the figure and are sure that this was not introduced during production of the figure. Our only explanation is that WGA does not directly stain the erythrocyte membrane, but the glycocalyx. A closer examination of the WGA signal reveals that it is weaker at this point (and also in the eKnobs i, ii) so potentially the KAHRP signal is beneath the erythrocyte plasma membrane, but the membrane cannot be visualised at this point.

      Line 837, does this refer to 10 technical replicates or was the experiment repeated on 10 independent occasions? This should at least be done in 2 biological replicates given the range in technical replicates on the graph. Was CS2 considered as '100% lysis' or the water control described in the method? Please provide more detail.


      This figure is the result of 10 biological and 4 technical replicates. A number of data points were removed as outliers (Grubbs' test). The highest value within a biological replicate was set to 100% to allow comparison of results. This has now been corrected in the text.
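      The normalisation described above can be sketched as follows. This is a minimal illustration, not the analysis script used for the figure: the readings, replicate names and function names are hypothetical, and the Grubbs statistic is only computed here, not compared against a critical value.

```python
from statistics import mean, stdev


def grubbs_statistic(values):
    """Return (G, suspect): the Grubbs statistic and the most extreme value.

    G = max|x - mean| / sample standard deviation. Deciding whether the
    suspect is an outlier requires comparing G with a critical value,
    which is deliberately left out of this sketch.
    """
    m = mean(values)
    s = stdev(values)
    suspect = max(values, key=lambda v: abs(v - m))
    return abs(suspect - m) / s, suspect


def normalise_to_max(replicate):
    """Scale one biological replicate so its highest value equals 100%."""
    top = max(replicate)
    return [100.0 * (v / top) for v in replicate]


# Hypothetical lysis readings (arbitrary units), one list per biological replicate.
replicates = {
    "bio_rep_1": [0.42, 0.40, 0.21, 0.19],
    "bio_rep_2": [0.80, 0.76, 0.44, 0.38],
}

normalised = {name: normalise_to_max(vals) for name, vals in replicates.items()}
```

      Scaling within each biological replicate makes replicates with different absolute signal comparable, at the cost of discarding information about absolute lysis levels.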

      Reviewer #1 (Significance (Required)):

      This is a reasonably significant publication as it describes knob defects that to my knowledge have never been observed before. Importantly, the deletion of the J domain from PFA66 is genetically complemented to restore function really confirming a role for this protein in knob development. Amino acids critical for the function of the J-domain are also resolved. Apart from some minor technical and wording issues the paper is really nice work apart from one area which is the proposed partnership of PFA66 with human HSP70 for which there is not much direct evidence. If this evidence can be provided, we think this work could be published in a high impact journal. Without the evidence, it could find a home in a mid-level journal with some tempering of the claims of PFA66's interaction with human HSP70.

      **Referee Cross-commenting**


      There seems to be a high degree of similarity in the reviewers' comments and I think as many issues as possible should be addressed. I definitely agree that the term mentula should not be used.


      We have now adopted the suggestion of Reviewer 3, and use the term eKnobs.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Plasmodium falciparum exports several proteins that contain J-domains and are hypothesized to act as co-chaperones to support partner HSP70s chaperones in the host erythrocyte, but the function of these co-chaperones is largely unknown. Here the authors provide a functional analysis of one of these exported HSP40 proteins known as PFA66 by using the selection-linked integration approach to generate a truncation mutant lacking the C-terminal substrate binding domain. While there is no fitness cost during in vitro culture, light and electron microscopy analysis of this mutant reveals defects in knob formation that produces a novel, extended knob morphology and ablates Var2CSA-mediated cytoadherence. These knob formation defects are distinct from previous mutants and this unique phenotype is exploited by the authors to show that the HSP70-stimulating "HPD" motif of PFA66 impacts rescue of the altered knob phenotype. In other HSP40 co-chaperones, this motif is critical to stimulate partner HSP70 activity, suggesting that PFA66 acts as a bona fide co-chaperone. Importantly, previous work by the Przyborski lab and others has shown that deletion PfHSP70x, the only HSP70 exported by the parasite, does not phenocopy the PFA66 mutant, implying that the partner HSP70 is of host origin. The results are exciting but I have some concerns about controls needed to properly interpret the functional complementation experiments. My specific comments are below.


      We agree that some control experiments are missing, and these will be included in any future revision.

      **Major comments**

      • The failure of the HPD mutant PFA66 to rescue the knob-defect is very interesting. However, the authors need to determine that the HPA mutant is expressed at the same level as the WT (by quantification against the loading controls in the western blots in Fig 1D and Fig S6H) and is properly exported (by IFA and/or WB on fractionated iRBCs, as done for the GFP-fused truncation in Fig S1A). Otherwise, the failure to rescue is hard to interpret. If these controls were in place, the conclusion that a host HSP70 is likely being hijacked by PFA66 is appropriate. This genetic data would be greatly strengthened by in vitro experiments with recombinant protein showing activation of a host HSP70 by PFA66, but I realize this may be out of the scope of the present study. Along these lines, it might be worth discussing the finding by Daniyan et al 2016 that recombinant PFA66 was found to bind human HSPA1A with similar affinity to PfHSP70x but did not substantially stimulate its ATPase activity, suggesting this is not the relevant host HSP70. This study is cited but the details are not discussed.

      As in our answer to Reviewer 1, we will examine the expression and localisation of both WT and mutant PFA66.

      We are currently expressing and purifying a number of HSP40/70 combinations for exactly the kind of analysis suggested and hope to include such data in future revisions, but as the reviewer fairly notes, this is really beyond the scope of the current study.

      Regarding Daniyan et al (and other) papers: The fact that PFA66 can stimulate PfHSP70x does not preclude that it also interacts with human HSP/HSC70, and indeed there is some stimulation of human HSP70. Daniyan and colleagues did steady-state assays in the absence of nucleotide exchange factors. Therefore, the stimulation of human HSP/HSC70 is not very prominent. One should either do single-turnover experiments or add a nucleotide exchange factor to make sure that nucleotide exchange does not become rate-limiting for ATP hydrolysis. This is completely independent of the results for PfHSP70-X, as the intrinsic nucleotide exchange rates of the studied HSP70s could be very different. Also, it is important to understand that J-domain proteins generally do not stimulate ATPase activity much by themselves but in synergism with substrates, allowing the possibility that such an in vitro assay may not reflect the situation in cellula. Additionally, the resonance units in the SPR analysis for PFA66-HsHSP70 are lower than those for PFA66-PfHSP70-X. This could mean that PFA66 is a good substrate for PfHSP70-X but not for HsHSP70, but this does not mean that PFA66 does not cooperate with HsHSP70.
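      The rate-limitation argument can be illustrated with a deliberately simplified sketch. It assumes a two-step irreversible cycle (ATP hydrolysis followed by nucleotide exchange), and all rate constants are hypothetical values chosen for illustration; none are taken from Daniyan et al or from our own measurements.

```python
def steady_state_rate(k_hyd, k_ex):
    """Steady-state turnover (per second) of a two-step irreversible cycle.

    The mean cycle time is the sum of the mean times of the two steps,
    so the turnover rate is 1 / (1/k_hyd + 1/k_ex).
    """
    return 1.0 / (1.0 / k_hyd + 1.0 / k_ex)


# Hypothetical rate constants (per second), for illustration only.
K_HYD_BASAL = 0.001       # hydrolysis without a J-domain protein
K_HYD_STIMULATED = 0.1    # hydrolysis stimulated 100-fold by a J-domain protein

# Slow intrinsic exchange, i.e. no nucleotide exchange factor present.
k_ex_slow = 0.01
fold_slow = (steady_state_rate(K_HYD_STIMULATED, k_ex_slow)
             / steady_state_rate(K_HYD_BASAL, k_ex_slow))

# Fast exchange, e.g. with a nucleotide exchange factor added.
k_ex_fast = 10.0
fold_fast = (steady_state_rate(K_HYD_STIMULATED, k_ex_fast)
             / steady_state_rate(K_HYD_BASAL, k_ex_fast))

print(f"apparent stimulation, slow exchange: {fold_slow:.1f}-fold")
print(f"apparent stimulation, fast exchange: {fold_fast:.1f}-fold")
```

      In this toy model the same 100-fold stimulation of hydrolysis appears as only about a 10-fold change in steady-state turnover when exchange is slow, which is why a weak steady-state readout does not by itself rule out a functional interaction.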

      - The authors claim that truncation of PFA66 alters the localization of KAHRP but not the other exported proteins they evaluated by IFA (Fig S4). This seems baseless as they don't apply the same ImageJ evaluation to these other proteins. Similarly, the statement that KAHRP structures "appear by eye to have a lower circularity, although we were not able to substantiate this with image analysis" is subjective/qualitative and should probably be removed.

      We mean to say that we could not (in a blinded check), tell the difference between WT and KO IFA images. Only KAHRP (in our opinion) demonstrated a different fluorescence pattern. As KAHRP has previously been implicated in knob formation, we then analysed this phenotype in more detail. A detailed analysis of the fluorescence pattern in the other IFAs does, in our eyes, not add to the story or add any real value to our observations.

      The statement on the circularity has been removed according to the reviewer's wishes.

      -The section title "Chelation of membrane cholesterol...causes reversion of the mutant phenotype in ∆PFA" seems an overstatement given the MBCD effect on the knob morphology is fairly weak and remains significantly abnormal.

      The title of this section was misleading, we agree. We have retitled it “Chelation of membrane cholesterol but not actin depolymerisation or glycocalyx degradation causes partial reversion of the mutant phenotype in ∆PFA” to clarify that the reversion was only partial (as explained by the following text in the manuscript).

      **Minor comments**

      - The DNA agarose gel image in Fig 1B is not very convincing. Most of the bands are faint and there is a lot of background/smear signal in the lanes. Also, it would help for clarity if the primer pairs used for each reaction were stated as shown in the diagram (rather than simply "WT", "5' Int" and "3' Int").

      We apologise. Integration PCRs are notoriously challenging. Any revised manuscript will feature clearer images.

      - Given the vulgar connotation of "mentula", the authors might consider an alternative term.

      We have now adopted the term “eKnobs” suggested by Reviewer 3.

      - lines 67-69: The authors may wish to cite a more recent review that takes into account updated Plasmepsin 5 substrate prediction from Boddey et al 2013 (PMID: 23387285). For example, Boddey and Cowman 2013 (PMID: 23808341) or de Koning-Ward et al 2016 (PMID: 27374802).

      A fair point; we have now added de Koning-Ward et al 2016.

      - lines 77-79: "deleted" is repetitive in this sentence.

      Changed

      - line 115: It might be clearer to state "endogenous PFA66 promoter"

      Changed

      - lines 131-132: "...these data suggests that deletion of the SBD of PFA66 leads to a non-functional protein." Behl et al 2019 (PMID: 30804381) showed the recombinant C-terminal region of PFA66 (residues 219-386, including the SBD truncated in the present study) binds cholesterol. The authors may wish to mention this along with their reference to Kulzer et al 2010 showing PFA66 segregates with the membrane fraction, suggesting cholesterol is involved in J-dot targeting.

      We should have noted this connection and thank the reviewer for bringing it to our attention. This section has been revised to include this important information.

      - line 198: It's not clear what is meant by "+ve" here and afterward. Please define.

      We have now changed this to “structures labelled by anti-KAHRP antibodies”, or merely “KAHRP”.

      - lines 749-750: "Production of PFA and NEO as separate proteins is ensured with a SKIP peptide". Translation of the 2A peptide does not always cause a skip (see PMID: 24160265) and often yields only about 50% skipped product (for example, PMID: 31164473). Because of the close cropping in the western blots in Fig 1C or S1A this is difficult to assess. Is a larger unskipped product also visible? Beyond this one point, it is generally preferable that the blots not be cropped so close.

      A very valid point, and in other parasite lines we have indeed detected non-skipped protein. In our case, we visualise a band at the predicted molecular mass for the skipped dPFAGFP and the commonly observed circa 26 kDa GFP degradation product. The full-length blots have now been included as supplementary data (Figure S7).

      - lines 867-868: Explain more clearly what "Cy3-caused fluorescence" is measuring.

      The Cy3 channel refers to anti-var2CSA staining, and we have now included this information.

      - Several figure legends would benefit from a title sentence describing what the figure is about (ie, Fig legends 1, 3, 5, S1, S5 & S6)

      This has been added.

      Reviewer #2 (Significance (Required)):

      This manuscript by Diehl et al reports on the function of the exported P. falciparum J-domain protein PFA66 in remodeling the infected RBC. Obligate intracellular malaria parasites export effector proteins to subvert the host erythrocyte for their survival. This process results in major renovations to the erythrocyte, including alteration of the host cell cytoskeleton and formation of raised protuberances on the host membrane known as knobs. Knobs serve as platforms for presentation of the variant surface antigen PfEMP1, enabling cytoadherence of the infected RBC to the host vascular endothelium. This process is of great interest as it is critical for parasite survival and severe disease during in vivo infection. The basis for trafficking of exported effectors within the erythrocyte after they are translocated across the vacuolar membrane is not well understood but is known to involve chaperones. This is a particularly interesting study in that it provides evidence in support of the hypothesis, initially proposed nearly 20 years ago, that the parasite hijacks host chaperones to remodel the erythrocyte. This is biologically intriguing and also suggests new therapeutic strategies targeting host factors that would not be subjected to escape mutations in the parasite genome. The work will be of interest to those studying exported protein trafficking and/or virulence in Plasmodium (such as this reviewer) as well as the broader chaperone and host-pathogen interaction fields.

      **Referee Cross-commenting**

      I also agree with similarity in comments. Some additional discussion on the failure to localize the PFA66 truncation by live FL is warranted, as noted by reviewer #1. Seems likely that either the level of PFA66 protein is reduced by the truncation or the truncated PFA66 is dispersed from J-dots and harder to visualize when diffuse instead of punctate. In either case, the complementing copy (WT or QPD) should be visualized by IFA.


      As noted above, we believe our inability to visualize the truncated protein is likely due to its dispersal throughout the whole erythrocyte cytosol as opposed to lower expression levels, but we will be checking this, and also the localisation of WT and mutant PFA66 complementation chimera and expect to have this result for the next revision.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The data are for the most part well controlled and reveal a potential function for PFA66 in knob formation. The assays are state of the art and the data provides insight into knob formation.

      However, some conclusions are not fully supported by the data. For example, 'uncover a KAHRP-independent mechanism for correct knob biogenesis' (line 52-53) is not supported by the data because PFA66 truncation could result in misfolding of KAHRP and thus lead to knob biogenesis defects.

      We meant to imply that not only perturbations/absence of KAHRP lead to aberrant knobs. This is now changed to “…uncover a new KAHRP-independent molecular factor required for correct knob biogenesis.”.

      The other major issue is that despite having a complemented parasite line in hand, the parental parasite line is used as a control for almost all assays. This is a critical issue because an alternative explanation for their data would be that expression of truncated PFA66 leads to expression of a misfolded protein that aggregates in the host RBC OR it clogs up the export pathway and indirectly leads to knob biogenesis defects. It is surprising that the authors do not test the localization of dPFA using microscopy especially since it is tagged with GFP. While the complemented parasite line does revert back, this could also be due to the fact that the complement overexpresses the chaperone helping mitigate issues caused by the truncated protein.

      As all virulence characteristics we monitor in this study have been verified many times in the parental CS2 parasites in the literature, we think that the best comparative control for the truncated cell line is indeed the parental line. The large part of our study aimed to characterize differences in various characteristics upon inactivation of PFA66 function, and for this reason we used the parental WT line as a control. Using the complementation line would not truly reflect the effect of PFA66 truncation, as PFA66::HA was not expressed from an endogenous locus, but rather from an episomal plasmid. This itself may result in expression levels which differ from WT, and thus this parasite line cannot be seen as the gold-standard control for assaying PFA66 function.

      We did indeed try to localize dPFA (lines 122-123 in the original manuscript), but were unsuccessful, likely due to diffusion of dPFA throughout the entire erythrocyte cytosol (as opposed to concentration into J-dots as the WT). For this reason we carried out fractionation instead, and could show that dPFA is soluble within the erythrocyte cytosol. This experiment additionally excludes any blockage of the export pathway as no dPFA was associated with the pellet/PV fraction. Other proteins were still exported as normal (Figure S4), further supporting a functional export pathway. Indeed, as reported by ourselves and our colleagues (particularly from the Spielmann laboratory, Mesen-Ramirez et al 2016, Grüring et al 2012), blockage of the export pathway is likely to lead to non-viable parasites as the PTEX translocon seems to be the bottleneck for export of a number of proteins, many of which are essential for parasite survival.

      Reviewer #3 (Significance (Required)):

      The malaria-causing parasite extensively modifies the host red blood cell to convert the host into a suitable habitat for growth as well as to evade the immune response. It does so by exporting several hundred proteins into the host cell. The functions of these proteins remain mostly unknown. One parasite-driven modification, essential for immune evasion, is the assembly of 'knob' like structures on the RBC surface that display the variant antigen PfEMP1. How these knobs are assembled and regulated is unknown.

      In the current manuscript, Diehl et al target an exported parasite chaperone from the Hsp40 family, termed PFA66. The phenotypic observations described in the manuscript are quite spectacular and well characterized. The truncation of PFA66 results in some abnormal knob formation where the knobs are no longer well-spaced and uniform but instead sometimes form tubular structures termed mentulae. The mechanistic underpinnings driving the formation of mentulae remain to be understood but that will probably require several more manuscripts to be deciphered.

      We thank the reviewer for their kind comments, and also for the recognition that this current manuscript is merely the exciting beginning of a story!

      **Major Comments:**

      General comment on the use of controls: The large part of our study aimed to characterize differences in various characteristics upon inactivation of PFA66 function, and for this reason we used the parental WT line as a control. Using the complementation line as a control in this context would not truly reflect the effect of PFA66 truncation, as PFA66::HA was not expressed from an endogenous locus, but rather from an episomal plasmid. This itself may result in expression levels which differ from WT, and thus this parasite line cannot be seen as the gold-standard control for assaying PFA66 function. Our complementation experiments were initially designed to verify that phenotypic changes related ONLY to inactivation of PFA66 function and were (as unlikely as this is) not due to second site changes during the genetic manipulation process. To avoid lengthy and not really very informative analysis of the complementation line, we used knob morphology via SEM as a “proof-of-principle”. However, as the reviewer is formally correct, we have added a passage to the discussion stating that “We wish to note that we cannot unequivocally state that our complementation construct caused reversion of all the aberrant phenotypes herein investigated, however we feel it likely that all abnormal phenotypes are linked and thus our “proof of principle” investigation of knob/eKnob phenotypes is likely to be reflected in other facets of host cell modification and can thus be seen as a proxy for such.”

      Fig 3: The control used here is the parental line. Was there a reason why the complemented parasite line was not used as the control? Showing that the KAHRP localization and distribution is restored upon complementation would greatly increase the confidence in the phenotype.

      Please see our general comments above.

      Fig 5: The data showing a defect in CSA binding are convincing but again only the parental control is used and not the complemented parasite line. The complemented parasite line should be used as a control for the PFA binding mutant.

      Please see our general comments above, and also our response to reviewer 1.

      In 5D, the defect in dPFA seems to occur to a lesser degree than in Fig. 2C. How many biological replicates are shown in each of these figures? The figure legend says 20 cells were quantified via IFA but were these cells from one experiment? The expression of mentulae seems quite variable: while the authors mention '22%' (line 164), it seems that in most other experiments it's more like ~10% (5D and S6B, D-E). Were these experiments blinded?

      As the reviewer is likely aware, subtle differences in parasite culture conditions, stage, fixation, SEM conditions and length of time in culture between experimental time points can lead to variations in results. Due to the time required to generate the data for figure 5, these experiments took place months after the original (i.e. Figure 2C) analysis. It is not possible to directly compare the results of these two independent experiments, however it is possible to compare the results of the parasite lines included within each set of experimental data. Due to the time and cost involved, each of these experiments represents only one biological replicate. If required, we can include more replicates, although this is more likely to further complicate the situation due to the reasons mentioned above.

      Fig S6G: The staining suggests that most PfEMP1 is not exported, in any parasite line. Staining for PfEMP1 is technically challenging and these data are not enough to show that expression level is 'similar' (Line 279-280). It may be more feasible to use the anti-ATS antibody and stain for the non-variant part of PfEMP1 (Maier et al 2008, Cell).


      It is well known that a large portion of PfEMP1 remains intracellular. This figure does not aim to differentiate between surface exposed and internal PfEMP1, but merely to show that similar TOTAL PfEMP1 is expressed in the deletion line, and also that the parasites have not undergone a switching event which would lead to loss of CSA binding ability. We will endeavour to address this in future revisions by Western Blot but wish to note that WB analysis of PfEMP1 is notoriously difficult.

      Lines 320-322: The logic of why increased robustness of the RBC membrane would lead to faster parasite growth is confusing. It is likely that the loss of PfEMP1 expression leads to faster growth. The loss of NPP is minimal and may not cause growth defects in rich media.

      As far as we can detect, there is no loss of total PfEMP1 expression (as verified by figure S6G), but rather a drop in surface exposure and functionality, which is unlikely to affect parasite growth rates. What we intended to say was that the NPP assay is influenced by fragility of the erythrocyte, and therefore a stiffer erythrocyte may be more resistant to sorbitol-induced lysis. As the NPP result does not really add much to the main narrative of this manuscript, we would prefer not to invest unnecessary effort for a minimal potential readout. Indeed, we are tempted to remove the NPP data as they distract from the main findings of the manuscript, this being the reason we include them only as supplementary data.

      Lines 433-434: These data do support a function for HsHsp70 but these data are among many others that have previously provided circumstantial evidence for its role in host RBC modification. Maybe a co-IP would help support these conclusions better.

      Despite all our best efforts and publications, we have been unable to detect this interaction in co-IP or crosslink experiments, although we were successful in detecting interactions between another HSP40 (PFE55) and HsHSP70 (Zhang et al, 2017). Although this is disappointing, it may be explained by the transient nature of HSP40/HSP70 interactions. We agree that our suggestion (that parasite HSP40s functionally interact with human HSP70) is not novel (we and others have noted this possibility for over 10 years), however the challenging nature of the experimental system makes it very difficult to show direct evidence of the importance of this interaction in cellula. Over the past decade we have used numerous experimental approaches to try to address this but have always been confounded by technical challenges. In 2017 the corresponding author took a sabbatical to attempt manipulation of hemopoietic stem cells to reduce HSP70 levels in erythrocytes, however it appears (unsurprisingly) that HsHSP70 is required for stem cell differentiation, and thus this tactic was not followed further. The authors believe that, due to the lack of the necessary technology, indirect evidence for this important interaction is all that can realistically be achieved at this time, and this current study is the first to provide such evidence.

      We would further like to note that a successful co-IP would not directly verify a functional interaction between PFA66 and HsHSP70, but could also reflect a chaperone:substrate interaction between these proteins, and is therefore not necessarily informative.

      **Minor Comments:**

      Fig1: The bands are hard to see in WT and 3’Int. Maybe a better resolution figure would help? Also, the schematic shows primers A-D but the figure legend does not refer to them. It would be useful to the reader to have the primers indicated above the PCR gel along with the expected sizes.

      We apologise. Integration PCRs are notoriously challenging. Any revised manuscript will contain clearer images.


      Fig S1: The NPP data could be improved if tested in minimal media. It has been shown that NPP defects do not show up in rich media (Pillai et al 2012, Mol. Pharm. PMID: 22949525). Does complementation restore NPP and growth rate?

      As the NPP result does not really add much to the main narrative of this manuscript, we would prefer not to invest unnecessary effort for a minimal potential readout. Indeed, we are tempted to remove the NPP data as they distract from the main findings of the manuscript, this being the reason we include them only as supplementary data. Likewise, the complementation experiments are, we feel, unnecessary.

      Fig 4: It is not clear what the line scan analysis are supposed to show. What does ‘value’ on the y-axis mean?


      These are line scans of fluorescence intensity (arbitrary units) along the yellow arrows shown on the fluorescent panels. This is now indicated in the figure legend.

      Fig S5D: Maybe it was a problem with the file but no actin staining is visible.

      The actin stain was visible on the screen, but unfortunately not in the PDF. We have applied (suitable) enhancement to produce the images in the new version.

      Fig 6: A model for mentulae formation is not really proposed. Only what the authors expect the mentulae to look like.

      We have changed the legend to reflect this “Figure 6. Proposed model for eKnob formation and structure.”. We do propose that runaway extension of an underlying spiral protein may lead to eKnobs, thus would like to keep the word “formation”.

      Lines 312-313: It is not clear what 'highly viable' means, parasites are either viable or not.


      This has been changed.

      Lines 400-405: The authors forgot to cite a complementary paper that showed no virulence defect upon 70x knockout or knockdown (Cobb et al mSphere 2017). Those data also support a role for HsHsp70.

      We apologise for the omission. This is now included.

      **Referee Cross-commenting**


      I agree, the comments are pretty similar. The authors could tone down their conclusions or add more data to support their conclusions. Maybe call them elongated knobs or eKnobs, instead of mentula?

      We have now removed the offending term and use eKnobs.

    1. Reviewer #4 (Public Review):

      In this paper, the author uses an impressive comparative dataset of 172 species to investigate the relationship between intraspecific genetic diversity and census (actual) population size. They find that even when they use phylogenetic comparative methods, the relationship between neutral diversity and population size is much weaker than predicted by theory and that selection on linked sites is unlikely to explain this difference. The paper convincingly demonstrates that the paradox of variation first pointed out by Lewontin in the 70s remains paradoxical.

      This paper is exceptionally strong in multiple ways. First, it is statistically rigorous; this is particularly impressive given that the paper uses methods and data from multiple fields (genomics, macroecology, conservation biology, macroevolution). This is the most robust estimate of the relationship between diversity and population size that has been published to date. Second, it is conceptually rigorous: the paper clearly lays out the various hypotheses that have been put forth over the years for this pattern as well as the logic behind these. The author has done a great job at synthesizing some complex debates and different types of data that are potentially relevant to resolving it. Third, it is exceptionally well-written. I sincerely enjoyed reading it. Overall, I think this is a major contribution to this field and though the paper does not resolve the challenge laid down by Lewontin, I think these analyses (and curated data/computational scripts) will inspire other researchers to dig into this question.

      I do however, have some suggestions as to how this paper could be strengthened.

      First, in phylogenetic comparative methods (PCMs) there has been a persistent confusion as to what phylogenetic signal is relevant -- when applying a phylogenetic generalized linear model with a phylogenetically structured residual structure (which the author does here), one is estimating the phylogenetic structure in the errors and not the traits themselves. The comparative analyses are well done and properly interpreted, but at some points in the text, particularly when addressing Lynch's conjecture that PCMs are irrelevant for coalescent times and the comments/analysis on the appropriateness of Brownian motion as a model of evolution, there is some conceptual slippage, and I suggest that the author take a close look and make sure their language is consistent. Strictly speaking the PGLM approach doesn't assume that the underlying traits are purely BM -- only that the phylogenetic component of the error model is Brownian. As such, running the node-height test on both the predictors and the response variable separately -- while interesting and informative about the phylogenetic patterns in the data (including the shift points you have observed) -- isn't really a test of the assumptions of the phylogenetic regression model. It is at least theoretically plausible (if not biologically) that both Y and X have phylogenetic structure but that the estimated lambda = 0 (if for instance, Y and X were perfectly correlated because changes in Y were only the result of changes in X). To be clear, I am fine with the PGLM analysis you've done and with the node-height test; I just don't think that the latter justifies the former.

      One note about the ancestral character reconstruction: I think it is a fine visualization and realize you didn't put too much emphasis on it but strictly speaking the ASR's were done under a constant process model and therefore they wouldn't provide evidence for (a probably very real shift) between phyla. I think it was a good idea to run the analyses on the clade specific trees (particularly given how deep and uncertain the branches dividing the phyla are) but I just don't think you could have gotten there from the ASR.

      I am not convinced that the IUCN RedList analysis helps that much here and in my view, you might consider dropping this from the main text. This is for two reasons: 1) species may be of conservation concern both because they have low abundance in general and/or that their abundance is known to have experienced a recent decline -- distinguishing these two scenarios is impossible to do with the data at hand; and 2) there is of course a huge taxonomic bias in which species are considered; I don't think we can infer anything ecologically relevant from whether a species is listed on the RedList or not (as you suggest regarding the lynx, wolverine, and Massasauga rattlesnake) except that people care about it.

      This is not really a weakness but I find it notable that recombination map length is correlated with body size. I realize this is old news but I was left really curious as to a) why such a relationship exists; and b) whether the mechanism that generates this might help explain some of the patterns you've observed. I would be keen to read a bit more discussion on this point.

    2. Reviewer #3 (Public Review):

      This study is quite directly a follow-up study of the recent work of Corbett-Detig et al (2015) and the commentary by Coop (2016) which aimed to understand the relation between population size and diversity, and the degree to which the shape of the relation could be explained by the action of linked selection. The analysis here scales up the sample size for a large-scale focus on comparative analyses of animals, and introduces the application of phylogenetic correction to control for relatedness.

      As the most comprehensive analysis of its type to date, and with the addition of phylogenetic correction, this work's strength primarily lies in confirming the conclusions laid out in the commentary by Coop, notably that linked selection is unable to fully explain the narrowness of the diversity across species with orders of magnitude variation in population sizes. Through an explicit model-fitting of the effects of linked selection, the main conclusions are essentially that Lewontin's Paradox remains unexplained. The Introduction and discussion provide a very nice accounting of the range of possible explanations. I also appreciated the connection of the population size inferences to IUCN status.

      I wasn't so convinced that the assessment of phylogenetic inertia (Lambda>0) really provides a way to assess Lynch's argument that coalescent times are too short to have a phylogenetic effect. For reasons outlined by the author in the discussion, it could well be that any phylogenetic inertia signal is due to inertia of life history traits correlated with effective population size rather than with diversity itself. The discussion raises this important point, but I think leaves us with the difficulty of really assessing how important that phylogenetic correction really is: if diversity has no direct phylogenetic non-independence, I am a bit unsure how much we have learned through this analysis alone (i.e. what is lambda telling us), without an explicit assessment of how often divergence times may actually truly be on the same order as coalescent times.

      That said, I think it's a very open question whether diversity actually has phylogenetic independence because of short split times relative to effective population sizes. The author mentions the possible effect of large Ne on causing this to be violated; but I also wondered whether many of the small Nc species are still retaining a fair bit of ancestral polymorphism, further homogenizing diversity levels.

      Overall a number of possible explanations (such as the effect of variable selected site densities, and variable recombination) were raised, and rather quickly rejected as 'unlikely to explain the qualitative patterns'. In a number of cases these statements were fairly brief, and I wondered how likely it is that a combination of these, in aggregate, COULD explain the patterns. Looking at Figure 5B, it seems like the major effect of phylogeny (or correlated life history) is also apparent for the discrepancy between observed and predicted diversity: Chordates seem to have the largest discrepancy. With that in mind, I do wonder whether some feature of genome structure in Chordates, including a combination of the effects discussed in the paper (e.g. the effects of variable recombination rates/genome size and functional densities, variation in mutation rates, etc.), could collectively account for the paradox, even though individually the author rules them out as being able to explain the 'qualitative pattern'. Could the genome structure of chordates lead to a major difference in linked selection that's unaccounted for here?

      Mei et al (2018) (American Journal of Botany, Volume 105, Issue 1, p1-124) argued that species with larger genomes have greater 'functional space', implying a greater deleterious mutation rate in species with larger genomes. This could potentially be a factor driving those Chordates with intermediate Nc values furthest below the predicted line?

    3. Reviewer #1 (Public Review):

      The standard neutral model, which is our null model for levels of genetic variation, predicts that they should be proportional to census population sizes. In reality census population sizes across metazoan species span several orders of magnitude more than the ~3 orders spanned by levels of genetic diversity. This discrepancy is referred to as Lewontin's paradox, and to resolve it would mean to explain how basic population genetic processes lead to the modest span of genetic diversity levels that we observe. This is a central question in population genetics (which is, after all, concerned with understanding patterns of genetic variation) and is of substantial general interest.

      The manuscript addresses Lewontin's paradox through three main analyses:

      1) It derives novel estimates of census population size across metazoans, which alongside previous estimates of neutral diversity levels, enables a revised quantification of the relationship between diversity levels (\pi) and census populations sizes (Nc).

      2) It quantifies the relationship between \pi and Nc controlling for phylogenetic relatedness.

      3) It revisits the question of whether this relationship can be accounted for by the effects of selection at linked loci (e.g., sweeps and background selection). I address each of these analyses in turn.

      Novel estimation of census population sizes in metazoans: The estimates are derived by: 1) estimating the density of individuals within their range, based on body size and a previously observed linear relationship between body size and density (Damuth 1981, 1987); 2) applying a geometric algorithm (finding the minimum alpha-shape computationally, sometimes adjusting alpha manually) to geographic occurrence data to estimate the area of the range; and 3) multiplying the two.
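      The three-step recipe above can be sketched in a few lines. This is only an illustration under stated assumptions: the log-log linear form follows the Damuth-style relationship the review mentions, but the `intercept` and `slope` values, the use of body mass in grams, and the function names are hypothetical placeholders, not the manuscript's fitted parameters. The alpha-shape step is not implemented; the range area is simply passed in.

```python
import math


def density_per_km2(body_mass_g, intercept=4.23, slope=-0.75):
    """Step 1: individuals per km^2 from body size.

    Assumes log10(density) = intercept + slope * log10(body mass).
    The default coefficients are illustrative placeholders only.
    """
    return 10 ** (intercept + slope * math.log10(body_mass_g))


def census_size(body_mass_g, range_area_km2, **coefficients):
    """Step 3: multiply density by range area.

    Step 2 (range area from occurrence data via an alpha-shape)
    is assumed to have been done elsewhere.
    """
    return density_per_km2(body_mass_g, **coefficients) * range_area_km2


# Hypothetical inputs: a 10 g animal occupying 1e6 km^2.
print(f"{census_size(10.0, 1e6):.3g}")
```

      Because the two factors are multiplied, any error in the density relationship or in the range area propagates directly into the census estimate, which is why independent validation of either factor would be informative.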

      The results are sometimes surprising. For example, Drosophila melanogaster is estimated to have a population size > 10^17 (Fig. 1); if the volume of an individual is 1 mm^3, this implies a total volume > 1 km x 1 km x 100 m. Additionally, some species classified as endangered have census estimates > 10^8 (Fig. 3). The author compares his area estimates with estimates for species in the IUCN Red List (focused on endangered species) to find that they largely correlate (although this is not quantified). I think further investigation of the quality of the census size estimates is warranted. Are there other estimates of census size or biomass that can be used for validation, e.g., for species of economic and biomedical importance (e.g., herring and anopheles)?
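      The back-of-the-envelope volume check can be reproduced directly. The sketch below only restates the arithmetic in the paragraph above (10^17 individuals at an assumed 1 mm^3 each); the variable names are illustrative.

```python
# Unit conversion: 1 m = 1000 mm, so 1 m^3 = 1000^3 mm^3.
MM3_PER_M3 = 1000 ** 3

n_individuals = 10 ** 17          # estimated census size from the review
volume_per_individual_mm3 = 1     # assumed volume of one fly

# Total volume of all individuals, in cubic metres (integer arithmetic).
total_volume_m3 = n_individuals * volume_per_individual_mm3 // MM3_PER_M3

# Volume of a 1 km x 1 km x 100 m box, in cubic metres.
box_volume_m3 = 1000 * 1000 * 100

print(total_volume_m3, box_volume_m3)
```

      Both quantities come out to 10^8 m^3, so the comparison in the text holds under the 1 mm^3 assumption.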

      If the proposed method proves to work well, I imagine that the estimates of census size may be of broad interest in other contexts. In the context of Lewontin's paradox, it may be interesting to quantify the difference in the relationship between \pi and Nc suggested by the new estimates vs the proxies used in previous work (e.g., Leffler et al. 2012).

      Quantifying the relationship between \pi and Nc controlling for phylogenetic relatedness: I am unclear about the motivation for this analysis. As Lynch argued (and the author describes), if TMRCAs of neutral loci within a species are smaller than the split time from another species in the sample, its genetic diversity level was shaped after the split, and it could be considered an independent sample for the relationship between \pi and Nc. There may be underlying factors shaping this relationship that are not phylogenetically independent (e.g., similar life history traits) but it is unclear why that would justify down-weighting a sample. In that sense, I am not convinced by the author's argument that finding a 'phylogenetic signal' justifies the correction. Stated differently, it is not obvious what is the 'true' relationship being estimated and why relatedness biases it. One could imagine that the 'true' relationship is the one across extant species, in which case the correction is not needed (with the possible exception of species in which TMRCAs are on the same order or greater than split times). I don't know what an alternative 'true' relationship would be.

      Moreover, I am not sure how a more precise 'quantification' of the relationship between diversity and census size serves us. Regardless of corrections, it is obvious that the null provided by the standard neutral model is off by orders of magnitude. Perhaps once we have alternative explanations for this relationship then testing them may require corrections, but presumably the corrections will depend on the explanations.

      One context in which phylogenetic considerations and quantification may be relevant is the comparison of the \pi - Nc relationship among clades. Notably, one could imagine that different population genetic processes are important in different clades (e.g., due to reproductive strategy) and a comparative analysis may highlight such differences. It is less clear whether the corrections that are applied here are the relevant ones. Separating clades makes sense in this regard, but it is unclear why to correct for non-independence within a clade. Furthermore, it seems that in order to point to different processes one would like to control for the distribution of census population sizes in comparisons between clades (to the extent possible). Otherwise, one can imagine the same process shaping the relationship in different clades, but having a non-linear (in log-log scale) functional dependence on census population size (as in the case of genetic draft studied next). In this regard, I am not sure I follow the argument attributed to Gillespie (1991) and specifically how the current analysis supports it.

      In summary, I find the ideas of clade level analyses and of using phylogenetic comparative methods (PCMs) to look at census population size (and possibly diversity levels) promising. For example, as the author alludes to in the Discussion (bottom of P. 13), PCMs may be informative about the hypothesis that species with large census sizes have a greater rate of speciation. Yet I find the current analyses difficult to interpret.

      Analysis of the effects of linked selection: The author investigates whether the effect of selection at linked sites (e.g., selective sweeps and background selection) can account for the observed relationship between diversity levels and census population size. To this end, he assumes that different species have the same sweeps and background selection parameters inferred in Drosophila melanogaster, but differ in census size and genetic map length.

      As justification for using selection parameters inferred in D. melanogaster, the author argues that this is a "generous" assumption in that the effects of linked selection in this species are on the high end. One issue with this argument is that among reasons for the strong effects in D. melanogaster is its short genetic map length. This is not a substantial caveat, given that the analysis is meant as an illustration and it can be resolved by using appropriate wording. Perhaps more troubling is that the author's estimate of the reduction in diversity level in D. melanogaster is much greater than the reduction estimated in the inference that he relies on (several orders of magnitude and less than one, respectively). This discrepancy is mentioned but should probably be addressed more substantially.

      The results of the analysis are intriguing. The effects of linked selection 'shrink' the ~13 orders of magnitude of census population sizes to ~3 orders of magnitude of diversity levels. This massive effect is largely due to the genetic draft (Gillespie 2001) and to a lesser extent to the decrease in map length with increasing census size: when the census population size becomes very large (Nc~10^9) and coalescence rates due to genetic drift decrease accordingly (~1/2Nc), coalescence rates due to sweeps, which increase owing to the smaller map lengths (and would otherwise remain constant), become dominant. In hindsight this is quite intuitive and aligns with Gillespie's original argument, but this is in hindsight, and using this argument in conjunction with data, specifically with census population size and map length estimates, is novel.
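      The qualitative argument can be illustrated with a toy calculation. This is a deliberately simplified sketch, not the manuscript's model: it assumes pairwise diversity is roughly 2 * mu divided by the total pairwise coalescence rate, that drift contributes 1/(2 Nc), and that sweeps contribute a constant rate that does not depend on Nc (so the map-length effect is ignored). The values of `mu` and `sweep_rate` are hypothetical.

```python
def expected_diversity(n_census, mu=1e-9, sweep_rate=1e-7):
    """Toy pairwise diversity under drift plus a constant sweep-driven
    coalescence rate (both per generation).

    mu and sweep_rate are hypothetical illustration values.
    """
    drift_rate = 1.0 / (2.0 * n_census)    # coalescence due to drift
    total_rate = drift_rate + sweep_rate   # the two rates add
    return 2.0 * mu / total_rate           # 2 * mu * expected coalescence time


# Diversity tracks 4 * Nc * mu when drift dominates, then plateaus
# near 2 * mu / sweep_rate once sweeps dominate.
for exponent in (2, 4, 6, 8, 12, 17):
    print(exponent, f"{expected_diversity(10.0 ** exponent):.3g}")
```

      In this toy version the plateau arises purely because the drift term vanishes while the sweep term does not; letting the sweep term grow as map length shrinks, as described above, would compress the range of diversity further.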

      As the author points out, the resulting relationship between diversity levels and census population sizes does not fit the data well. Notably, predicted diversity levels are too high in the intermediate range of census population sizes. Nonetheless, the analysis suggests that linked selection may play a much greater role than previous studies suggested (i.e., the analyses of Corbett-Detig et al. (2015) and Coop (2016) suggest that it cannot account for more than 1 order of magnitude). Maybe the poor fit is due to the importance of other factors (e.g., bottlenecks) in species with intermediate census population sizes?

      I also wonder whether the potential role of linked selection may be clearer if the different effects are shown separately, and perhaps with less reliance on the estimates from D. melanogaster. Namely, the effects of background selection can be shown for a few different values of Udel, e.g., between 0.3-3 (this range seems plausible based on many estimates). They can be shown both accounting and not accounting for the relationship between map length and census size. Similarly, the effect of sweeps can be shown for several values of corresponding parameters, and perhaps even for different models for how the number of beneficial substitutions varies with census size (see Gillespie's work to that effect). I believe that such illustrations will be fairly intuitive and less restrictive.

    1. This data shortage is caused by chronic under‐funding of conservation science, especially in the species‐rich tropics (Balmford and Whitten 2003), and the high financial cost and logistical difficulties of multi‐taxa field studies.

      Why is conservation science under-funded? I know that this text may be a little older, so funding may not be as bad now. With the effects of climate change seen everywhere, you would think the government would put more efforts and funds into conservation. We need to be responsible for these species and their ecosystems, especially if we're the ones responsible. Luckily it seems that the Biden administration is taking a bigger focus on the environment and climate change than the past administration. I read in an article that the new administration plans to triple the amount of protected land in the U.S., which will be huge for conservation. More changes also seem to be on the way.

      https://www.nationalgeographic.com/environment/article/biden-commits-to-30-by-2030-conservation-executive-orders

    1. We propose that neurophenomenology of dreaming is a nascent discipline that requires rethinking the relative role of third-, first- and second-person methodologies, and that a paradigm shift is required in order to investigate dreaming as a phenomenon on a continuum of conscious phenomena as opposed to a break from or an alteration of consciousness

      We may need to think of dreams not as a break from consciousness, but as a unique continuation of our wakeful conscious state.

    Annotators

    1. Yet specialization becomes increasingly necessary for progress, and the effort to bridge between disciplines is correspondingly superficial.

      Conversely, the less specialized a discipline is, the more superficial its understanding and therefore more easily bridged. I'm thinking education in this context.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Bhide and colleagues present an insightful study of how cellular mechanics influences differential cell behaviour during morphogenesis despite apparent genetic homogeneity of the cellular ensembles. They dissect the extensively studied system of mesoderm invagination in Drosophila, focussing on the differences in cell behaviours between the cells in the middle of the infolding tissue and on the periphery that, as far as we know, share a common gene expression profile. They describe sub-cellular dynamics of a major effector of apical constriction morphogenesis, the myosin motor distribution, in the invaginating cells and conclude that differences in myosin levels alone cannot account for the observed differences in cell behaviours. In order to understand the cell behaviour inhomogeneity, they turn to biophysical simulation and in an impressively exhaustive manner substantiate the idea that non-linear effects are required for explaining the phenomenon. This theoretical treatment fits well with the notion that not the genetic identity of the cells but rather cell-cell mechanical coupling determines the differences in the invaginating cells' behaviours. Additionally, the modelling is consistent with the myosin asymmetry and dynamics in the cells whose behaviours are being contrasted. Complementary and beautifully executed filament-based modelling of microscopic actomyosin contractility further corroborates this view. Finally, the proposed model of non-linear actomyosin contractility dynamics governing the differential cell behaviour across a genetically homogeneous cellular field is challenged by two complementary experimental approaches, laser ablation and optogenetics.

      Overall, the results represent convincing evidence that points the tissue mechanics field of Drosophila mesoderm into an interesting new direction and has general implications for the understanding of the interplay between genetic regulation and emergent behaviours of cells operating in a mechanically complex multicellular embryonic context. The study is meticulously executed, highly quantitative and effectively combines experiment and theory. I have only minor comments that concern in particular the presentation of the results.

      The paper is very dense and the text does not complement well the results presented in the main figures. Many panels in the Figures are not referred to explicitly. Figure elements are referenced out of order both within and across Figures. Sometimes, particularly, in the last two Figures (3 and 4) the reader is left alone to figure out what the data show (with the appropriately terse legends and without the clear narrative in the text, it is an uphill battle for non-specialists). Some key results are hidden in the sea of elements within the Figure 2 that contains the most important, relevant and impressive data.

      We have split this figure in two, moved some of the results from Suppl. Fig. 5 into one of its parts and included new calculations and data. We have also extended the description of these results in the main text and in the figure legends.

      As an example, on line the authors point to panel 2F to demonstrate the asymmetry of myosin distribution in some cells. To the best of my understanding, this phenomenon is actually shown in Fig 2E which is curiously not referenced at all.

      We have corrected the references to the panels.

      Similarly, Figure 2K and L provide crucial data substantiating much of the conclusions of the paper. It requires a major effort to understand what the graphs mean. The following simulation results are quite impressive and would deserve a separate Figure which could provide more space for explaining what the parameter maps actually show. What is for instance plotted on the Y axis as steepness?

      We have added the following explanation: “The ‘width’ of the profile is the number of cells with maximum value; the ‘steepness’ is the slope between minimal and maximal values (equation 2 in materials and methods).”

      Secondly, I find the overall narrative of the manuscript needing some reorganisation. The main question is set-up extremely well, however in the middle of the manuscript the focus on the connection between cell behaviours and genetic programs is lost. New conclusions on force transmission between cells emerge, however they are not obviously connected with the question posed from the onset and addressed in the discussion section.

      To us, the section on force transmission seemed like an important component of the issue of intrinsic versus extrinsically determined cell behaviours. We had seen that the intrinsic programme of the cells, as reflected in their myosin levels, might not be sufficient to explain the difference between stretching and constricting. If their behaviour is not intrinsically determined, then there must be something acting from the outside, and we are looking here at what that might be, i.e. we need to find out how the potential constriction is influenced. The first model tests under which conditions differential contractility leads to different ‘cell’ behaviours. This in turn leads directly to the question of the forces the cells in the epithelium exert on each other.

      My impression is that the authors are conservative in their reasoning, however it does compromise the overall message of the story that should ideally focus on one subject. I find the combined evidence presented sufficiently supportive of the model that is beautifully and eloquently presented in the concluding sentence of the paper:

      "This mechanism, which we propose corresponds to the non-linear behaviour predicted by the models, would apply both to central and to lateral cells, with a catastrophic 'flip' being stochastic and rare in central cells, but reproducible in lateral cells because of the temporal and spatial gradient in which contractions occur."

      This may not turn out to be the entire story or even entirely correct, but it is certainly an exciting way of thinking about the problem. I wish that the manuscript would stay more on this subject throughout and provide intermediate conclusions supporting this model as the story develops.

      Few more minor comments:

      We have corrected all of the typos, mistakes and omissions and adapted the text, as mentioned below.

      Line 36 - typo

      Line 97 - starting bracket missing

      Line 126 - data on intensity are presented here. There is also a panel on concentration (Fig 1H). Where is this discussed?

      An explanation (definition) has been added to the main text.

      Line 132 - panel 2G - disruptive out of sequence reference to a future figure

      Line 135 - with this regard - please spell out this important conclusion

      We have expanded this part, basically introducing the conclusion more clearly (we hope).

      Line 183 - typo

      Line 210 - insects do not have intermediate filaments

      We have added ‘mammalian‘ to the reported experiment in the text, to make it clear that this does not refer to Drosophila cells

      Line 238 - please provide a hint of how such global ablations are performed

      We have added this – both explicitly, and the relevant references.

      Line 240 - walk us through the Figure, it is too complex to figure it out alone

      We have added a more extensive explanation both in the text and in the new figure legend.

      Line 245 - why is the clear hypothesis mentioned above (point 2) rephrased?

      Line 273 - vague statement

      We have changed the text in response to these useful pointers.

      **Significance:**

      The results represent convincing evidence that points the tissue mechanics field of Drosophila mesoderm into an interesting new direction and has general implications for the understanding of the interplay between genetic regulation and emergent behaviours of cells operating in mechanically complex multicellular embryonic context.

      Reviewer #2

      Bhide and colleagues explore the mechanisms of cell expansion in epithelial morphogenesis. During the invagination of the Drosophila mesoderm, cells in the center of the prospective mesoderm constrict under the action of actomyosin pulses, while lateral cells elongate towards the center of the mesodermal placode to accommodate the reduction in apical surface of the central cells. Central and lateral cells display strong similarities in terms of gene expression. How, then, are these different behaviors (contraction and expansion) accomplished? The authors found that both central and lateral cells assemble actomyosin networks, although lateral cells do it with a certain delay. Mathematical models of cell constriction across the mesoderm using different strain-stress responses showed that strain-induced cell softening was necessary to recapitulate the patterns of constriction and expansion observed in vivo. Furthermore, modelling predicts that cells can stretch until the actin networks yield and break. Laser ablation and optogenetic reduction of contractility in central cells results in a reduction in the apical surface area of lateral cells. An optogenetic increase in contractility in lateral cells caused an increase in apical area in central cells. Together, these data suggest that mechanical cues can override and contribute to sculpting genetically defined morphogenetic domains.

      I propose to address the following points before further considering the manuscript:

      Major

      1. Figure 3: following laser ablation of central cells, lateral cells reduce their apical surface. How do the authors know that this reduction in lateral cell apical surface area is an active process, driven by actomyosin-based contraction, rather than a passive response to the expansion of the wound induced by laser ablation?

        A similar argument could explain the constriction of lateral cells after optogenetic inhibition of actomyosin networks: the central cells relax, expand and compress the lateral cells.

      With regard to the comparison to wounds, it is important to note that the epithelium is not actually wounded by either ablation method. Thus, while the treatments ablate the actomyosin meshwork, they do not ablate or kill the cells. Perhaps the term is an unfortunate choice, since it is more commonly used in developmental biology for killing cells. However, here the cells remain intact and when the optogenetic or laser treatment is released the cells resume their physiological activities.

      We have added a note in the text and now refer to ‘laser microdissection’, a term of art in the field, for more clarity.

      Regarding the more important question of what is the active process, expansion of the central cells or constriction of the lateral cells, a contribution from expanding central cells is of course in theory not impossible.

      However, for this scenario to work, in the absence of pulling from the lateral cells, there would have to be a force that is generated in the central cells, in this case a pushing force that would expand the cells and act on the lateral cells. We have shown in our previous work that if the actomyosin is dissected in dorsal cells, which are not surrounded by potentially contractile cells, the cells do not expand (Rauzi et al, 2017). This shows that ‘relaxing’ by itself does not have ‘expansion’ as a consequence. One would therefore have to consider how such a pushing force could arise in these cells. We can think of only two possibilities: hydrostatic pressure or an active force from the subcellular molecular machinery.

      Considering hydrostatic pressure, if the apical actomyosin that is ablated was responsible for maintaining such a pressure inside the cell (a reasonable assumption), then releasing the actomyosin would allow the cell volume to push against the neighbouring cell. However, such a recoil would occur on a very short time scale (seconds), whereas we see the contraction of the lateral cells continuing over extended periods (minutes).

      Alternatively, expansive forces could be generated by the cytoskeleton. Cytoskeletal pushing forces can come from microtubules (classical example: mitotic spindle; epithelial morphogenesis: work from T. Harris and B. Baum labs: PMID 18508861 and 20647372), or from continuous creation of new cross-linked or branching actin networks pushing against plasma membranes, as in the leading edge of crawling cells. But the microtubules in the blastoderm cells are not oriented in such a way that they could provide a force in the correct dimension in these cells (the majority are oriented along the apical-basal axis). In addition, the connection between microtubules and the plasma membrane depends on the cortical actin meshwork (involving, for example, the actin-binding proteins P120-Catenin or patronin/Shot; Roeper lab, PMID 24914560, St Johnston lab, PMID: 27404359), but the connection of actin with the plasma membrane has been severed in the optogenetically manipulated cells.

      By contrast, we show that normal lateral mesodermal cells possess a contractile actin network. So the only sustained force generated in the system at this point is the contractile force in lateral cells (which is normally counteracted by the stronger contractile force from central cells).

      Thus, we conclude that the expansion of central cells is a passive response to a contractile force from lateral cells, not an active process and conversely, the constriction of lateral cells is an active autonomous process.

      To demonstrate active responses of the lateral cells upon laser ablation and optogenetic manipulations of central cells, at the very least the authors should show the distribution of myosin in the lateral cells that constrict and demonstrate the assembly of contractile networks.

      We have now included the requested data for the experiments with laser ablations. Suppl. Fig. 8 and Suppl. video 3 show the myosin that accumulates in lateral cells. It would be nice also to be able to show this for the optogenetic experiments. However, despite trying hard, we have not succeeded in generating healthy embryos that carry the entire set of transgenes that are necessary to carry out the optogenetic experiments and at the same time visualize myosin (see also response to referee 2, point 3).

      1. Modelling suggests that actin networks yield and break in lateral cells. Does this occur in vivo?

      We postulate that the skewed and inhomogeneous distribution of myosin and the large myosin-free areas in stretched cells (lines 170 – 172 in the original text) are indications of a yielding meshwork, or at least of uneven force distribution in the network that leads to ineffective contraction or even release – i.e. functionally correspond to yielding. We have made this more explicit now.

      We have also added an additional panel quantifying more clearly the proportion of low-myosin areas in lateral cells (now Fig. 3H).

      Work from the Lecuit lab has recently shown beautifully that it is the connectivity of the myosin mesh rather than the underlying actin meshwork that affects apical forces in epithelial cells (PMID: 32483386), and our own findings are entirely consistent with that.

      1. Lines 166-175: The authors propose that constriction of a cell affects the localization of myosin in its neighbors. However, this is not directly measured. The authors should quantify the relative myosin offset in the cells around constricting cells, and show that that offset is greater (and oriented towards the constricting cell) than in cells around expanding cells. There should be a correlation between the relative size change of a cell and the myosin offset (not just concentration) in their neighbours.

      We now provide measurements of the rate of cell area change against the offset of surrounding myosin (the distance of myosin from a cellular border). We see that surrounding myosin is closer to the border of constricting cells and tends to be further away from the borders of expanding cells.

      We have added these data to the new Fig. 3I.

      In addition, does optogenetic activation of constriction in lateral cells affect the offset of myosin networks in central cells?

      This is technically challenging. For such an experiment we would need an embryo to express membrane and myosin markers in addition to the two optogenetic constructs and the GAL4 driver. We tried multiple times to generate such a cross, but obtained either no embryos or, at best, deformed embryos. We also tried to use the MCP-MS2 system in parallel to CRY2-RhoGEF2 but the crosses had the same problem. This sensitivity to additional genetic load was also observed in the DeRenzis lab, who generated these strains and tested and used them extensively.

      1. Fig. 2E-F: the authors argue that the mean myosin concentration in lateral cells at certain times is equivalent to that of central cells earlier in the invagination process. However, the fraction of apical surface area covered by myosin network is consistently lower for lateral cells (and also for central cells that remain unconstricted!). Have the authors considered this fact, and if not, why? Wouldn't this explain, at least in part, why some cells constrict and others do not, if medial myosin networks drive the disassembly of the apical surface?

      We believe in fact that this is precisely part of the picture and it was what we had meant to propose, but the text was perhaps indeed just too condensed. Thus, we had stated in line of the original document:

      “While the asymmetry is visible in all cell rows, there are larger areas without myosin and the distance of displacement is greater in lateral cells (Fig. 2G-J)”,

      and in the discussion (line 277 – 285):

      “Despite the homogeneous actin meshwork in stretching cells, the areas that are free of active myosin occupy a large proportion of the apical surface – similar to ectodermal or amnioserosa cells in which the connection of pulsatile foci to the underlying actin meshwork is lost. ... Dilution of cortical myosin may compromise a cell’s ability to make sufficient physical connections, in particular along the dorso-ventral axis, so that even if sufficient force is generated, it cannot shorten the cell in the long dimension. In other words, even though the cells have enough myosin to create force, the system is not properly engaged and its force is not transmitted to the cell boundary.”

      However, we didn’t state this with sufficient clarity in the results section and have added an extra sentence to this effect.

      If myosin activity were increased in lateral cells once central cells begin constricting, would that lead to an increased fraction of lateral cell surfaces covered by actomyosin networks and to reduced lateral cell elongation?

      This is a really nice experiment, and we have indeed tried to induce activation at later time points, but unfortunately this did not yield unambiguous results. If we did the manipulation after the central cells had clearly constricted, then activating lateral cells did not lead to their contraction. However, this is a negative result, and we have no independent criterion for knowing how 'strong' the induced contraction was (as explained above, we are unfortunately not able to visualize the myosin in these experiments) or why it might not have been sufficient to overcome the pull from central cells, so we cannot interpret it with confidence.

      In this context it is worth remembering that in mutants in which myosin is overactivated as a result of defective upstream signalling, lateral cells stretch less or not at all. See PMID: 24026125 for gprk2 mutants and our own results for active Rho1:


      Figure: Confocal Z-section of embryos expressing sqh::GFP (myosin; green) and GAP43::mCherry (membrane; magenta) imaged ventrally. A constitutively active form of Rho1 is ectopically expressed using a maternal Gal4 driver, inducing activation of myosin in more lateral cells. White dots mark the mesectoderm determined by backtracing after ventral furrow invagination. Yellow arrows in B are constricted cells in row 7/8.

      Minor

      1. Image panels are missing scale bars in many figures.

      2. Fig. 1C'-D': The authors should include a color bar to provide some indication of the scale of the apical areas measured. Same comment for other figures in which apical area is color-coded.

      We have added the missing elements.

      1. Supp. Fig. 2E-F, G-H and Supp. Fig. 6: what is the difference between myosin intensity and myosin concentration? Junctional vs medial localization? Or summed vs mean pixel value? Please be specific, the difference between intensity and concentration is not clear.

      In the cases where we talk about myosin ‘amount’ we have now exchanged the term ‘intensity’, i.e. the physical term for the amount of light, for ‘amount’ (i.e. that for which we use the light intensity as a proxy) and have explained in the main text how we define total apical myosin amount and apical myosin concentration (amount over area). However, in the cases where we are describing the actual image analysis, as in Suppl. Fig. 3, we use ‘intensity’ as the term of art that is used for the methods employed here. Similarly, the terms ‘sum intensity’ and ‘mean intensity’ are terms used for image analysis in Fiji.

      The definitions of “junctional” and “medial” actin were introduced by the Lecuit lab (PMID: 21068726), and we have included the appropriate reference.
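
      The distinction drawn above between total apical myosin amount (‘sum intensity’) and apical myosin concentration (amount over area, i.e. ‘mean intensity’) can be illustrated with a minimal sketch. The function name, the assumption of background-corrected pixel values, and the toy numbers are hypothetical and are not taken from the study or from Fiji itself.

```python
def apical_myosin_measures(pixel_intensities, pixel_area_um2):
    """Return (amount, concentration) for the myosin signal inside one cell outline.

    pixel_intensities: intensities of the pixels inside the outline
                       (assumed here to be background-corrected)
    pixel_area_um2:    area of a single pixel in square micrometres
    """
    if not pixel_intensities:
        raise ValueError("cell outline contains no pixels")
    amount = sum(pixel_intensities)                  # 'sum intensity': proxy for total myosin
    area = len(pixel_intensities) * pixel_area_um2   # apical area enclosed by the outline
    concentration = amount / area                    # amount over area
    return amount, concentration


# Two hypothetical cells with the same total signal but different apical areas.
small_cell = [4.0, 4.0, 4.0, 4.0]
large_cell = [2.0] * 8
print(apical_myosin_measures(small_cell, 0.25))  # (16.0, 16.0)
print(apical_myosin_measures(large_cell, 0.25))  # (16.0, 8.0)
```

      In this toy case the two cells carry the same amount of myosin, but the stretched cell has half the concentration, which is the situation the two terms are meant to keep apart.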

      1. Line 118: Supp. Fig. 2 does not have panels I and K.

      5. Line 223: the authors reference data at sec, but Supp. Fig. 6 does not show any images at that time point. They should be added or a different time point indicated.

      These errors have been corrected.

      Typos

      1. Abstract: "[in a supracellular context" should be "in a supracellular context".

      2. Line 145: should this be a reference to Supp. Fig. 5 instead of Supp. Fig. 4?

      3. Line 166: I am not sure how Supp. Fig. 5 supports this statement. Is this the right figure reference? Should it be Supp. Fig. 4 instead?

      4. Line 881: "representing on line" should be "representing one line".

      These errors have been corrected.

      Optional

      Tony Harris' lab showed that the Arf-GEF Steppke antagonizes myosin and facilitates cell deformation at the leading edge of the embryonic epidermis during Drosophila dorsal closure (West et al., Curr Biol, 2017). Does Steppke localize to junctions in lateral but not central mesoderm cells? Does the pattern of Steppke localization in the mesoderm change with manipulations to the contractility of central cells?

      This is certainly interesting, and we have ordered the protein trap, UAS constructs and RNAi lines. However, these will be long-term and time-consuming experiments.

      Significance:

      This is an interesting study, and one that makes use of beautiful tools, including quantitative microscopy and image analysis, mathematical modeling and optogenetic manipulations. The prediction that embryonic cells display non-linear stress-strain responses is exciting, as linearity has been the predominant assumption so far. However, I find that model predictions are not well supported by the data, and that alternative interpretations of some results are possible. Additionally, the paper lacks insight into the molecular mechanisms that facilitate stretching (although that could be the subject of a follow-up study).

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this study, the authors explore potential mechanisms for why some cells constrict while other cells expand, despite similar intrinsic genetic programs, during Drosophila ventral furrow formation at the onset of gastrulation. The authors combine quantitative analyses of cell shapes and myosin levels from multiphoton confocal and Multi-View SPIM imaging, optogenetic and laser perturbation experiments, and mechanical models to argue that nonlinear mechanical interactions between cells are required to explain the cell behaviors. Based on microscopic models of the actomyosin cytoskeleton in the tissue the authors argue that the required nonlinear mechanical behavior is consistent with actomyosin network reorganization.

      Major comments:

      • Although the area of investigation is exciting and the results are interesting, unfortunately the quality of the results and comparison between experiment and modeling in the current version of the manuscript are not convincing. Although it is not clearly explained in the manuscript, the experimental results on cell shapes, myosin intensity, laser manipulation, optogenetic perturbations appear to be from a single embryo or small number of embryos for each experiment (Figures 1, 3, 4).

      We had analysed a much larger number of embryos, but only included those for presentation that provided the most extensive data. It is extremely difficult to obtain absolutely ‘perfect’ embryos at high resolution for full quantification over long periods. ‘Perfect’ means that the embryos are mounted in such a way that they are imaged from an angle of 45 degrees off the dorso-ventral axis, so that initially mesodermal rows 3 to 7 are seen, and then, as furrow formation progresses, the more lateral rows move through the field of vision. It is difficult to mount in this perfect manner for two reasons. First, the shape of the embryo means that the embryo does not ‘like’ to be balanced in this position, but instead prefers to fall back on its side. Secondly, the embryo has to be mounted at a time point before visible differentiation along the D-V axis, so no visual cues exist to get the positioning right. This means that many of our recordings lack either the more ventral or the lateral cell rows. While the findings for these more restricted observations are fully consistent with our reports, they cannot be quantified with a full comparison across all cell rows over the entire imaging period. Nevertheless, we have processed and analysed further examples which we have now included in Suppl. Fig. 2 and Suppl. Fig. 8.

      The authors state that the cell stretching pattern "was best recapitulated by a superelastic response", but did not provide direct quantitative comparisons of the different mechanical models to the experimental data to clearly demonstrate this.

      Data that illustrate this were shown in Suppl. Fig 5 – but, admittedly, were not well explained, or rather, not at all. We have now added better explanations, expanded the figure, included new analyses, and now present some of these data in the new Fig. 2. Briefly, the figure shows that superelastic and elastoplastic responses are the only curves that successfully reproduce the pattern of stretching lateral cells (last 3 cells stretching with the inner cell stretching most and the last cell stretching least) while at the same time matching the ratio between the cell sizes of the most stretching cells to the least stretching cell.

      The top row of the parameter scans in Suppl. Fig. 5 (now Fig. 2) shows how many cells stretch for each combination of myosin curve steepness (y-axis) and width (x-axis) with shades of blue indicating the number of cells, and the red outline in the field where 3 cells stretch outlining those conditions where the inner cell stretches most. The bottom row shows the resulting size ratios of largest to smallest cell. High ratios in the region outlined in red in the top row are only reached for the superelastic and elastoplastic responses, with the elastomeric tending in the right direction.

      We have now also quantified a goodness-of-fit (root mean squared error, RMSE) measurement between our experimental data and the simulated data of all our models. This is shown now in the new Fig. 2.

      We also note that only the parameter maps of the superelastic and elastoplastic models (Fig. 2J,K) resemble the equivalent parameter maps of the microscopic model (Fig. 3Q).
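
      For readers unfamiliar with the measure, the RMSE comparison mentioned above can be sketched as follows. This is only an illustration of the standard formula; the function names, the model labels used as dictionary keys, and all numbers are placeholders and do not reproduce the analysis or the values in Fig. 2.

```python
import math


def rmse(observed, simulated):
    """Root mean squared error between two equally long sequences of values."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def best_model(observed, simulations):
    """Return the name of the simulation with the lowest RMSE, plus all scores."""
    scores = {name: rmse(observed, sim) for name, sim in simulations.items()}
    return min(scores, key=scores.get), scores


# Placeholder cell widths per cell row (arbitrary units, NOT measured data).
observed_widths = [3.0, 3.5, 5.0, 8.0, 10.0]
simulated_widths = {
    "linear": [4.0, 5.0, 6.0, 7.0, 8.0],
    "superelastic": [3.0, 4.0, 5.0, 8.0, 9.5],
}
winner, scores = best_model(observed_widths, simulated_widths)
print(winner, scores)
```

      A lower RMSE means that the simulated profile lies closer to the measured one on average; the measure says nothing about which rows contribute most to the mismatch.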

      Moreover, the local optogenetic myosin recruitment experiments in Figure 4 do not provide sufficient information on optogenetic tool recruitment,

      We have included images that illustrate the optogenetic construct in the illuminated cells, but not in the central cells in Suppl. Fig. 8. It is impossible to show the construct in the ‘dark’ cells, because illuminating them would activate the construct.

      myosin localization,

      As explained above, this is unfortunately technically not feasible. The best we can do is refer to the description of the construct by Izquierdo et al. (PMID: 29915285), which shows the accuracy of the tool and the highly specific membrane recruitment of myosin.

      or cell behaviors

      We have added quantitative comparisons between the experimental and control areas.

      to justify the claim that the central cells are not activated by the optogenetic perturbation and are only responding to the forces from neighboring cells.

      • The authors should provide direct quantitative comparisons of the models and experiments to clearly demonstrate their claims that the superelastic model is better than the linear model or other nonlinear models.

      See response above.

      • The authors should do additional experiments and/or provide more details for the existing experiments (to include several embryos per condition) on myosin quantification, photo-manipulation, and optogenetics experiments.

      We have provided data for more embryos for all cases.

      Additional controls would likely be necessary for claims resulting from the optogenetics experiments in Figure 4.

      This has been addressed above – we have provided additional data and controls.

      • The additional time and resources required to address these concerns would depend on the experimental details, N values, and statistics in the current studies, which unfortunately were not described in the current manuscript.

      We have been able to add substantial additional data and have added the requested numbers. For many of the experiments each recording can be very time consuming and, for the reasons explained in this response, it is not always easy to obtain precisely the desired recording from the desired imaging angle with the manipulations having been done precisely in the desired position. The numbers of embryos are therefore not high, but multiple shorter recordings provide a body of results that supports the findings, even though these recordings are not easily comparable statistically.

      • Methods descriptions for reproducibility are generally adequate, with the exception of N values and statistics

      See above.

      • Are the experiments adequately replicated and statistical analysis adequate?

      No, see above.

      Minor comments:

      1) Scale bars for images are missing throughout.

      We have added these.

      2) Number of embryos and cells analyzed missing throughout text and figure legends.

      We have added additional embryos for all conditions and have included the numbers of cells analysed for all quantifications (except in cases where each data point represents a cell).

      3) Units are missing for many quantities in figures and tables throughout.

      We have added these.

      4) Many figure references in the main text are incorrect, pointing either to the wrong figure or wrong figure panel.

      These have been corrected.

      5) Line 728. What time point was used for myosin concentrations used in the model?

      We have added this information to the figure legend.

      How might myosin dynamics influence these findings?

      As regards the subcellular dynamics of myosin, these are included in the microscopic model (see ref Belmonte et al.; PMID: 28954810). Preliminary results showed that small changes in myosin stall force and unloaded myosin speed have little effect on our general results. This is now shown in a new supplemental figure (Suppl. Fig. 6). However, if the referee is referring to the dynamics of myosin accumulation over time, this is an interesting question.

      We had begun to explore this topic, but then realized for the linear stress-strain model that it is in fact expected that myosin accumulation would ultimately not affect the outcome. This is because in a linear model the final state of the system is determined by the final shape of the governing myosin profile regardless of the time evolution of the profile, and our simulations confirm this. A systematic analysis for all other stress-strain curves with temporal changes in myosin profiles (where a dependency on the profile temporal evolution is expected) is very time-consuming and will be interesting to pursue in future.

      The main conclusion here that linear models do not recapitulate the observed data as well as the non-linear ones stands regardless of how the temporal dynamics of myosin accumulation may affect the non-linear systems.
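
      The argument that, for a linear stress-strain response, the end state depends only on the final myosin profile can be illustrated with a deliberately simplified sketch. It is not the model of Eq. 1: it assumes a one-dimensional row of cells with fixed outer boundaries, a single linear spring per cell, a damping constant on each membrane, and a myosin profile that ramps up linearly in time. All parameter values are arbitrary and hypothetical, apart from the initial membrane spacing of 6.2 um mentioned elsewhere in this response.

```python
def simulate_chain(final_myosin, ramp_time, k=1.0, damping=1.0,
                   rest_length=6.2, total_time=200.0, dt=0.01):
    """Overdamped 1D row of linear-elastic cells; returns the final cell lengths."""
    n = len(final_myosin)
    x = [i * rest_length for i in range(n + 1)]   # membrane positions, ends stay fixed
    steps = int(round(total_time / dt))
    for step in range(steps):
        t = step * dt
        ramp = min(1.0, t / ramp_time)            # myosin rises linearly, then stays constant
        # Tension in each cell: linear elastic term plus active contractile term.
        tension = [k * (x[i + 1] - x[i] - rest_length) + ramp * final_myosin[i]
                   for i in range(n)]
        # Interior membrane i is pulled right by cell i and left by cell i - 1.
        for i in range(1, n):
            x[i] += dt * (tension[i] - tension[i - 1]) / damping
    return [x[i + 1] - x[i] for i in range(n)]


# Hypothetical myosin profile decreasing from central (left) to lateral (right) cells.
profile = [3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.2, 0.0]
fast = simulate_chain(profile, ramp_time=1.0)
slow = simulate_chain(profile, ramp_time=20.0)

# At equilibrium all tensions are equal, so with fixed total length and k = 1
# each cell length is rest_length + (mean(myosin) - myosin_i).
mean_m = sum(profile) / len(profile)
expected = [6.2 + (mean_m - m) for m in profile]

print(max(abs(a - b) for a, b in zip(fast, slow)))
print(max(abs(a - b) for a, b in zip(fast, expected)))
```

      Both ramps converge to the same cell lengths because the linear system has a single equilibrium for a given final profile. With a history-dependent (for example plastic) response this need not hold, which is why the non-linear cases require the systematic analysis described above.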

      6) The authors show a few examples of myosin pulsing in lateral cells and then conclude that myosin pulsing is not qualitatively different from central cells (lines 135-136). The authors should quantify the number of pulsing lateral cells as well as period and amplitude of pulsing, or discuss relevant results from prior studies in more detail to justify this conclusion.

      By ‘not qualitatively different’ we had meant only ‘in the sense that they are capable of generating contractile forces’, and we have made that more explicit in the text now. The quantitative differences have already been analysed and reported by the Martin lab (https://doi.org/10.1101/2020.04.15.043893; the pulses are slower and less persistent), and our point was that in spite of these known differences, the pulses are able to mediate constriction.

      7) Lines 145-150. The authors very briefly describe the results of the linear-stress strain response and conclude this did not yield outputs corresponding to in vivo data and leave this largely to the supplementary figures. This is a key point in the paper and deserves much more discussion and space in the main text.

      We have included a more extensive description and interpretation of the results in the main text, as detailed in several responses above.

      As mentioned in main comments above, a quantitative comparison of the different mechanical models to show that the superelastic model better describes the observations should be included (potentially as an inset to Fig 2D showing a quantitative measure of the quality of model fit to the data).

      These comparisons have now been expanded and explained more extensively and moved to the main Figures.

      8) Lines 162-163. Provide more rationale for why strain-softening would most likely manifest as permanent or reversible cytoskeletal reorganization.

      The only component of the cell that can likely mediate this physical property and also respond at the observed time scales is the cytoskeleton. In these cells it is the main mechanical determinant. Other components that could in principle contribute to the nonlinearity of stress-strain response might be the viscosity of the cytosol, or the plasma membrane. However, stress responses of fluids to shear are usually in the direction of increasing stiffness, and rarely, if ever, show shear thinning. The same is mostly true for colloidal solutions. Therefore, it is more likely that the stress-strain relationships at the apical surface of the cells are dominated by the dynamics of the actin cytoskeleton, given that even the shape of the plasma membrane is in general determined by the cytoskeleton. We have added a note to this effect in the text.

      9) Lines 187-188. "This shows that forces acting on each cell from its neighbors have an important role in determining the cell's behavior." This seems somewhat obvious; perhaps a bit more explanation would help the reader to understand the importance of these results.

      We have expanded the explanations of these findings and added a sentence to relate them to the main model of the paper.

      10) Lines 196-198. How were the concentrations and lengths of F-actin chosen? How were the concentration and properties of linkers chosen?

      The parameters were chosen on the basis of our earlier studies on simulated contractile meshworks and the theory underlying their behaviour. We had reported the conditions under which such meshes are able to contract, and also shown that the underlying theory correctly predicts behaviour of experimental meshworks (for those few conditions for which they have been reported).

      Unfortunately, there are practically no measurements for the length of F-actin filaments in vivo and estimates vary widely. Reliable data on the density of the cortical network are equally sparse.

      Based on our own previous work we chose concentrations of cross-linkers, myosin motors and transmembrane connectors that are able to ensure optimal contraction and force. Our in vivo measurements reported here show that the amounts of F-actin do not vary significantly across the mesoderm, so we used the same concentration of actin, crosslinkers and membrane connectors in all cells of the model, varying only myosin concentration. Taking into account the cell diameter of the mesodermal cells (~7 um), and to ensure that the meshwork is sufficiently cross-connected (dense) to generate contraction and transmit forces between cells, we used a model where each cell contains F-actin filaments of 1.5 um.

      We have expanded our supplemental material to make these points clearer.

      How sensitive are the results to these details of the cytoskeletal composition?

      We varied both the amounts of cytoskeletal components and the parameters controlling their dynamics (such as myosin stall force and viscosity) and found little impact on model predictions. These data are now presented in Suppl. Fig. 6.

      11) Lines 238-244. It would be helpful to include some additional quantification that clearly shows the reader the differences in cell behaviors in control and perturbed tissue.

      We have added quantitative comparisons of the cells in the perturbed region with cells in an equivalent control region, together with evaluations of two additional embryos.

      For the optogenetics experiment, it would be important to show quantification that the lateral cells are not being directly perturbed during photoactivation of neighboring cells (e.g. due to light leakage).

      We have included this information, as described above.

      In both perturbations, it would be helpful to quantify how many cells in rows 7 and 8 constricted and by how much did they constrict? How reproducible were these effects?

      The perturbation experiments were those where it was most difficult to obtain a large number of identical-looking embryos that would allow broad statistics to be applied. For this to work, we would have to have embryos that were identically mounted and illuminated in the identical area of precisely rows 1 to 6 on each side of the midline – at a resolution of one cell row of 6.2 um width. And all this blind, because at the start of the manipulation there are no visual cues for orientation. Morphology gives no cues at this stage. The MS2-MCP-GFP works for laser ablations, but cannot be used for the optogenetics, because the embryo must not be exposed to blue light. This means we cannot predetermine precisely which rows we target.

      We have, however, added data and quantifications for the control and two further laser-manipulated embryos, which are now shown in Suppl. Fig. 8. It is evident from both that our perturbations were slightly asymmetric and included the outer rows on only one side, and on that side several cells that would normally have stretched are now strongly constricted. While by no means true for all lateral cells, this is a case of one black swan disproving the hypothesis that all swans are white: any constricting cell within two cell diameters of the mesectoderm, i.e. one that would normally stretch, proves that lateral cells do have the capacity to constrict.

      12) Lines 245-252. A key assumption in interpreting this experiment seems to be that the central cells are not directly perturbed by the optogenetic activation. Additional quantifications of RhoGEF2-CRY2 and/or myosin should be shown to support this.

      We have included an image of the optogenetically activated construct in this experiment in Fig. 5, but we cannot show its behaviour in the non-activated part because if we illuminated it, it would be activated. We were unable to create the embryos necessary to document the behaviour of myosin.

      It would be helpful to include some additional quantification that clearly shows the reader the differences in cell behaviors in control and experimental regions. How reproducible were these effects?

      We now provide the results from two additional embryos in Suppl. Fig. 8, and include quantitative comparisons between the control and experimental regions for these and for the embryos that are currently shown in Fig. 5 E.

      13) A section on statistics is missing from the methods section.

      We have added descriptions of the quantifications and statistics.

      14) Line 615. Ensure that Eq. 1 is dimensionally consistent; crucially, what units are used for 'M'? If the model is non-dimensionalized, provide the reference scales.

      Apart from the initial distance between membrane positions (set to 6.2 um) all other units in our visco-elastic model are arbitrary. In order to make this clearer, instead of using the term “viscosity” in equation 1, we now call it a “damping constant”.

      15) Line 675: The investigated stress-strain relationships are presented in Table S1. What are the definitions of xpl and xsh?

      We have included these definitions in materials and methods:

      All stress-strain curves are linear for extensive strains (Δx) lower than the proportionality limit (xpl), with some curves (elastoplastic and superelastic) undergoing a strain-softening to strain-hardening change after a given strain-hardening limit (xsh).
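
      To make the roles of the two limits concrete, the following sketch shows one possible curve with these three regimes. It is not one of the relationships in Table S1: the piecewise-linear form, the function name, and every parameter value (k, x_pl, x_sh and the two slope factors) are assumptions chosen only to illustrate a linear regime below xpl, softening between xpl and xsh, and hardening beyond xsh.

```python
def stress(dx, k=1.0, x_pl=2.0, x_sh=5.0, soft=0.1, hard=2.0):
    """Hypothetical piecewise-linear stress for a strain dx (arbitrary units).

    dx <= x_pl          : linear response with slope k
    x_pl < dx <= x_sh   : strain-softening, slope reduced to soft * k
    dx > x_sh           : strain-hardening, slope raised to hard * k
    """
    if not x_pl < x_sh:
        raise ValueError("the proportionality limit must lie below the hardening limit")
    if dx <= x_pl:
        return k * dx
    if dx <= x_sh:
        return k * x_pl + soft * k * (dx - x_pl)
    return k * x_pl + soft * k * (x_sh - x_pl) + hard * k * (dx - x_sh)


# Stress at a few strains spanning the three regimes.
for strain in (1.0, 2.0, 3.5, 5.0, 6.0):
    print(strain, round(stress(strain), 3))
```

      The curve is continuous at both limits by construction. A cell that is stretched past xpl offers little additional resistance until it reaches xsh, which is the qualitative behaviour that allows a few cells to take up most of the stretch.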

      16) Line 678: Parameter values for the stress-strain relationships are given in Table S2. Can you provide more information on how these values were selected and their units? How sensitive are the results to changes in these values? Provide references when possible.

      The values for xpl and xsh were chosen to be within the range of the observed lengths of stretching cells, with xpl < xsh. Changing the values of each parameter listed in Table S2 does change the results quantitatively, but over the ranges we tested them, never to the point of making the linear or the other non-linear models reproduce the target pattern of stretching.

      We have stated this in the materials and methods section.

      17) Line 697. Please comment on why the embryo appears skewed to the right.

      Embryos are not always ‘perfect’, unfortunately. In addition, they can get slightly squashed during mounting and imaging. In spite of its imperfection, we showed this particular one, because we had imaging data for a long period without drift or other interference, and with good contrast at great depth.

      18) Line 712. A color-bar corresponding to this color-code is missing in the figure.

      This has been corrected.

      19) Lines 715-717. It seems panels E and E' are swapped in the legend.

      Corrected.

      20) Line 724 (Fig 2). It is difficult to read anything in panel K inset or Panel L inset.

      We have rearranged this figure and replaced some panels for greater clarity, and to remove redundancy.

      21) Line 728. What does "embryo 1" refer to?

      This was a remainder from an old plan where each embryo was numbered and listed in a table so that it could be cross-referred to. We have now described in the supplementary table the genotypes and imaging technique for each group of embryos. Where we show data or analyses of the same embryo in different figures, we refer directly to the relevant panels. We have made sure the embryos are referred to correctly in the figure legends.

      22) Line 732. A quantitative measure of the quality of the fits of the models to the experimental data should be included.

      We have done this, and the new data are now included in the new Figure 2.

      23) Line 739. What exactly does "Embryo 2" refer to?

      See comment 21

      24) Line 779. Why is a z-plane of 15 microns below surface chosen?

      25) Line 797. Why is a z-plane of 25 microns below the surface chosen?

      The planes were chosen in each case to show the reader in one single plane rows 7 and 8 along with the central cells.

      26) Line 900. Panel G in Supp Fig 5 is not described in figure description.

      The panel captions were wrongly numbered. This has now been corrected, and more information on this figure has been included in the text.

      • Are prior studies referenced appropriately?

      Yes.

      • Are the text and figures clear and accurate?

      No (see details listed above).

      • It would be very helpful to the reader to show direct quantitative comparison of the different mechanical models with the experimental observations to show how much better the nonlinear model is compared to the linear model.

      We have included this.

      An extended explanation of experiments and experimental results within the main text would improve the manuscript.

      We have expanded our explanations in many places.

      Significance:

      The key advance in this work is in identifying a potential role of nonlinear mechanical properties in contributing to distinct cell behaviors within a tissue during development in vivo. This contributes to a growing body of work highlighting the importance of cell and tissue mechanical properties in regulating cell behaviors during the formation of tissue structure.

      This work adds to a growing body of work connecting actomyosin contractility in cells to tissue-scale behavior during development. This work provides a unique mechanical modeling perspective to the study of apical constriction during Drosophila ventral furrow invagination, highlighting a potential role for superelastic cell mechanical behaviors during morphogenesis in vivo.

      The finding would be of interest to researchers working in the areas of morphogenesis, mechanobiology, the cytoskeleton, and active matter.

      This reviewer's expertise is in experimental studies of the cytoskeleton and cell mechanics during morphogenesis.

    1. “Oh! dear, there are a great many people like me, I dare say, only a great deal better. Good morning to you.” “But I say, Miss Morland, I shall come and pay my respects at Fullerton before it is long, if not disagreeable.” “Pray do. My father and mother will be very glad to see you.” “And I hope — I hope, Miss Morland, you will not be sorry to see me.” “Oh! dear, not at all. There are very few people I am sorry to see. Company is always cheerful.” “That is just my way of thinking. Give me but a little cheerful company, let me only have the company of the people I love, let me only be where I like and with whom I like, and the devil take the rest, say I. And I am heartily glad to hear you say the same. But I have a notion, Miss Morland, you and I think pretty much alike upon most matters.” “Perhaps we may; but it is more than I ever thought of. And as to most matters, to say the truth, there are not many that I know my own mind about.” “By Jove, no more do I. It is not my way to bother my brains with what does not concern me. My notion of things is simple enough. Let me only have the girl I like, say I, with a comfortable house over my head, and what care I for all the rest? Fortune is nothing. I am sure of a good income of my own; and if she had not a penny, why, so much the better.” “Very true. I think like you there. If there is a good fortune on one side, there can be no occasion for any on the other. No matter which has it, so that there is enough. I hate the idea of one great fortune looking out for another. And to marry for money I think the wickedest thing in existence. Good day. We shall be very glad to see you at Fullerton, whenever it is convenient.” And away she went. It was not in the power of all his gallantry to detain her longer. 
With such news to communicate, and such a visit to prepare for, her departure was not to be delayed by anything in his nature to urge; and she hurried away, leaving him to the undivided consciousness of his own happy address, and her explicit encouragement. The agitation which she had herself experienced on first learning her brother’s engagement made her expect to raise no inconsiderable emotion in Mr. and Mrs. Allen, by the communication of the wonderful event. How great was her disappointment! The important affair, which many words of preparation ushered in, had been foreseen by them both ever since her brother’s arrival; and all that they felt on the occasion was comprehended in a wish for the young people’s happiness, with a remark, on the gentleman’s side, in favour of Isabella’s beauty, and on the lady’s, of her great good luck. It was to Catherine the most surprising insensibility. The disclosure, however, of the great secret of James’s going to Fullerton the day before, did raise some emotion in Mrs. Allen. She could not listen to that with perfect calmness, but repeatedly regretted the necessity of its concealment, wished she could have known his intention, wished she could have seen him before he went, as she should certainly have troubled him with her best regards to his father and mother, and her kind complimen

      She thinks he wants to marry her. But could this be harmless flirting? She worries about his intentions.

    1. My dearest Catherine, I received your two kind letters with the greatest delight, and have a thousand apologies to make for not answering them sooner. I really am quite ashamed of my idleness; but in this horrid place one can find time for nothing. I have had my pen in my hand to begin a letter to you almost every day since you left Bath, but have always been prevented by some silly trifler or other. Pray write to me soon, and direct to my own home. Thank God, we leave this vile place tomorrow. Since you went away, I have had no pleasure in it — the dust is beyond anything; and everybody one cares for is gone. I believe if I could see you I should not mind the rest, for you are dearer to me than anybody can conceive. I am quite uneasy about your dear brother, not having heard from him since he went to Oxford; and am fearful of some misunderstanding. Your kind offices will set all right: he is the only man I ever did or could love, and I trust you will convince him of it. The spring fashions are partly down; and the hats the most frightful you can imagine. I hope you spend your time pleasantly, but am afraid you never think of me. I will not say all that I could of the family you are with, because I would not be ungenerous, or set you against those you esteem; but it is very difficult to know whom to trust, and young men never know their minds two days together. I rejoice to say that the young man whom, of all others, I particularly abhor, has left Bath. You will know, from this description, I must mean Captain Tilney, who, as you may remember, was amazingly disposed to follow and tease me, before you went away. Afterwards he got worse, and became quite my shadow. Many girls might have been taken in, for never were such attentions; but I knew the fickle sex too well. He went away to his regiment two days ago, and I trust I shall never be plagued with him again. He is the greatest coxcomb I ever saw, and amazingly disagreeable. 
The last two days he was always by the side of Charlotte Davis: I pitied his taste, but took no notice of him. The last time we met was in Bath Street, and I turned directly into a shop that he might not speak to me; I would not even look at him. He went into the pump–room afterwards; but I would not have followed him for all the world. Such a contrast between him and your brother! Pray send me some news of the latter — I am quite unhappy about him; he seemed so uncomfortable when he went away, with a cold, or something that affected his spirits. I would write to him myself, but have mislaid his direction; and, as I hinted above, am afraid he took something in my conduct amiss. Pray explain everything to his satisfaction; or, if he still harbours any doubt, a line from himself to me, or a call at Putney when next in town, might set all to rights. I have not been to the rooms this age, nor to the play, except going in last night with the Hodges, for a frolic, at half price: they teased me into it; and I was determined they should not say I shut myself up because Tilney was gone. We happened to sit by the Mitchells, and they pretended to be quite surprised to see me out. I knew their spite: at one time they could not be civil to me, but now they are all friendship; but I am not such a fool as to be taken in by them. You know I have a pretty good spirit of my own. Anne Mitchell had tried to put on a turban like mine, as I wore it the week before at the concert, but made wretched work of it — it happened to become my odd face, I believe, at least Tilney told me so at the time, and said every eye was upon me; but he is the last man whose word I would take. I wear nothing but purple now: I know I look hideous in it, but no matter — it is your dear brother’s favourite colour. Lose no time, my dearest, sweetest Catherine, in writing to him and to me, Who ever am, etc.

      This bears resemblance to Fantomina, where Beauplaisir grows bored of Fantomina after he has raped her and had his fun. From this it seems pretty clear Frederick broke up with her. Do you think Frederick was bored of her because he received sex? Or did he do this to punish her?

    1. istorical ecological studies can provide a baseline on which to design biodiversity recovery strategies and conservation goals. Ma

      An example I think of is the catacombs in Europe. Something we do not think about is how unsustainable and awful burying people after they die is for the environment. It causes deforestation, contaminates soils, and uses a lot of resources. I have been to Italy twice; the second time I was able to go into the catacombs and learn about their history. Not a lot of people know this, but there are catacombs stretching throughout all of Europe, many of them under Rome. There are dozens of levels of catacombs, all stacked on top of one another under the ground. Most we cannot access because they have flooded or are not structurally sound. Rome is a large city, but Ancient Rome is not visible to the naked eye. There are layers upon layers of buildings under current-day buildings that have been covered in sediment due to that powerful river. I went into a building in ancient Rome that had 3 buildings underneath it. The one at the very bottom was from before 100 B.C. So you can imagine, there were a lot of people in Rome. They did not have much space and this river posed a threat to their crops. They couldn't afford to bury people like we do today; they had to think of something clever. So they stacked cemeteries on top of one another to reduce the amount of land area affected by the bodies. The hallways of the catacombs were designed for people to visit past family members, but that did not last long because of the smell of decomposition (yummy). So the hallways of the catacombs are extremely small and narrow; I am 5'4" and had to duck the entire time. They have holes cut into the walls where bodies were laid to rest, multiple right next to one another to take up as little space as possible.

      This is an example of something we could take inspiration from when it comes to sustainable practices. The catacombs had their own issues, but some aspects of their design may be useful for the future of cemeteries. They reduced deforestation, which is something cemeteries are very bad about. Maybe we should start to stack our cemeteries in a similar way to reduce the number of acres cleared for the dead. We can do the same with the findings of this and other papers involving ancient civilizations. Learn from the past.

    1. more important than working to become “competent” in the cultures of those with whom we work and interact

      I really like their definition of cultural humility and find it interesting that there is a discussion on whether cultural humility or competency within a particular culture is more important. This kind of reminds me of the "jack of all trades" vs. "expert in a field" discussion and veterinary careers. As veterinary students we have a very diverse background of knowledge to be able to address a wide variety of conditions and diseases. Some individuals decide to pursue board certification and expertise in a particular field of veterinary medicine. Even though a veterinarian may be boarded in cardiology and primarily see cardiac patients, it is important to remember other systemic conditions that can lead to similar symptoms/lesions and to be able to address them accordingly. From this standpoint I think that having proper cultural humility and competency within the particular cultures in which you work and interact can be equally important. If you predominantly work within a certain culture, it is natural to become more competent at interacting with individuals from that culture. At the same time it is important to maintain cultural humility so you can properly and respectfully address people from all cultures.

    1. Feedback from the faculty teaching team after teaching for almost 8 weeks is how to template and simplify space for students to use, here is a direct quote: “could we create dedicated blog page for students that would be a pre-made, fool-proof template? When a student’s WordPress blog does not work and we can’t fix the problem, it is very frustrating to be helpless beside an exasperated student.”

      There may be a bit of a path forward here that some might consider using that has some fantastic flexibility.

      There is a WordPress plugin called Micropub (which needs to be used in conjunction with the IndieAuth plugin for authentication to their CMS account) that will allow students to log into various writing/posting applications.

      These are usually slimmed down interfaces that don't provide the panoply of editing options that the Gutenberg interface or Classic editor metabox interfaces do. Quill is a good example of this and has a Medium.com like interface. iA Writer is a solid markdown editor that has this functionality as well (though I think it only works on iOS presently).

      Students can write and then post from these, but still have the option to revisit within the built in editors to add any additional bells and whistles they might like if they're so inclined.

      This system is a bit like SPLOTs, but has a broader surface area and flexibility. I'll also mention that many of the Micropub clients are open source, so if one were inclined they could build their own custom posting interface specific to their exact needs. Even further, other CMSes like Known, Drupal, etc. either support this web specification out of the box or with plugins, so if you built a custom interface it could work just as well with other platforms that aren't just WordPress. This means that in a class where different students have chosen a variety of ways to set up their Domains, they can be exposed to a broader variety of editing tools or if the teacher chooses, they could be given a single editing interface that is exactly the same for everyone despite using different platforms.
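      To give a sense of how small a custom posting interface can be, here is a minimal sketch in Python of the request a Micropub client sends to create a post. It assumes the form-encoded `h=entry` create request described in the Micropub specification and a bearer token already obtained through IndieAuth. The endpoint URL and token below are placeholders, and the function name `build_micropub_request` is hypothetical. The request is only built here, not sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_micropub_request(
    endpoint: str, token: str, content: str, name: Optional[str] = None
) -> Request:
    """Build (but do not send) a form-encoded Micropub 'create entry' request."""
    # "h=entry" tells the server which kind of post to create.
    fields = {"h": "entry", "content": content}
    if name:
        # An optional title; posts without one are treated as short notes.
        fields["name"] = name
    body = urlencode(fields).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            # The access token issued to the client via IndieAuth.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# Placeholder endpoint and token, for illustration only.
req = build_micropub_request("https://example.com/micropub", "TOKEN", "Hello world")
```

      Passing the returned object to `urllib.request.urlopen` would submit the post; a server that accepts it normally answers with the new post's URL in a `Location` header. Because the request looks the same for any CMS that supports Micropub, the same client could serve students on different platforms.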

      For those who'd like to delve further, I did a WordPress-focused crash course session on the idea a while back:

      Micropub and WordPress: Custom Posting Applications at WordCamp Santa Clarita 2019 (slides)

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer 1:

      I think the experiments within the manuscript are generally of good quality and well controlled.

      We would like to thank the reviewer for the appreciation of our work.

      ...However, I find that the authors' conclusions are very often not supported by the experiments performed (as detailed below) and I would strongly recommend that the authors stick to the conclusions that can be drawn based on the data they have generated. In my opinion, this manuscript contains findings that are of interest to the field but it needs to be rewritten with more justifiable conclusions.

      We have extensively rewritten the manuscript and toned down the role of the HMR/LHR complex in hybrids while emphasizing its role in Drosophila melanogaster.

      1) 'Speciation Core Complex' - The only link to speciation is the fact that the 'SCC' includes D.melanogaster HMR, a known hybrid incompatibility gene. On the other hand, all of these proteins have important functions in a pure species context and all of the interactions reported between the members of the SCC occur in a D.melanogaster background. Also, SCC assembly in viable/inviable hybrids is not tested. Essentially, I would come up with a different and more functionally consistent name for the complex. I highly recommend against naming these stable interactors as the 'SCC' unless the authors can show that mutating any of the other 'SCC' proteins (specifically NLP, NPH, BOH1 & BOH2), which should presumably also disrupt SCC formation, leads to the rescue of hybrid male lethality?

      We agree with the reviewer that we base the naming of the complex on the presence of the products of the two known hybrid incompatibility genes Hmr and Lhr. As we did not investigate the complex’s composition in hybrids we agree with the three reviewers that the term SCC is probably misleading. We also agree with the reviewer that it would be highly interesting to investigate whether NLP, NPH, BOH1 or BOH2 mutations also rescue hybrid male lethality. However, we would need to generate fly lines carrying mutations in both the D.mel and the D.sim alleles since the respective genes are autosomal and we feel that this would be beyond the scope of the manuscript. Moreover, such assays would only be possible if those genes are non-essential, unlike Nlp, of which the available hypomorphic or deletion alleles are homozygous lethal (Padeken, J. et al. (2013)).

      2) Is it a stable 6-membered complex? - The only line of evidence for the presence of a stable complex between all 6 proteins are the MS data from Figure 1C and Figure S1A-C. Although I don't think it is necessarily required, a biochemical demonstration that these proteins co-sediment at a high MW would be a much stronger indication of complex formation. That being said, I think the authors can use their expertise in AP-LC/MS to more comprehensively characterize complex formation.

      Besides the fact that we observe all six components in AP-MS experiments using any one of the subunits, we have also shown in our previous experiments (Thomae et al, 2013) that all subunits can be purified by a tandem purification using first an antibody against FLAG-HMR followed by a Myc-LHR antibody. We also tried to purify the HMR complex via size exclusion chromatography to determine the size of the complex as suggested by the reviewer. Unfortunately, we did not manage to isolate enough of the complex in a soluble form that allowed us to detect a single peak on a size exclusion column. This may be due either to a disassembly of the complex during the unavoidable dilution during SEC or to a lack of antibody sensitivity. We also tried to reconstitute the entire complex from recombinantly expressed proteins but failed to express all subunits in a soluble form. It is worth mentioning that a similar observation has been made, for example, for the Dosage Compensation Complex, which, despite being well characterized, has also eluded a characterization using size exclusion chromatography.

      a) For example, the authors could test whether loss of BOH1/BOH2 in S2 cells impacts complex formation. A reduction of interactions between other complex members would strengthen the authors' conclusion of a stable and stoichiometric 6-membered complex.

      Based on our observation that HMR and LHR form a stable heterodimeric complex in vitro (Figure S4) we assume that the presence or absence of the other components does not affect the complex composition in its entirety. The experiment suggested by the reviewer would allow us to distinguish between direct and indirect interactions between BOH1/2 and HMR. Though this is clearly a very exciting approach, RNAi mediated knock downs are rarely complete in S2 cells, making such experiments difficult to interpret. Therefore, these experiments would need to be supported by reconstitution of the different complexes in vitro and potentially crosslinking MS experiments. Such extensive molecular analysis would very likely require at least 6 months to be completed and would be beyond the scope of the current manuscript.

      b) Additionally, I would suggest that they use one (or more) of BOH1/BOH2/NLP/LHR as baits in the S2 cells expressing HMR mutations (HMR2 and HMR DC, Figure 3) to test complex formation. Beyond Figs. 1 and S1, the authors only test one-way interactions between HMR (or HMR mutants) and the other 5 binding partners. It is unclear if the other 5 'SCC' members are capable of binding each other when HMR is mutated. As a result, how HMR affects the ability of other proteins to interact with each other and its role in complex formation remains somewhat unclear. This is particularly important since the authors conclude in the discussion that "HMR acts as a molecular bridge between different modules of the SCC" and that "the integrity of the SCC is essential for its function".

      Similar to our answer to the reviewer’s suggestion above, we believe that this experiment requires an additional extensive molecular analysis to be meaningful, which is beyond the scope of the current manuscript. It is important to clarify here that the S2 cells we use still express endogenous full-length HMR, which could participate in complex formation even when Hmr mutant alleles are expressed. To unambiguously show that BOH1 and BOH2 still interact with the other complex components when they no longer associate with HMR, we would therefore need to generate a CRISPR based exchange of all HMR genes in SL2 cells with a mutated version of HMR and analyze their interaction partners. As both alleles fail to fully rescue HMR functionality in a deletion background and as we have shown previously that a removal of HMR results in mitotic defects, it may not even be possible to generate such cell lines.

      3) Centromeric vs heterochromatic localization of HMR - There appear to be some differences between Hmr localization across different tissues as the authors have noted in their introduction. In this manuscript, the authors assess HMR localization in S2 cells as well as mitotic and endocycling follicle cells from various stages of oogenesis. In these cell types, the authors compare HMR localization to both Cenp-C (centromere) and HP1 (constitutive heterochromatin). In my opinion, it is not easy to get a clear perspective on what the authors consider to be HMR's true localization in these cells and tissues. I would recommend the following straightforward changes/experiments related to this point,

      a) Label the image categories in Figure 4A. Please also describe in detail the classification criteria that were used to separate these image categories from one another.

      In the revised manuscript we will label the image categories in Figure 4A. An extensive description of how the classification criteria were applied can be found in the methods section.

      b) I would also move Figure S7A to the main text since it demonstrates centromeric colocalization of HMR in early follicle cells.

      In the revised manuscript we will move figure S7A to a new figure 5C. We have furthermore investigated the localization of endogenous HMR in various cell types in ovaries, which is going to be included in the revised manuscript as a new figure 5A.

      c) Use linescans on existing images to better demonstrate colocalization between Hmr and Cenp-C and/or HP1

      In the revised manuscript we will prepare linescans/profile plots for all IF pictures when necessary.

      d) Show Cenp-A and HMR staining for the images in Figure 5C and stage 10 follicle cells from Figure S7A.

      As stainings with the Cenp-C antibody resulted in more stable and reproducible signals, we used Cenp-C as a proxy for Cenp-A and centromere localization. In Figure S7A and B we stained Cenp-C and showed a greatly reduced expression in follicle cells undergoing endoreplication. We therefore did not perform a Cenp-C (or Cenp-A)/HMR co-staining in these cells and do not think it would add to a better understanding of the mechanisms of HMR localization (Figure 5C).

      e) I feel the authors do not spend enough time discussing the fact that HMRDC still appears to localize to centromeres in most follicle cells up to Stage 7.

      We now also include the staining of endogenous HMR (figure 5A revised ms) in the various cell types in ovaries. This allows us to expand the discussion of how HMR’s localization depends on the cell type and stage. These studies not only reveal the high diversity of HMR localization but also suggest that the potential of HMR to localize to the centromere as well as pericentromeric heterochromatin is crucial for its function. In the revised manuscript we now discuss more extensively the fact that HMRdC still localizes to the centromere up to stage 7.

      In sum, it would also be nice for the authors to take a clear position on whether HMR is centromeric, heterochromatic or both in the cells they analyze by microscopy and why these localizations may change between the cells they have looked at.

      The fact that we now include a novel figure where we investigate HMR’s localization in different cell types allows us to discuss the (diverse) localization as well as its potential regulation more extensively. As the localization is highly dependent on the cell type observed as well as the cell cycle stage used, we feel that these aspects need to be taken into account when describing HMR’s localization. This is now discussed in the revised manuscript.

      4) HMR2 analyses - I think HMR2 is an important mutant to include as a control for HMRDC, especially since the authors should already have the required strains/data. I specifically mean the following,

      a) Figure 4C - Please add HMR2 ChIP-seq tracks only if the authors already have this data.

      Unfortunately, we were unable to acquire convincing HMR2-ChIP data. This may be due to the fact that HMR2 localizes quite diffusely or due to a lower percentage of cells expressing this allele in the S2 lines used. Neither issue influences our interpretations in AP-MS experiments or in single cell based fluorescence microscopy assays, but both are problematic in bulk cell population assays like ChIP. Therefore, we cannot provide good HMR2 ChIP-Seq tracks.

      b) Figure 5C and Figure S7B - Add HMR2 IF images. Please also discuss HMR2 localization to centromeres and heterochromatin.

      In the revised manuscript, we have attached IF images of ovarian tissue made from strains heterozygous for the Hmr2 allele. Due to the lower gene dosage the intensity of HMR stainings is reduced, making a precise localization more difficult. As the manuscript mainly focusses on the description of the newly discovered HmrdC allele, we have added this as supplemental material.

      c) Figure 5E - Increase n's for the HMR2 fertility assay.

      The HMR2 allele has been extensively characterized by Aruna and colleagues (Aruna et al., Genetics (2009)) with regard to its effect on fertility. For this particular assay we only use it as a positive control and reference for the newly described HMRdC allele. We therefore feel that an increase in the number of replicates would be redundant with the earlier publications.

      5) HMR localization in female germline cells - Given that the authors indicate that female fertility and telomeric transposon suppression are compromised with HMR2 and HMRDC, I think it would strengthen the manuscript to address HMR localization with respect to heterochromatin and centromeres in the nurse cells and/or oocytes.

      We now also include the staining of endogenous HMR (figure 5A revised ms) in nurse cells, oocytes and early-stage follicle cells. This allows us to expand the discussion of how HMR’s localization depends on the cell type and stage.

      6) I find the last part of the abstract and discussion i.e. HMR bridges heterochromatin and the centromere, to be very speculative based on the data presented. As far as I can tell, the only experimental basis for this conclusion is the fact that HMR binds known centromeric and heterochromatic proteins. With this logic, you could easily make a similar argument for the numerous proteins that colocalize with centromeric and pericentromeric heterochromatin. Personally, I would not speculate extensively on a HMR bridging activity without more compelling functional readouts.

      Our hypothesis of HMR as a bridging factor between centromeric and pericentromeric heterochromatin is not only based on its colocalization and interaction with components of both chromatin types but also on our previous findings that an HMR knockdown results in a moderate centromere declustering and on studies using super-resolution microscopy, which indicate that HMR is sandwiched between the two components (Kochanova, N. Y. et al. (2020)). As the proteomic analysis of the two HMR alleles presented in this study suggests that interactions with both components are required for full functionality of HMR, we assume that it bridges between the two chromatin components. However, we agree with the reviewer that this could also be explained by a centromeric as well as a heterochromatic function of HMR, which are independent of each other. We therefore removed the hypothesis from the abstract and discussed it together with other potential explanations for our findings.

      **Minor comments:**

      1) Intersection plot - I would explain the intersection plot on Figure 1C more thoroughly (I found it confusing).

      We expanded the paragraph in which we explain the intersection plot in figure 1C.

      2) Image colours - The images in Figure S2 and Figure S7 are hard to interpret due to the colours used for the HA and Hmr channel respectively. I would use the white pseudo-colour for DAPI and omit this channel from the merged image and insets (a line demarcating the nucleus would suffice in the merged image). In addition, a linescan would better represent colocalizations or lack thereof.

      We will omit the DAPI channel from the merged images and use a line to demarcate the nucleus as suggested by the reviewer in the revised manuscript. To better illustrate co-localisation of distinct factors we will use line profile plots.

      3) I'm not convinced that one can determine stoichiometry and sub-stoichiometry of protein complexes based on spectral counts; spectral counts could be affected by other factors. Therefore, I would hesitate to use "However, HP1a is only present in sub-stoichiometric amounts in the AP-MS purifications with antibodies against the SCC...."

      The question of whether the stoichiometry of complexes can be determined using iBAQ values of purified protein complexes is intensely discussed in the field. Several studies do suggest that this can indeed be done (e.g. Wohlgemuth, I. et al. Proteomics 15, 862–879 (2015); Smits, A. H., Nucleic Acids Research 41, e28–e28 (2012)), which is why we commented on the lower intensity of HP1a relative to the other subunits of the complex. However, we agree with the reviewer that this can only be an approximation rather than a precise measurement (which would need a full in vitro reconstitution, see comments above). We have mentioned this in the revised manuscript.
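      To illustrate the kind of approximation meant here, the sketch below estimates relative stoichiometry by dividing each protein's iBAQ value by that of the bait. This is a simplified illustration under stated assumptions, not the analysis used for the manuscript: the function name `relative_stoichiometry` is hypothetical, and the intensities are invented placeholder values, not measurements.

```python
from typing import Dict


def relative_stoichiometry(ibaq: Dict[str, float], bait: str) -> Dict[str, float]:
    """Express each protein's iBAQ value as a ratio to the bait's iBAQ value."""
    bait_value = ibaq[bait]
    if bait_value <= 0:
        raise ValueError("bait iBAQ value must be positive")
    # A ratio well below 1 suggests sub-stoichiometric presence,
    # but only as an approximation.
    return {protein: value / bait_value for protein, value in ibaq.items()}


# Hypothetical iBAQ intensities, NOT data from the manuscript.
example = {"HMR": 2.0e8, "LHR": 1.8e8, "HP1a": 4.0e7}
ratios = relative_stoichiometry(example, bait="HMR")
```

      With these made-up numbers the bait has a ratio of 1 by construction, and a protein with a fifth of the bait's iBAQ value would be read as sub-stoichiometric. Factors such as background binding in control purifications are ignored in this sketch.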

      4) Ambiguity in description of methods - In the methods section 'Crosses for generating Hmr genotypes for hybrid viability assays', the authors state that "In the rescue experiment, Hmr+ served as a positive (lethality rescue) and Hmr2 as a negative control (no lethality rescue)". The authors might consider rewording this as I think it's a bit strange to refer to hybrid male lethality as a rescued state.

      We agree with the reviewer that the wording to describe the assay we used to investigate HMR’s function in male hybrids is counterintuitive as a “rescue of functionality” results in male hybrid lethality. To better describe it we now call the assay “hybrid viability suppression”, according to the nomenclature that has been used by Aruna et al, 2009 (Aruna, S. et al. Genetics (2009)).

      Reviewer #1 (Significance (Required)):

      **Nature and Significance of the advance:**

      This work adds to the study of reproductive isolation in Drosophila by defining a stable set of molecular interactors of the HMR hybrid incompatibility protein. In my opinion, this study offers a platform for future research into the poorly understood molecular events that trigger hybrid incompatibility in Drosophila. In addition, the authors generate a novel HMR mutation (HMRDC) that also rescues hybrid male lethality and it would be interesting to determine in finer detail how closely this mutation mimics other known HMR mutations. A characterization of BOH1/BOH2 would have also significantly strengthened the manuscript.

      We would like to thank the reviewer for the appreciation of our work. We agree with the reviewer that a deeper characterization of BOH1/BOH2 will further unravel their role in the complex. However, our initial experiments using null alleles or knockdowns of BOH1 and BOH2 in D.mel showed no effect or only minor effects on transposon activation and hybrid male lethality. This is most probably due to the fact that the D.sim alleles can fully complement their function. Moreover, the recombinant expression of BOH1 and BOH2 turned out to be difficult due to problems in protein solubility. We therefore need to postpone our BOH1 and BOH2 studies to a later timepoint.

      **My Expertise:**

      Satellite DNA repeats, Chromocenters, Speciation, Hybrid Incompatibility

      **Referees cross-commenting**

      I also agree that all the reviewer comments are reasonable. The manuscript would be significantly improved by making conclusions that can be supported by the data. I think some additional experiments are also warranted to make the paper more robust.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors identify a protein complex that contains hybrid incompatibility genes Hmr and Lhr, naming it SCC (Speciation Core Complex). This paper's major conclusions are: 1) overexpression of Hmr (which resembles the situation in hybrid, where hmr/lhr are overexpressed) results in ectopic protein-protein interaction. 2) Hmr's DNA binding domain (mutated in Hmr2) and C-terminal domain (known to interact with Lhr) are important for its function and in causing hybrid lethality.

      The identification of SCC complex is quite intriguing, but this paper does not cover much of functional significance of this complex at all. For example, does mutating other components of SCC complex (BOH1 etc) rescue hybrid lethality? Without examining these important issues, they instead drifted to study the domain function of Hmr. It is not so clear why these two lines of studies are glued together in one paper.

      It is not that I insist that the authors have to do all these experiments, but the assembly of the paper makes this paper quite inconclusive. After reading it, the readers are left behind wondering what is the function of SCC---and we do not even know whether 'speciation core complex' is a fair naming, without any knowledge whether any of the components being involved in speciation or not.

      Overall, this work contains a lot of important information, which promises future breakthrough on the subject matter. However, unfortunately, the study is not carried out to generate any conclusion and is fairly incomplete at this point.

      We thank the reviewer for their appreciation of the importance of our work and apologize that we did not clarify the reasoning of the experiments sufficiently. We think that part of the reviewer’s disappointment is due to the fact that we named the complex speciation core complex (SCC), which was indeed an unfortunate decision as we are unable to investigate the complex in male hybrids where it exerts its function in mediating hybrid incompatibility (see also answer to comments of reviewer 1). We therefore changed the name to HMR complex and tried to better explain the rationale of our experiments in the text.

      **Specific comments.**

      • Quality of Fig4A is too low. I cannot even tell where is the boundary of nucleus. Diffuse signal in category 'yellow' and 'grey'---are they entire cell or nucleus or nucleolus? Please add additional marker(s) for better interpretation of the Hmr signal presented.

      We have improved the quality of figure 4A by adding lines to indicate the nuclear boundary and inserting profile plots to better illustrate the different types of co-localisation.

      • In Fig4A and 5C, the localization of Hmr (wild type version) looks quite different in these two images. Which image is more 'representative' for Hmr localization? (as they build the logic on Hmr localization, this inconsistency is quite bothering). This might be cell-type-specific issue, but if so, how do we know the relevance of their localization? These issues make the result of localization analysis of wt/mutant Hmr inconclusive.

      After reading the reviewers' responses we realized that we did not describe our findings well enough, which resulted in major confusion about the localization of HMR in cells. Indeed, the localization of HMR differs widely depending on the cell type used. We have now included a new figure (new Figure 5A) illustrating the analysis of the endogenous HMR localization in ovaries isolated from D.mel. We hope that the additional figure together with our interpretation helps to alleviate the confusion and adds to the understanding of HMR’s function and potential evolution of HMR.

      Reviewer #2 (Significance (Required)):

      Hmr and Lhr are known as 'hybrid incompatibility genes', deletion of which rescues male hybrid lethality in Drosophila melanogaster/simulans hybrid crosses. Understanding the molecular function of Hmr and Lhr is expected to provide insights into the fundamental question of how two species become incompatible (i.e. how speciation occurs). This study investigates the protein complex that contains Lhr and Hmr, identifying a previously unidentified 'core' complex. Understanding the function of this complex may significantly advance our understanding of speciation.

      **Referees cross-commenting**

      I think all review comments are reasonable. However, I'd like to emphasize that the biggest issue with this paper is not about the data, but how the authors frame it. The term such as 'speciation core complex' is beyond 'hype' (not even 'exaggeration'). Simply there is no evidence that this term can be supported. I think the authors need to be more ethical. I would be surprised if authors truly believe they can claim that the term 'speciation core complex' is justifiable in science.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The manuscript "The integrity of the speciation core complex is necessary for centromeric binding and reproductive isolation in Drosophila" by Lukacs and colleagues describes a study that shows, by mass-spec and ChIP-seq, that two well established hybrid incompatibility proteins form a 6-protein complex that predominantly localizes near HP1a bound chromatin boundaries. With a C-terminal domain of HMR deleted, the 6-protein core complex was not disrupted, but its interaction with and subsequent localization to the HP1a domain near centromeres was lost. In addition, an HMR double mutant that disrupts the interaction between HMR and other components of the 6-protein core complex was tested and similar distribution patterns as for the dC mutant were observed. Next, the nuclear localization of HMR was tested in fruit fly follicle cells by IF. In endoreplicating cells, HMR-dC did not colocalize with HP1a, as did the double mutant. The expression level of several transposable elements (TEs) was assessed and only the full length wt Hmr transgene was able to rescue the repression of TEs, whereas neither the dC nor the double mutant did. When the number of offspring was assayed, a similar pattern was observed. Finally, male hybrid lethality was assayed by crossing D melanogaster mothers with different Hmr alleles with wt D simulans and only the wt Hmr allele resulted in male lethality, whereas both dC and double mutants resulted in 10-40% of the offspring being male. These findings led the authors to conclude that 1) there is a 6-protein speciation core complex containing HMR, LHR, NLP, NPH, and two uncharacterized proteins called BOH1 and BOH2, 2) overexpression of HMR/LHR results in novel interactions with other chromatin factors, 3) both the double mutant (E317K and G527A) and the C-terminal deletion mutant are important for protein-protein interaction within the 6-protein complex and associated factors such as HP1a, and 4) HMR bridges heterochromatin and centromeres.

      **Major comments:**

      • Most of the key conclusions are supported by the evidence presented in this manuscript. The link between centromeres and HMR (and presumably the rest of the 6-protein complex) hinges only on colocalization IF and ChIP-seq data. The change in Hmr localization in cycling follicle vs endoreplicating cells of especially the dC mutant is very interesting. The loss of CENP-C signal correlates with a change in Hmr^dC signal. What exactly drives this change is not explored.

      We have shown in the past that HMR requires full length Cenp-C to localize to the centromere in S2 cells. We assume that this is also the case in the follicle cells. Therefore, the lack of Cenp-C recruitment in endoreplicating cells is likely the reason why HMR localizes primarily to HP1a containing heterochromatin. Unlike wild type HMR, HMRdC can’t bind LHR/HP1a, as our AP-MS data show, and therefore is not recruited to heterochromatin and diffuses away in later stages. We have described this point more extensively in the revised manuscript.

      • The data presented in this manuscript are mostly clear (see minor comments) and appear to be reproducible, especially as the methods section is detailed and both the ChIP-seq and mass-spec data are deposited in publicly accessible databases.
      • The rationale for why both HMR and LHR are overexpressed in cell lines is not clearly explained.

      As outlined in our response to reviewer 1 the overexpression of HMR and LHR was designed to simulate the hybrid situation, which shows an increase in HMR and LHR levels (Thomae, A. W. et al. Developmental Cell 27, 412–424 (2013)). We have indicated this in the revised manuscript.

      • The HMR/LHR overexpression experiment is very nice, and as one would expect, resulted in more protein interactions. Some of these might simply be the result of the abundance of HMR and LHR, which have saturated the core 6-protein complex. This leaves the question: what is the true minimal size of the HMR/LHR complex? The dC mutant that removes the BESS domain, as well as the double point mutations that disrupt the complex altogether, get to the importance of the stability of the complex and its association with especially HP1a. What the minimal interacting partners of HMR and LHR are could be explored by knocking down both factors and performing mass-spec.

      We agree with the reviewer that the abundance of HMR and LHR results in a saturation of the core complex thereby having a spillover effect on other proteins. In this regard it is worth mentioning that the expression of the Hmr2 allele does not completely disrupt the complex but rather results in a loss of interactions with NLP, NPH, BOH1 and BOH2 while maintaining the interaction with LHR and HP1a. In fact, when the HMR2 protein is expressed, it shows a stronger interaction with known heterochromatic proteins than the wt protein (Figure 3B). As both mutant alleles show functional defects in pure species and in male hybrids we assume that HMR and LHR need to bind both chromatin types simultaneously. We consider the complex to be somewhat modular as we show that HMR and LHR can interact in isolation (Figure S4) while others have shown that LHR and HP1a, as well as NLP and NPH interact (Greil, F. et al. EMBO J (2007); Anselm, E. et al. Nucleic Acids Research (2018), respectively). This is now pointed out in the revised manuscript.

      • For the telomeric TE expression as well as offspring count shown in Figure 5D,E, a wild-type control would be informative as a measure of how well the Hmr+/+ rescues both phenotypes.

      The misregulation of transposable elements (TE) and fertility defects of Hmr loss of function mutants have been previously characterized (Satyaki, P. R. V. et al. PLoS genetics (2014); Aruna et al., Genetics (2009)). We therefore focused on the expression of TEs in the HmrdC and Hmr2 mutants relative to the wild type rescue allele (Hmr+). Hmr2 serves as a known non-rescue allele (Aruna et al., 2009) in the fertility experiment, while in the TE experiment we describe for the first time a defect in TE repression for this allele.

      **Minor comments:**

      • In the opening paragraph of the introduction, the authors describe a scenario of sympatric speciation, which is subsequently highlighted by the speciation event between D. melanogaster and D. simulans. Yet, these two species have similar but not identical distribution ranges, leaving open the possibility that the speciation event happened in parapatry. It might be worth rephrasing the first paragraph to leave open both modes of speciation, especially as the manuscript focuses on the mechanistic side of hybrid incompatibility-associated proteins.

      We did not want to imply that our experiments allow a distinction between sympatric and parapatric speciation. We thank the reviewer for pointing this out and have rephrased the first paragraph accordingly.

      • Some of the abbreviations are repeated (e.g. SCC), others aren't introduced (e.g. HI). Overall, fewer abbreviations will make the text more readable, especially for non-experts.

      We tried to avoid acronyms wherever possible and got rid of the term SCC altogether. All acronyms are introduced at the first appearance.

      • The IF signal in Figure 4A is difficult to see on the black background. I would suggest either increasing the gain to improve the visibility of the signal or showing it in black-and-white. In addition, the colors should be labeled in the figure for clarity.

      We improved the quality of Figure 4A and labeled the different types of localization (see also answer to reviewer 1).

      • In Figure 5C the images for the Hmr^KO;Hmr^2 appear to be missing.

      See answer to reviewer 1 (4b). We have/will include the corresponding picture as supplementary material as we consider the characterization of the novel Hmr allele to be the main focus of the manuscript.

      In addition, for non-experts it might be helpful to mention which set of IF images are controls, rescues, and test, similar to what was done in Figure 5B.

      We have/will indicate which IF pictures are controls and rescue experiments.

      Reviewer #3 (Significance (Required)):

      **Significance:**

      • This study provides novel insight into two factors involved in male hybrid lethality, the chromatin factors with which they are associated, and how two mutants impact chromatin localization and in vivo phenotypes.
      • Understanding of the molecular basis of speciation is limited, as most factors that drive speciation are not identified. Drosophila species are at the forefront of this research. Post-zygotic factors have predominantly been found to have strong speciation potential. This work builds very nicely on this work.
      • This manuscript will be predominantly interesting for the Drosophila chromatin field and speciation field.
      • I am trained in comparative genomics focusing on centromeric repeats and now study chromatin dynamics at the single molecule level, using cell biology, biochemical and biophysical tools.

      We thank the reviewer for appreciating our work. We think that our work will also be interesting for researchers focusing on centromere clustering and genome organization in general and independently of the Drosophila system.

      **Referees cross-commenting**

      Reviewer comments look reasonable to me. A 1-3 month revision is not an undue burden; I think they can do at least some of what was requested. In response to Rev2: Agreed, they ought to tone it down.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      In this study, the authors identify a protein complex that contains hybrid incompatibility genes Hmr and Lhr, naming it SCC (Speciation Core Complex). This paper's major conclusions are: 1) overexpression of Hmr (which resembles the situation in hybrid, where hmr/lhr are overexpressed) results in ectopic protein-protein interaction. 2) Hmr's DNA binding domain (mutated in Hmr2) and C-terminal domain (known to interact with Lhr) are important for its function and in causing hybrid lethality.

      The identification of SCC complex is quite intriguing, but this paper does not cover much of functional significance of this complex at all. For example, does mutating other components of SCC complex (BOH1 etc) rescue hybrid lethality? Without examining these important issues, they instead drifted to study the domain function of Hmr. It is not so clear why these two lines of studies are glued together in one paper.

      It is not that I insist that the authors have to do all these experiments, but the assembly of the paper makes this paper quite inconclusive. After reading it, the readers are left behind wondering what is the function of SCC---and we do not even know whether 'speciation core complex' is a fair naming, without any knowledge whether any of the components being involved in speciation or not.

      Overall, this work contains a lot of important information, which promises future breakthrough on the subject matter. However, unfortunately, the study is not carried out to generate any conclusion and is fairly incomplete at this point.

      Specific comments.

      • Quality of Fig4A is too low. I cannot even tell where is the boundary of nucleus. Diffuse signal in category 'yellow' and 'grey'---are they entire cell or nucleus or nucleolus? Please add additional marker(s) for better interpretation of the Hmr signal presented.
      • In Fig4A and 5C, the localization of Hmr (wild type version) looks quite different in these two images. Which image is more 'representative' for Hmr localization? (as they build the logic on Hmr localization, this inconsistency is quite bothering). This might be cell-type-specific issue, but if so, how do we know the relevance of their localization? These issues make the result of localization analysis of wt/mutant Hmr inconclusive.

      Significance

      Hmr and Lhr are known as 'hybrid incompatibility genes', deletion of which rescues male hybrid lethality in Drosophila melanogaster/simulans hybrid crosses. Understanding the molecular function of Hmr and Lhr is expected to provide insights into the fundamental question of how two species become incompatible (i.e. how speciation occurs). This study investigates the protein complex that contains Lhr and Hmr, identifying a previously unidentified 'core' complex. Understanding the function of this complex may significantly advance our understanding of speciation.

      **Referees cross-commenting**

      I think all review comments are reasonable. However, I'd like to emphasize that the biggest issue with this paper is not about the data, but how the authors frame it. The term such as 'speciation core complex' is beyond 'hype' (not even 'exaggeration'). Simply there is no evidence that this term can be supported. I think the authors need to be more ethical. I would be surprised if authors truly believe they can claim that the term 'speciation core complex' is justifiable in science.

    1. switched to biodegradable packaging

      Why do you think they don't make the switch? Is it just because single-use plastic is cheaper? How can we as consumers hold these companies accountable, when there may not be other options?

    1. As for the land, oceanic inventories are likely very incomplete. For example, there are more than 500 species of the lovely and medically important genus of marine snail, Conus. Of the 316 species of Conus from the Indo-Pacific region, Röckel et al. (1995) find that nearly 14% were described in the 20 years before their publication.

      Much of the ocean is undiscovered, as we can read here, but think of the species we do not know about among the creatures we can't see. Viruses and bacteria are so diverse, and understanding them may lead to learning about new biological systems.

    1. Swarming may overturn the existing order of world power

      Think about this from a surveillance lens... The surveillance lens tells us that as technology develops, and the capacity to store data increases (and centralizes/decentralizes), surveillance increases, becomes more precise and domineering, and increases its power for control and subjugation, and that the 'existing order of the world' evolves as the power of surveillance increases. Remember docile bodies: it's not that it becomes more efficient; it becomes more effective through its ability to become internalized and self-perpetuated by subject populations. Here, with this new type of data-collecting military technology, we see the capacity emerging for the 'existing order of world power' to evolve to new thresholds once again. (As in, from sovereign, to discipline, to control).

    1. It is often difficult for conservation officials to understand the histories of peoples that may have preceded a park or protected area.

      I feel like we as a society have always struggled with trying to understand the history of people's cultures. And why do you think that is?

    1. But who controls articulation? Because the English language is a multifaceted oration Subject to indefinite transformation Now you may think that it is ignorant to speak broken English But I’m here to tell you that even “articulate” Americans sound foolish to the British

      I loved the last line, but essentially she's right: it's like there are grammar Nazis dictating how language is supposed to be used and performed, in a sense. Language is a tool that we as humans use to express abstract ideas and, better yet, the spectrum of emotion that we sometimes can't find words for. How much do you want to bet another language has a spot-on way of describing that very feeling? Anyway, language has come a long way, but I guess her point is it's still got a ways to go; it's still changing.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their positive comments on our manuscript. To address their criticisms, we propose to do the following experiments:

      Reviewer 1 (minor comments):

      1. In Fig. 1, the authors show that Btz-WT, but not Btz-HD, localizes to the posterior pole of the oocyte. Do the authors see Btz-WT and/or Btz-HD localized to MNs/muscles/glia at the NMJ?

      We have had difficulty detecting the expression of our Btz-GFP transgenes at the NMJ. In case this was due to competition with endogenous wild-type Btz, we will repeat the staining in a btz mutant background. If the protein is still undetectable, we can include data showing the localization of UAS-Btz-GFP when overexpressed in muscles or motor neurons.

      The mitochondrial phenotypes observed in Btz mutants are striking. But it seems possible that there are defects in overall mitochondrial levels in muscle in addition to defects in their localization. Overall, mitochondrial levels seemed reduced in Btz mutants. Is it possible to do an ATP5A immunoblot in Btz mutants to test whether overall mitochondrial levels are altered?

      We will do a Western blot to compare ATP5A levels in btz2/+ and btz2/Df(3R)BSC497 larval carcasses.

      ECM proteins are known to be critical for regulating TGFB signaling. That, taken with the multi-tissue genetic requirement for Btz, suggests that Btz might directly regulate either Ltl or Frac RNA, given that these ECM proteins are likely deposited by multiple cell types.

      We agree that this is a possibility and we will mention it in the Discussion.

      Reviewer 2 (major comments):

      1. In Figure 1, regarding the validation of rescue constructs: the EJC interaction-defective mutant is based solely on conservation, as all structural/interaction studies cited with Btz bound to EJC have been with human proteins. They use Vasa localization as a readout of EJC-dependent function, but this is indirect and only assesses one aspect of EJC function (localization). Since many of the main conclusions in the paper are predicated on this mutant being EJC-independent, they should validate this with the Drosophila orthologs using immunoprecipitation. They demonstrate the capability of expressing GFP-tagged versions of Casc3 WT and mutant in S2 cells, so this should not be a cumbersome control experiment to include.

      We will express tagged Btz-WT and Btz-HD proteins in S2 cells and test whether they can be co-immunoprecipitated with Myc-tagged Drosophila eIF4AIII.

      Regarding Figure 3, it could be postulated that the number of boutons would be influenced by the length of axons. Is axon outgrowth accounted for in these experiments? This would influence number of synaptic boutons. Panel F looks very different from panel A in terms of axon length (could this be due to axon outgrowth defect and/or impacted muscle size?) Can quantitation be done also by normalizing to axon length (bouton number/axon length)? Or perhaps this is accounted for in muscle size? If so, this should be explained.

      The NMJ grows during development by adding both axonal branches and synaptic boutons, so its size can be measured by counting the number of boutons or branches or measuring branch length. These measures are usually well correlated. In this paper we used bouton number normalized to muscle surface area as our measure of NMJ size, but we did observe corresponding changes in the number and length of branches, as the reviewer points out. We will explain this more clearly in the text.

      In Figure 3 quantification: n's vary between genotypes significantly, and this should be explained (e.g. was there a recovery issue between genotypes or just fewer needed for WT-like?).

      The btz mutant larvae are more difficult to dissect due to muscle fragility, and some crosses in this genetic background may have yielded fewer usable filets than desired. We believe the numbers we obtained are sufficient to show which differences are significant.

      In Figure 4 panels B and F (mutants), there appears to be reduced axon outgrowth (see point above). This should be taken into account when expressing bouton number.

      As explained in our response to point 2, axon length and bouton number are correlated measures of synapse size and vary together in this figure as expected.

      The RNA-seq data (Figure 5) has a potential issue in that they used larvae with a balancer chromosome (Df), which yields a 50% reduction in any genes on that chromosome. They acknowledge this and removed these genes from the analysis, but the concern remains that this still might be a confounding variable (for example, if reduction in any of these genes might disrupt a signaling pathway). We do not think that the RNA-seq needs to be repeated, but we propose that the authors validate these targets using qPCR in their MN-specific btz knockdown system (this way, they can also include magoh and eif4aIII knockdowns for comparison).

      Because only one btz allele was available, we used transheterozygotes with a deficiency for the region to avoid homozygosing other mutations that might be present on the btz2 chromosome. As a consequence, we did observe reduced expression of genes located within the deficiency (which covers a small region, not an entire chromosome), and it is possible that this might contribute to the phenotype. However, we have seen a similar reduction in NMJ size in btz2 homozygotes. We do not think that motor neuron-specific btz knockdown is a useful genotype to validate the RNA-Seq results because ltl and frac levels do not change significantly in the CNS, only in muscle, and knockdown only in motor neurons would be unlikely to change daw levels measured in the whole CNS. Knocking down mago or eIF4AIII in muscle is lethal before the third larval instar stage, preventing us from comparing their effects on gene expression to those of btz. However, we will do qRT-PCR to measure daw, ltl and frac mRNA levels in btz2 homozygous mutant muscles.

      Reviewer 2 (minor comments):

      1. Some statements made in the introduction are not entirely accurate:

        "A fourth core subunit, known as Barentsz (Btz), Cancer susceptibility candidate gene 3 (CASC3), or Metastatic lymph node 51 (MLN51), associates with the complex following the completion of splicing, and is required for the effects of the EJC on translation, NMD and mRNA localization (Chazal et al., 2013; Palacios et al., 2004; Shibuya et al., 2006; van Eeden et al., 2001)."

        A recent study indicates that Casc3 is not required for EJC-dependent NMD targets in human cells, but rather enhances NMD on a subset of targets (Gerbracht et al. 2020 NAR). Perhaps "is required" should be changed to "plays a role in cytoplasmic EJC-mediated processes, such as...". It has also been shown that the EJC core can assemble without Casc3 (e.g. Ballut et al 2005 NSMB, Gehring et al 2009 PLoS Biol). Previous work from the authors shows that Casc3 (Btz) is not necessary for EJC function in pre-mRNA splicing (Roignant et al, 2010 Cell). Further, there exists a population of Casc3 lacking EJCs in human cells (Mabin et al 2018 Cell Reports). Collectively, all this evidence points to Casc3 not being a core EJC subunit.

      We will change the text so that we do not refer to Btz/Casc3 as a core subunit.

        • "In the mouse brain, haploinsufficiency for Magoh, Rbm8a or Eif4a3 causes severe microcephaly, but complete loss of Casc3 has a much milder effect that can be attributed to developmental delay (Mao et al., 2017; Mao et al., 2016; Mao et al., 2015; Silver et al., 2010)."

        From Mao et al 2017: complete loss and hypomorphic mutants were embryonic and perinatally lethal (contrary to what the authors are stating here), while compound mutants and heterozygotes exhibited neurodevelopmental delay. By "milder effects" the authors could also be referring to brain size being proportional to body size in the complete loss homozygotes; either way, this should be clarified.

          By “milder effects” we meant the effect on brain size. We will clarify this in the revised text.
        

      Fly-specific nomenclature could be made more accessible to a broader audience, as the full readership will likely not have expertise in Drosophila genetics. For example, w118, btz2 labels used in figures are not explained anywhere in the manuscript. While the authors do a good job of describing various mutants in a more accessible fashion in the results section, the genotype labels in figures can be better explained in the legends.

      We apologize for this and will clarify the genotype labels in the figure legends.

      Fig 2 L-N panels might warrant more explanation. Can the mitochondria be counted here? Is there also a difference in volume/morphology that could be quantitated? In Figure 2N, muscle fibers are more densely packed in mutant vs. control; can this be explained?

      We are hesitant to quantify mitochondria or comment on muscle fiber packing based on the EM images, because only one individual of each genotype was examined. We prefer to simply use these images to provide a higher resolution view of the change in mitochondrial distribution that we observed and quantified using light microscopy. However, we do plan to do a Western blot to determine whether there are changes in the number of mitochondria in btz mutants (see Reviewer 1 point 2).

      In Fig 2, to draw parallels between panels A-K and L-N, it might also be helpful to use the red/yellow arrow system on panel A for comparison.

      This is a good suggestion that we will follow.

      In Figure 3, it might be helpful for a general audience to include zoomed-in picture of boutons (as in Fig 5B), as some panels appear to have less defined bouton shape.

      We do observe that boutons tend to be less well separated from each other in btz mutants, and will include zoomed-in pictures to document this.

      Is the bouton size different in the mutant in Figure 3? Can this be quantified?

      We do not think that there is a significant difference in bouton size in btz mutants, but we will measure this and include a quantification.

      Fold changes are modest and not very apparent in staining (we acknowledge that this could be due to early developmental time point). Images could better point out differences in WT vs. mutant that are not readily apparent to those outside the fly neurodevelopment audience.

      Because of the inherent variability in synapse shape, it can be difficult to appreciate changes in bouton number from a single image. However, our quantifications show that the changes are consistent and significant.

      Fig 4 NMJs are shown on a different scale (more zoomed in) than in Figure 3, and differences are a bit easier to see at this scale. Presenting Fig 3 on this scale might help the reader with visualizing the differences in WT versus mutant.

      We will crop the images in Figure 3 so as to show them at the same scale as in Figure 4.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Summary

      Ho et al. describe the developmental functions of the Drosophila Casc3 ortholog, Barentsz (Btz), using in vivo loss-of-function and rescue experiments in Drosophila larvae. In this study, the authors find that loss of Casc3 contributes to neuromuscular defects in the larval fly. Utilizing transgenics of WT and EJC interaction-defective mutants, they demonstrate that Btz has both EJC-dependent and independent functions in the larval neuromuscular junction, wherein muscle defects are EJC-dependent and synaptic defects are EJC-independent. Using RNA-seq, they find that upregulated mRNAs include those that belong to the Activin signaling pathway. They go on to find that the neuromuscular defects in Btz mutants can be attributed to dysregulation of Activin signaling, and are rescued with loss of the Activin ligand, Dawdle (Daw).

      Major Comments

      Overall, the paper presents well-controlled experiments that support the main conclusions. We propose achievable validation experiments that we believe will strengthen the conclusions of the paper. There is some concern that the magnitude of the effects is overstated, or could be made more apparent to a broader audience (i.e. those in the mRNA regulation field beyond Drosophila geneticists).

      • In Figure 1, regarding the validation of rescue constructs: the EJC interaction-defective mutant is based solely on conservation, as all structural/interaction studies cited with Btz bound to EJC have been with human proteins. They use Vasa localization as a readout of EJC-dependent function, but this is indirect and only assesses one aspect of EJC function (localization). Since many of the main conclusions in the paper are predicated on this mutant being EJC-independent, they should validate this with the Drosophila orthologs using immunoprecipitation. They demonstrate the capability of expressing GFP-tagged versions of Casc3 WT and mutant in S2 cells, so this should not be a cumbersome control experiment to include.

      • Regarding Figure 3, it could be postulated that the number of boutons would be influenced by the length of axons. Is axon outgrowth accounted for in these experiments? This would influence number of synaptic boutons. Panel F looks very different from panel A in terms of axon length (could this be due to axon outgrowth defect and/or impacted muscle size?) Can quantitation be done also by normalizing to axon length (bouton number/axon length)? Or perhaps this is accounted for in muscle size? If so, this should be explained.

      • In Figure 3 quantification: n's vary between genotypes significantly, and this should be explained (e.g. was there a recovery issue between genotypes or just fewer needed for WT-like?).

      • In Figure 4 panels B and F (mutants), there appears to be reduced axon outgrowth (see point above). This should be taken into account when expressing bouton number.

      • The RNA-seq data (Figure 5) has a potential issue in that they used larvae with a balancer chromosome (Df), which yields a 50% reduction in any genes on that chromosome. They acknowledge this and removed these genes from the analysis, but the concern remains that this still might be a confounding variable (for example, if reduction in any of these genes might disrupt a signaling pathway). We do not think that the RNA-seq needs to be repeated, but we propose that the authors validate these targets using qPCR in their MN-specific btz knockdown system (this way, they can also include magoh and eif4aIII knockdowns for comparison).

      Minor comments

      Some statements made in the introduction that are not entirely accurate:

      • "A fourth core subunit, known as Barentsz (Btz), Cancer susceptibility candidate gene 3 (CASC3), or Metastatic lymph node 51 (MLN51), associates with the complex following the completion of splicing, and is required for the effects of the EJC on translation, NMD and mRNA localization (Chazal et al., 2013; Palacios et al., 2004; Shibuya et al., 2006; van Eeden et al., 2001)."

      A recent study indicates that Casc3 is not required for EJC-dependent NMD targets in human cells, but rather enhances NMD on a subset of targets (Gerbracht et al. 2020 NAR). Perhaps "is required" should be changed to "plays a role in cytoplasmic EJC-mediated processes, such as...". It has also been shown that the EJC core can assemble without Casc3 (e.g. Ballut et al 2005 NSMB, Gehring et al 2009 PLoS Biol). Previous work from the authors shows that Casc3 (Btz) is not necessary for EJC function in pre-mRNA splicing (Roignant et al, 2010 Cell). Further, there exists a population of Casc3-lacking EJCs in human cells (Mabin et al 2018 Cell Reports). Collectively, all this evidence points to Casc3 not being a core EJC subunit.

      • "In the mouse brain, haploinsufficiency for Magoh, Rbm8a or Eif4a3 causes severe microcephaly, but complete loss of Casc3 has a much milder effect that can be attributed to developmental delay (Mao et al., 2017; Mao et al., 2016; Mao et al., 2015; Silver et al., 2010)."

      From Mao et al 2017: complete loss and hypomorphic mutants were embryonic and perinatally lethal (contrary to what the authors are stating here), while compound mutants and heterozygotes exhibited neurodevelopmental delay. By "milder effects" the authors could also be referring to brain size being proportional to body size in the complete loss homozygotes; either way, this should be clarified.

      General minor comments:

      • Fly-specific nomenclature could be made more accessible to a broader audience, as the full readership will likely not have expertise in Drosophila genetics. For example, w118, btz2 labels used in figures are not explained anywhere in the manuscript. While the authors do a good job of describing various mutants in a more accessible fashion in the results section, the genotype labels in figures can be better explained in the legends.

      • Fig 2 L-N panels might warrant more explanation. Can the mitochondria be counted here? Is there also a difference in volume/morphology that could be quantitated? In Figure 2N, muscle fibers are more densely packed in mutant vs. control; can this be explained?

      • In Fig 2, to draw parallels between panels A-K and L-N, it might also be helpful to use the red/yellow arrow system on panel A for comparison.

      • In Figure 3, it might be helpful for a general audience to include zoomed-in picture of boutons (as in Fig 5B), as some panels appear to have less defined bouton shape.

      • Is the bouton size different in the mutant in Figure 3? Can this be quantified?

      • Fold changes are modest and not very apparent in staining (we acknowledge that this could be due to early developmental time point). Images could better point out differences in WT vs. mutant that are not readily apparent to those outside the fly neurodevelopment audience.

      • Fig 4 NMJs are shown on a different scale (more zoomed in) than in Figure 3, and differences are a bit easier to see at this scale. Presenting Fig 3 on this scale might help the reader with visualizing the differences in WT versus mutant.

      Significance

      Overall, this paper contributes conceptually to understanding EJC-mediated mRNA regulation during development. The contribution here is incremental, but meaningful in terms of defining the scope of regulation by the EJC and its peripheral factors in various contexts. These findings will likely be of interest to the fields of RNA metabolism and neurodevelopment. It also adds to the existing work suggesting Casc3 may have additional functions outside of the EJC (e.g. Mao et al. 2017 RNA, Baguet et al 2007 J Cell Sci, Cougot et al. 2014 J Cell Sci); while these previous studies have suggested Casc3 roles in development and mRNA localization/granule formation that are different from the EJC core proteins, this study more directly tests an EJC-independent role in mRNA regulation of specific targets. Further addressing the molecular basis of this regulation will be outside the scope of this article but will be of interest to the field.

      We are molecular biologists who study NMD and are thus equipped to address the EJC-related molecular functions and impact on the transcriptome. We do not have expertise in Drosophila genetics or neurobiology, and thus cannot critically evaluate the specific genetic approaches used or anatomy presented to the full extent. We have, however, pointed out areas that need elaboration regarding the genetic approaches and/or presentation of data that may be unfamiliar to a broader audience (i.e. the RNA metabolism field).

    1. Author Response:

      Evaluation Summary:

      Since DBS of the habenula is a new treatment, these are the first data of its kind and potentially of high interest to the field. Although the study mostly confirms findings from animal studies rather than bringing up completely new aspects of emotion processing, it certainly closes a knowledge gap. This paper is of interest to neuroscientists studying emotions and clinicians treating psychiatric disorders. Specifically the paper shows that the habenula is involved in processing of negative emotions and that it is synchronized to the prefrontal cortex in the theta band. These are important insights into the electrophysiology of emotion processing in the human brain.

      The authors are very grateful for the reviewers’ positive comments on our study. We also thank all the reviewers for their comments, which have helped to improve the manuscript.

      Reviewer #1 (Public Review):

      The study by Huang et al. reports on direct recordings (using DBS electrodes) from the human habenula in conjunction with MEG recordings in 9 patients. Participants were shown emotional pictures. The key finding was a transient increase in theta/alpha activity with negative compared to positive stimuli. Furthermore, there was a later increase in oscillatory coupling in the same band. These are important data, as there are few reports of direct recordings from the habenula together with MEG in humans performing cognitive tasks. The findings do provide novel insight into the network dynamics associated with the processing of emotional stimuli and in particular the role of the habenula.

      Recommendations:

      How can we be sure that the recordings from the habenula are not contaminated by volume conduction; i.e. signals from neighbouring regions? I do understand that bipolar signals were considered for the DBS electrode leads. However, high-frequency power (gamma band and up) is often associated with spiking/MUA and considered less prone to volume conduction. I propose to also investigate that high-frequency gamma band activity recorded from the bipolar DBS electrodes and relate to the emotional faces. This will provide more certainty that the measured activity indeed stems from the habenula.

      We thank the reviewer for the comment. As the reviewer pointed out, bipolar macroelectrode recordings can detect locally generated potentials, as demonstrated for recordings from the subthalamic nucleus, especially when the macroelectrodes are inside the subthalamic nucleus (Marmor et al., 2017). However, considering the size of the habenula and the size of the DBS electrode contacts, we have to acknowledge that we cannot completely exclude the possibility that the recordings are contaminated by volume conduction of activity from neighbouring areas, as shown in Bertone-Cueto et al. (2019). We have now added extra information about the size of the habenula and acknowledged the potential contamination by activity from neighbouring areas through volume conduction in the ‘Limitation’:

      "Another caveat we would like to acknowledge is that the human habenula is a small region. Existing data from structural MRI scans reported combined habenula (the sum of the left and right hemispheres) volumes of ~30–36 mm³ (Savitz et al., 2011a; Savitz et al., 2011b), which means each habenula has a size of 2–3 mm in each dimension, which may be even smaller than the standard functional MRI voxel size (Lawson et al., 2013). The size of the habenula is also small relative to the standard DBS electrodes (as shown in Fig. 2A). The electrodes used in this study (Medtronic 3389) have an electrode diameter of 1.27 mm, a contact length of 1.5 mm, and a contact spacing of 0.5 mm. We have tried different ways to confirm the location of the electrode and to select the contacts that are within or closest to the habenula: 1.) the MRI was co-registered with a CT image (General Electric, Waukesha, WI, USA) with the Leksell stereotactic frame to obtain the coordinate values of the tip of the electrode; 2.) the post-operative CT was co-registered to the pre-operative T1 MRI using a two-stage linear registration in Lead-DBS software. We used bipolar signals constructed from neighbouring macroelectrode recordings, which have been shown to detect locally generated potentials from the subthalamic nucleus, especially when the macroelectrodes are inside the subthalamic nucleus (Marmor et al., 2017). Considering that not all contacts for bipolar LFP construction are in the habenula in this study, as shown in Fig. 2, we cannot exclude the possibility that the activities we measured are contaminated by activities from neighbouring areas through volume conduction. In particular, the human habenula is surrounded by thalamus and adjacent to the posterior end of the medial dorsal thalamus, so we may have captured activities from the medial dorsal thalamus. 
However, we also showed that bipolar LFPs from contacts in the habenula tend to have a peak in the theta/alpha band in the power spectral density (PSD), whereas recordings from contacts outside the habenula tend to have an extra peak in the beta frequency band in the PSD. This supports the habenula origin of the emotional valence related changes in the theta/alpha activities reported here."
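
      As a rough check of the sizes quoted above, the following Python sketch converts the cited combined volume into an approximate linear extent and compares it with the extent of one bipolar contact pair. The equal left/right split, the cube approximation, and the function names are illustrative assumptions only and are not part of the study.

```python
# Rough size check for the figures quoted above.
# Assumptions (illustrative only): left and right habenula have equal volume,
# and each is approximated as a cube; neither is claimed in the text.


def cube_edge_mm(volume_mm3: float) -> float:
    """Edge length (mm) of a cube with the given volume (mm^3)."""
    return volume_mm3 ** (1.0 / 3.0)


def bipolar_span_mm(contact_length_mm: float, spacing_mm: float) -> float:
    """Edge-to-edge extent of two adjacent contacts plus the gap between them."""
    return 2.0 * contact_length_mm + spacing_mm


# Combined (left + right) habenula volume range cited from structural MRI.
edges = [cube_edge_mm(combined / 2.0) for combined in (30.0, 36.0)]
# Contact length 1.5 mm and contact spacing 0.5 mm, as stated for the electrode.
span = bipolar_span_mm(1.5, 0.5)

print(f"approximate habenula edge: {edges[0]:.2f}-{edges[1]:.2f} mm")
print(f"extent of one bipolar contact pair: {span:.1f} mm")
```

      Under these assumptions each habenula is roughly 2.5 mm across, whereas one bipolar pair extends over 3.5 mm, which is consistent with the concern that at least one contact of a pair may lie outside the structure.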

      We have also looked at gamma-band oscillations and high-frequency activity in the recordings. However, we did not observe any peak in the high-frequency band in the average power spectral density, or any consistent difference in high-frequency activity induced by the emotional stimuli (Fig. S1). We suspect that high-frequency activity related to MUA/spiking is very local and has very small amplitude, so it is not picked up by bipolar LFPs measured from contacts whose contact area and between-contact spacing are both large relative to the size of the habenula.

      Figure S1. (A) Power spectral density of habenula LFPs across all time period when emotional stimuli were presented. The bold blue line and shadowed region indicates the mean ± SEM across all recorded hemispheres and the thin grey lines show measurements from individual hemispheres. (B) Time-frequency representations of the power response relative to pre-stimulus baseline for different conditions showing habenula gamma and high frequency activity are not modulated by emotional

      References:

      Savitz JB, Bonne O, Nugent AC, Vythilingam M, Bogers W, Charney DS, et al. Habenula volume in post-traumatic stress disorder measured with high-resolution MRI. Biology of Mood & Anxiety Disorders 2011a; 1(1): 7.

      Savitz JB, Nugent AC, Bogers W, Roiser JP, Bain EE, Neumeister A, et al. Habenula volume in bipolar disorder and major depressive disorder: a high-resolution magnetic resonance imaging study. Biological Psychiatry 2011b; 69(4): 336-43.

      Lawson RP, Drevets WC, Roiser JP. Defining the habenula in human neuroimaging studies. NeuroImage 2013; 64: 722-7.

      Marmor O, Valsky D, Joshua M, Bick AS, Arkadir D, Tamir I, et al. Local vs. volume conductance activity of field potentials in the human subthalamic nucleus. Journal of Neurophysiology 2017; 117(6): 2140-51.

      Bertone-Cueto NI, Makarova J, Mosqueira A, García-Violini D, Sánchez-Peña R, Herreras O, et al. Volume-Conducted Origin of the Field Potential at the Lateral Habenula. Frontiers in Systems Neuroscience 2019; 13:78.

      Figure 3: the alpha/theta band activity is very transient and not band-limited. Why refer to this as oscillatory? Can you exclude that the TFRs of power reflect the spectral power of ERPs rather than modulations of oscillations? I propose to also calculate the ERPs and perform the TFR of power on those. This might result in a re-interpretation of the early effects in theta/alpha band.

      We agree with the reviewer that the activity increase in the first time window, at short latency after stimulus onset, is very transient and not band-limited. This raises the question of whether it is oscillatory or a transient evoked activity. We have now looked at this initial transient activity in two ways: 1.) We quantified the ERP in LFPs locked to stimulus onset for each emotional valence condition and for each habenula, and investigated whether the amplitude or latency of the ERP differed between emotional valence conditions. As shown in the following figure, there is an ERP locked to stimulus onset with a positive peak at 402 ± 27 ms (neutral stimuli), 407 ± 35 ms (positive stimuli), and 399 ± 30 ms (negative stimuli). The following figure (Fig. 3–figure supplement 1) will be submitted as a figure supplement related to Fig. 3. However, there was no significant difference in ERP latency or amplitude between emotional valence conditions. 2.) We quantified the pure non-phase-locked (induced only) power spectra by calculating the time-frequency power spectrogram after subtracting the ERP (the time-domain trial average) from the time-domain neural signal on each trial (Kalcher and Pfurtscheller, 1995; Cohen and Donner, 2013). This gives results very similar to those reported in the main manuscript, as shown in Fig. 3–figure supplement 2. These further analyses show that, even though there was an event-related potential time-locked to stimulus onset, this ERP did not contribute to the initial broadband activity increase in the early time window shown in plots A-C in Figure 3. The figures of the new analyses and the following text have now been added to the main text:

      "In addition, we tested whether stimuli-related habenula LFP modulations primarily reflect a modulation of oscillations that is not phase-locked to stimulus onset or, alternatively, whether they are attributable to the evoked event-related potential (ERP). We quantified the ERP for each emotional valence condition for each habenula. There was no significant difference in ERP latency or amplitude caused by different emotional valence stimuli (Fig. 3–figure supplement 1). In addition, when only considering the non-phase-locked activity by removing the ERP from the time series before time-frequency decomposition, the emotional valence effect (presented in Fig. 3–figure supplement 2) is very similar to that shown in Fig. 3. These additional analyses demonstrate that the emotional valence effect in the LFP signal is more likely to be driven by non-phase-locked (induced only) activity."

      Fig. 3–figure supplement 1. Event-related potential (ERP) in habenula LFP signals in different emotional valence (neutral, positive and negative) conditions. (A) Averaged ERP waveforms across patients for different conditions. (B) Peak latency and amplitude (Mean ± SEM) of the ERP components for different conditions.

      Fig. 3–figure supplement 2. Non-phase-locked activity in different emotional valence (neutral, positive and negative) conditions (N = 18). (A) Time-frequency representation of the power changes relative to pre-stimulus baseline for three conditions. Significant clusters (p < 0.05, non-parametric permutation test) are encircled with a solid black line. (B) Time-frequency representation of the power response difference between negative and positive valence stimuli, showing significant increased activity the theta/alpha band (5-10 Hz) at short latency (100-500 ms) and another increased theta activity (4-7 Hz) at long latencies (2700-3300 ms) with negative stimuli (p < 0.05, non-parametric permutation test). (C) Normalized power of the activities at theta/alpha (5-10 Hz) and theta (4-7 Hz) band over time. Significant difference between the negative and positive valence stimuli is marked by a shadowed bar (p < 0.05, corrected for multiple comparison).
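
      The ERP-subtraction step described in this response can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis pipeline used in the study: it replaces the time-frequency decomposition with a single-frequency Fourier projection over the whole epoch, and the function names, sampling rate, trial count, and signal amplitudes are all hypothetical.

```python
import cmath
import math
import random


def power_at(signal, freq_hz, fs_hz):
    """Power of `signal` at one frequency via a single-bin Fourier projection."""
    n = len(signal)
    coef = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / fs_hz)
        for i, x in enumerate(signal)
    ) / n
    return abs(coef) ** 2


def split_evoked_induced(trials, freq_hz, fs_hz):
    """Return (total, evoked, induced) power at freq_hz.

    total   : mean over trials of single-trial power
    evoked  : power of the ERP (the time-domain trial average)
    induced : mean single-trial power after subtracting the ERP from each trial
    """
    n_trials = len(trials)
    n_times = len(trials[0])
    erp = [sum(tr[t] for tr in trials) / n_trials for t in range(n_times)]
    residuals = [[tr[t] - erp[t] for t in range(n_times)] for tr in trials]
    total = sum(power_at(tr, freq_hz, fs_hz) for tr in trials) / n_trials
    evoked = power_at(erp, freq_hz, fs_hz)
    induced = sum(power_at(r, freq_hz, fs_hz) for r in residuals) / n_trials
    return total, evoked, induced


# Synthetic trials (hypothetical values): a phase-locked 6 Hz component that is
# identical on every trial, plus a 6 Hz component with a random phase per trial.
random.seed(0)
fs = 200.0      # sampling rate in Hz
n_times = 200   # 1 s epochs, i.e. an integer number of 6 Hz cycles
freq = 6.0
trials = []
for _ in range(40):
    phase = random.uniform(0.0, 2.0 * math.pi)
    trials.append([
        math.sin(2.0 * math.pi * freq * i / fs)             # phase-locked part
        + math.sin(2.0 * math.pi * freq * i / fs + phase)   # non-phase-locked part
        for i in range(n_times)
    ])

total, evoked, induced = split_evoked_induced(trials, freq, fs)
print(f"total={total:.3f} evoked={evoked:.3f} induced={induced:.3f}")
```

      In this simplified setting, total power at the chosen frequency equals the sum of the evoked and induced parts, so a condition effect that survives ERP subtraction cannot be explained by the trial-averaged ERP alone.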

      References:

      Kalcher J, Pfurtscheller G. Discrimination between phase-locked and non-phase-locked event-related EEG activity. Electroencephalography and Clinical Neurophysiology 1995; 94(5): 381-4.

      Cohen MX, Donner TH. Midfrontal conflict-related theta-band power reflects neural oscillations that predict behavior. Journal of Neurophysiology 2013; 110(12): 2752-63.

      Figure 4D: can you exclude that the frontal activity is not due to saccade artifacts? Only eye blink artifacts were reduced by the ICA approach. Trials with saccades should be identified in the MEG traces and rejected prior to further analysis.

      We understand and appreciate the reviewer’s concern about the source of the activity modulations shown in Fig. 4D. We tried to minimise eye movements and saccades during the recording by presenting all figures at the centre of the screen, scaling all presented figures to a similar size, and presenting a white cross at the centre of the screen to prepare the participants for the onset of the stimuli. Despite this, participants may still have made eye movements and saccades during the recording. We used ICA to exclude the low-frequency, large-amplitude artefacts that can be related to either eye blinks or other large eye movements. However, this may not exclude artefacts related to miniature saccades. As shown in Fig. 4D, at the sensor level, the sensors with a significant difference between the negative and positive emotional valence conditions clustered around the frontal cortex, close to the eye area. However, we think this is not dominated by saccades, for the following two reasons:

      1.) The power spectrum of the saccadic spike artefact in MEG is characterized by a broadband peak in the gamma band from roughly 30 to 120 Hz (Yuval-Greenberg et al., 2008; Keren et al., 2010). In this study, the activity modulation we observed in the frontal sensors is limited to the theta/alpha frequency band, so it differs from the power spectrum of the saccadic spike artefact.

      2.) The source of the saccadic spike artefacts in MEG measurements tends to be localized to the region of the extraocular muscles of both eyes (Carl et al., 2012). We used beamforming source localisation to identify the source of the activity modulation reported in Fig. 4D. This beamforming analysis identified the source to be in Brodmann areas 9 and 10 (shown in Fig. 5). This excludes the possibility that the activity modulation at the sensor level reported in Fig. 4D is due to saccades. In addition, Brodmann areas 9 and 10 have previously been associated with emotional stimulus processing (Bermpohl et al., 2006), and Brodmann area 9 in the left hemisphere has also been used as the target for repetitive transcranial magnetic stimulation (rTMS) as a treatment for drug-resistant depression (Cash et al., 2020). The source localisation results, together with previous literature on the function of the identified source area, suggest that the activity modulation we observed in the frontal cortex is very likely to be related to emotional stimuli processing.

      References:

      Yuval-Greenberg S, Tomer O, Keren AS, Nelken I, Deouell LY. Transient induced gamma-band response in EEG as a manifestation of miniature saccades. Neuron 2008; 58(3): 429-41.

      Keren AS, Yuval-Greenberg S, Deouell LY. Saccadic spike potentials in gamma-band EEG: characterization, detection and suppression. NeuroImage 2010; 49(3): 2248-63.

      Carl C, Acik A, Konig P, Engel AK, Hipp JF. The saccadic spike artifact in MEG. NeuroImage 2012; 59(2): 1657-67.

      Bermpohl F, Pascual-Leone A, Amedi A, Merabet LB, Fregni F, Gaab N, et al. Attentional modulation of emotional stimulus processing: an fMRI study using emotional expectancy. Human Brain Mapping 2006; 27(8): 662-77.

      Cash RFH, Weigand A, Zalesky A, Siddiqi SH, Downar J, Fitzgerald PB, et al. Using Brain Imaging to Improve Spatial Targeting of Transcranial Magnetic Stimulation for Depression. Biological Psychiatry 2020.

      The coherence modulations in Fig 5 occur quite late in time compared to the power modulations in Fig 3 and 4. When discussing the results (in e.g. the abstract) it reads as if these findings are reflecting the same process. How can the two effects reflect the same process if the timing is so different?

      As the reviewer correctly pointed out, the time window in which we observed the coherence modulations occurred quite late compared with the initial power modulations in the frontal cortex and the habenula (Fig. 4). There was another increase in theta-band activity in the habenula even later, at around 3 seconds after stimulus onset, when the emotional figure had already disappeared. An emotional response is composed of a number of factors, two of which are the initial reactivity to an emotional stimulus and the subsequent recovery once the stimulus terminates or ceases to be relevant (Schuyler et al., 2014). We think the neural effects we observed in the three different time windows may reflect different underlying processes. We have discussed this in the ‘Discussion’:

      "These activity changes at different time windows may reflect the different neuropsychological processes underlying emotion perception, including identification and appraisal of emotional material, production of affective states, and autonomic response regulation and recovery (Phillips et al., 2003a). The later effects of increased theta activities in the habenula when the stimuli disappeared were also supported by other literature showing that there can be prolonged effects of negative stimuli in the neural structures involved in emotional processing (Haas et al., 2008; Puccetti et al., 2021). In particular, greater sustained patterns of brain activity in the medial prefrontal cortex when responding to blocks of negative facial expressions were associated with higher scores of neuroticism across participants (Haas et al., 2008). Slower amygdala recovery from negative images also predicts greater trait neuroticism, lower levels of likability of a set of social stimuli (neutral faces), and declined day-to-day psychological wellbeing (Schuyler et al., 2014; Puccetti et al., 2021)."

      References:

      Schuyler BS, Kral TR, Jacquart J, Burghy CA, Weng HY, Perlman DM, et al. Temporal dynamics of emotional responding: amygdala recovery predicts emotional traits. Social Cognitive and Affective Neuroscience 2014; 9(2): 176-81.

      Phillips ML, Drevets WC, Rauch SL, Lane R. Neurobiology of emotion perception I: The neural basis of normal emotion perception. Biological Psychiatry 2003a; 54(5): 504-14.

      Haas BW, Constable RT, Canli T. Stop the sadness: Neuroticism is associated with sustained medial prefrontal cortex response to emotional facial expressions. NeuroImage 2008; 42(1): 385-92.

      Puccetti NA, Schaefer SM, van Reekum CM, Ong AD, Almeida DM, Ryff CD, et al. Linking Amygdala Persistence to Real-World Emotional Experience and Psychological Well-Being. Journal of Neuroscience 2021: JN-RM-1637-20.

      Be explicit on the degrees of freedom in the statistical tests given that one subject was excluded from some of the tests.

      We thank the reviewers for the comment. The number of samples used for each statistical analysis is stated in the figure titles. We have now also added the degrees of freedom in the main text where parametric statistical tests such as t-tests or ANOVAs have been used. Where permutation tests (which do not have any degrees of freedom associated with them) are used, we have now added the number of samples for the permutation test.

      Reviewer #2 (Public Review):

      In this study, Huang and colleagues recorded local field potentials from the lateral habenula in patients with psychiatric disorders who recently underwent surgery for deep brain stimulation (DBS). The authors combined these invasive measurements with non-invasive whole-head MEG recordings to study functional connectivity between the habenula and cortical areas. Since the lateral habenula is believed to be involved in the processing of emotions, and negative emotions in particular, the authors investigated whether brain activity in this region is related to emotional valence. They presented pictures inducing negative and positive emotions to the patients and found that theta and alpha activity in the habenula and frontal cortex increases when patients experience negative emotions. Functional connectivity between the habenula and the cortex was likewise increased in this band. The authors conclude that theta/alpha oscillations in the habenula-cortex network are involved in the processing of negative emotions in humans.

      Because DBS of the habenula is a new treatment tested in this cohort in the framework of a clinical trial, these are the first data of its kind. Accordingly, they are of high interest to the field. Although the study mostly confirms findings from animal studies rather than bringing up completely new aspects of emotion processing, it certainly closes a knowledge gap.

      In terms of community impact, I see the strengths of this paper in basic science rather than the clinical field. The authors demonstrate the involvement of theta oscillations in the habenula-prefrontal cortex network in emotion processing in the human brain. The potential of theta oscillations to serve as a marker in closed-loop DBS, as put forward by the authors, appears less relevant to me at this stage, given that the clinical effects and side-effects of habenula DBS are not known yet.

      We thank the reviewers for the favourable comments about the implication of our study in basic science and about the value of our study in closing a knowledge gap. We agree that further studies would be required to make conclusions about the clinical effects and side-effects of habenula DBS.

      Detailed comments:

      The group-average MEG power spectrum (Fig. 4B) suggests that negative emotions lead to a sustained theta power increase and a similar effect, though possibly masked by a visual ERP, can be seen in the habenula (Fig. 3C). Yet the statistics identify brief elevations of habenula theta power at around 3 s (which is very late), a brief elevation of prefrontal power at time 0 or even before (Fig. 4C) and a brief elevation of habenula-MEG theta coherence around 1 s. It seems possible that this lack of consistency arises from a low signal-to-noise ratio. The data contain only 27 trials per condition on average and are contaminated by artifacts caused by the extension wires.

      With regard to the nature of the activity modulation with short latency after stimulus onset, that is, whether this is an ERP or an oscillation: we have now investigated this. In summary, by analysing the ERP and removing the influence of the ERP from the total power spectra, we did not observe any modulation of the ERP related to stimulus emotional valence, and the valence-related modulation in the pure induced (non-phase-locked) power spectra was similar to what we observed in the total power shown in Fig. 3. Therefore, we argue that the theta/alpha increase with negative emotional stimuli that we observed in both habenula and prefrontal cortex 0-500 ms after stimulus onset is not dominated by visual or other ERPs.
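
      As an illustration of the evoked/induced separation described above, one common approach (which may differ in detail from the procedure used in the manuscript) is to subtract the trial-averaged ERP from every trial before computing power. The sketch below uses synthetic data and a plain FFT instead of wavelets; the function name `induced_power`, the sampling rate and the signal frequencies are all illustrative assumptions.

```python
import numpy as np

def induced_power(trials):
    """trials: array of shape (n_trials, n_samples).
    Returns (total, induced) power spectra averaged over trials."""
    erp = trials.mean(axis=0)              # phase-locked (evoked) component
    residual = trials - erp                # non-phase-locked remainder
    total = (np.abs(np.fft.rfft(trials, axis=1)) ** 2).mean(axis=0)
    induced = (np.abs(np.fft.rfft(residual, axis=1)) ** 2).mean(axis=0)
    return total, induced

# Synthetic data (assumed values): a 6 Hz component with identical phase in
# every trial (ERP-like) plus a 10 Hz oscillation with random phase per trial.
rng = np.random.default_rng(0)
fs, n_trials, n_samples = 200, 40, 400
t = np.arange(n_samples) / fs
evoked = np.sin(2 * np.pi * 6 * t)
phases = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
oscillation = np.sin(2 * np.pi * 10 * t + phases)
trials = evoked + oscillation + 0.1 * rng.standard_normal((n_trials, n_samples))

total, induced = induced_power(trials)
freqs = np.fft.rfftfreq(n_samples, d=1 / fs)
idx_6 = int(np.argmin(np.abs(freqs - 6)))
idx_10 = int(np.argmin(np.abs(freqs - 10)))
```

      In this toy case the 6 Hz power largely disappears from the induced spectrum while the 10 Hz power is mostly retained, which is the pattern one would expect if a modulation is not driven by the ERP.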

      With regard to the signal-to-noise ratio from only 27 trials per condition on average per participant: we cleaned the data by removing trials with obvious artefacts, characterised by time-domain measurements exceeding 5 times the standard deviation and by increased activity across all frequency bands in the frequency domain. After removing the trials with artefacts, we have 27 trials per condition per subject on average. We agree that 27 trials per condition on average is not a high number, and that increasing the number of trials would further increase the signal-to-noise ratio. However, our studies with EEG recordings and LFP recordings from externalised patients have shown that 30 trials were enough to identify a reduction in the amplitude of post-movement beta oscillations at the beginning of visuomotor adaptation in the motor cortex and STN (Tan et al., 2014a; Tan et al., 2014b). These results of motor-error-related modulation in the post-movement beta have been replicated by other groups. In Tan et al. 2014b, with simultaneous EEG and STN LFP measurements and a similar number of trials (around 30), we also quantified the time-course of STN-motor cortex coherence during voluntary movements. This pattern has also been replicated in a separate study from another group with around 50 trials per participant (Talakoub et al., 2016). In addition, a similar behavioural paradigm (passive figure viewing) has been used in two previous studies with LFP recordings from the STN in different patient groups (Brucke et al., 2007; Huebl et al., 2014). In both studies, a similar number of trials per condition (around 27) was used, and the authors identified meaningful modulation of STN activity by emotional stimuli. Therefore, we think the number of trials per condition was sufficient to identify differences in the LFPs induced by emotional valence in this paradigm.
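
      A minimal sketch of the time-domain part of such a rejection rule is given below. It assumes that the standard deviation is computed over all trials and samples pooled together and that a trial is dropped if any sample exceeds the threshold; the manuscript does not specify these details, and the frequency-domain criterion is not implemented here. The function name `reject_artifact_trials` is hypothetical.

```python
import numpy as np

def reject_artifact_trials(trials, n_sd=5.0):
    """Drop trials whose peak absolute amplitude exceeds n_sd pooled SDs.
    trials: array of shape (n_trials, n_samples).
    Returns (clean_trials, keep_mask)."""
    centered = trials - trials.mean()
    threshold = n_sd * centered.std()      # pooled SD (an assumption)
    keep = np.abs(centered).max(axis=1) <= threshold
    return trials[keep], keep

# Synthetic example: 30 noise trials, two of which contain a large transient.
rng = np.random.default_rng(1)
data = rng.standard_normal((30, 500))
data[3, 100] += 50.0
data[17, 250] -= 50.0
clean, keep = reject_artifact_trials(data)
```

      Because large artefacts inflate the pooled standard deviation, a robust scale estimate (for example one based on the median absolute deviation) could be substituted if rejection proves too lenient.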

      We agree that the measurement of coherence can be more susceptible to noise and can suffer from the reduced signal-to-noise ratio in MEG recordings. In Hirschmann et al. (2013), 5 minutes of resting recording and 5 minutes of movement recording from 10 PD patients were used to quantify movement-related changes in STN-cortical coherence and how these were modulated by levodopa. Litvak et al. (2012) identified movement-related changes in the coherence between STN LFP and motor cortex with simultaneous STN LFP and MEG recordings from 17 PD patients and 20 trials on average per participant per condition. With similar methods, van Wijk et al. (2017) used recordings from 9 patients and on average around 29 trials per hand per condition, and they identified reduced low-beta cortico-pallidal coherence during movement. So the number of trials per condition per participant used in this study is comparable to previous studies.

      The DBS extension wires do reduce the signal-to-noise ratio in the MEG recording. Therefore, the spatiotemporal Signal Space Separation (tSSS) method (Taulu and Simola, 2006) implemented in the MaxFilter software (Elekta Oy, Helsinki, Finland) was applied in this study to suppress the strong magnetic artifacts caused by the extension wires. This method has been shown to work well in removing magnetic artifacts and movement artifacts from MEG data in our previous studies (Cao et al., 2019; Cao et al., 2020). In addition, the beamforming method proposed by several studies (Litvak et al., 2010; Hirschmann et al., 2011; Litvak et al., 2011) was used in this study. In Litvak et al. (2010), the artifacts caused by DBS extension wires were described in detail, and beamforming was demonstrated to suppress these artifacts effectively and thereby enable localization of cortical sources coherent with the deep brain nucleus. We have now added more details and these references about the data cleaning and the beamforming method in the main text. With the beamforming method, we did observe the standard movement-related modulation in the beta frequency band in the motor cortex with 9 trials of button pressing movements, shown in the following figure for one patient as an example (Figure 5–figure supplement 1). This suggests that the beamforming method did work well to suppress the artefacts and helped to localise the source with a low number of trials. The figure on movement-related modulation in the motor cortex in the MEG signals has now been added as a supplementary figure to demonstrate the effect of the beamforming.

      Figure 5–figure supplement 1. (A) Time-frequency maps of MEG activity for right hand button press at sensor level from one participant (Case 8). (B) DICS beamforming source reconstruction of the areas with movement-related oscillation changes in the range of 12-30 Hz. The peak power was located in the left M1 area, MNI coordinate [-37, -12, 43].

      References:

      Tan H, Jenkinson N, Brown P. Dynamic neural correlates of motor error monitoring and adaptation during trial-to-trial learning. Journal of Neuroscience 2014a; 34(16): 5678-88.

      Tan H, Zavala B, Pogosyan A, Ashkan K, Zrinzo L, Foltynie T, et al. Human subthalamic nucleus in movement error detection and its evaluation during visuomotor adaptation. Journal of Neuroscience 2014b; 34(50): 16744-54.

      Talakoub O, Neagu B, Udupa K, Tsang E, Chen R, Popovic MR, et al. Time-course of coherence in the human basal ganglia during voluntary movements. Scientific Reports 2016; 6: 34930.

      Brucke C, Kupsch A, Schneider GH, Hariz MI, Nuttin B, Kopp U, et al. The subthalamic region is activated during valence-related emotional processing in patients with Parkinson's disease. European Journal of Neuroscience 2007; 26(3): 767-74.

      Huebl J, Spitzer B, Brucke C, Schonecker T, Kupsch A, Alesch F, et al. Oscillatory subthalamic nucleus activity is modulated by dopamine during emotional processing in Parkinson's disease. Cortex 2014; 60: 69-81.

      Hirschmann J, Ozkurt TE, Butz M, Homburger M, Elben S, Hartmann CJ, et al. Differential modulation of STN-cortical and cortico-muscular coherence by movement and levodopa in Parkinson's disease. NeuroImage 2013; 68: 203-13.

      Litvak V, Eusebio A, Jha A, Oostenveld R, Barnes G, Foltynie T, et al. Movement-related changes in local and long-range synchronization in Parkinson's disease revealed by simultaneous magnetoencephalography and intracranial recordings. Journal of Neuroscience 2012; 32(31): 10541-53.

      van Wijk BCM, Neumann WJ, Schneider GH, Sander TH, Litvak V, Kuhn AA. Low-beta cortico-pallidal coherence decreases during movement and correlates with overall reaction time. NeuroImage 2017; 159: 1-8.

      Taulu S, Simola J. Spatiotemporal signal space separation method for rejecting nearby interference in MEG measurements. Physics in Medicine and Biology 2006; 51(7): 1759-68.

      Cao C, Huang P, Wang T, Zhan S, Liu W, Pan Y, et al. Cortico-subthalamic Coherence in a Patient With Dystonia Induced by Chorea-Acanthocytosis: A Case Report. Frontiers in Human Neuroscience 2019; 13: 163.

      Cao C, Li D, Zhan S, Zhang C, Sun B, Litvak V. L-dopa treatment increases oscillatory power in the motor cortex of Parkinson's disease patients. NeuroImage Clinical 2020; 26: 102255.

      Litvak V, Eusebio A, Jha A, Oostenveld R, Barnes GR, Penny WD, et al. Optimized beamforming for simultaneous MEG and intracranial local field potential recordings in deep brain stimulation patients. NeuroImage 2010; 50(4): 1578-88.

      Litvak V, Jha A, Eusebio A, Oostenveld R, Foltynie T, Limousin P, et al. Resting oscillatory cortico-subthalamic connectivity in patients with Parkinson's disease. Brain 2011; 134(Pt 2): 359-74.

      Hirschmann J, Ozkurt TE, Butz M, Homburger M, Elben S, Hartmann CJ, et al. Distinct oscillatory STN-cortical loops revealed by simultaneous MEG and local field potential recordings in patients with Parkinson's disease. NeuroImage 2011; 55(3): 1159-68.

      I doubt that the correlation between habenula power and habenula-MEG coherence (Fig. 6C) is informative of emotion processing. First, power and coherence in close-by time windows are likely to be correlated irrespective of the task/stimuli. Second, if meaningful, one would expect the strongest correlation for the negative condition, as this is the only condition with an increase of theta coherence and a subsequent increase of theta power in the habenula. This, however, does not appear to be the case.

      The authors included the factors valence and arousal in their linear model and found that only valence correlated with electrophysiological effects. I suspect that arousal and valence scores are highly correlated. When fed with informative yet highly correlated variables, the significance of individual input variables becomes difficult to assess in many statistical models. Hence, I am not convinced that valence matters but arousal not.

      For the correlation shown in Fig. 6C, we used linear mixed-effects modelling (‘fitlme’ in Matlab) with the recorded subjects as random effects to investigate the correlation between habenula power and habenula-MEG coherence in an earlier window, while considering all trials together. Therefore, the values reported in the main text and in the figure (k = 0.2434 ± 0.1031, p = 0.0226, R² = 0.104) show the within-subject correlation that is consistent across all measured subjects. The correlation is likely to be mediated by the emotional valence condition, as negative emotional stimuli tend to be associated with both high habenula-MEG coherence and high theta power in the later time window.

      The arousal scores are significantly different for the three valence conditions as shown in Fig. 1B. However, the arousal scores and the valence scores are not monotonically correlated, as shown in the following figure (Fig. S2). The emotional neutral figures have the lowest arousal value, but have the valence value sitting between the negative figures and the positive figures. We have now added the following sentence in the main text:

      "This nonlinear and non-monotonic relationship between arousal scores and the emotional valence scores allowed us to differentiate the effect of the valence from arousal."

      Table 2 in the main text shows the results of the linear mixed-effects modelling with the neural signal as the dependent variable and the valence and arousal scores as independent variables. Because of the non-linear and non-monotonic relationship between the valence and arousal scores, we think the significance of individual input variables is valid in this statistical model. We have now added a new figure (shown below, Fig. 7) with scatter plots showing the relationship between the electrophysiological signal and the arousal and emotional valence scores separately, using Spearman’s partial correlation analysis. In each scatter plot, each dot indicates the average measurement from one participant in one emotional valence condition. As shown in the following figure, the electrophysiological measurements correlated linearly with the valence scores, but not with the arousal scores. However, the statistics reported in this figure considered all the dots together, whereas the linear mixed-effects modelling takes into account the interdependency of the measurements from the same participant. So the results reported in the main text using linear mixed-effects modelling are statistically more valid, but the supplementary figure below illustrates the relationship.
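
      For readers unfamiliar with Spearman's partial correlation, a minimal sketch is given below: all variables are rank-transformed, the ranks of the covariate are regressed out of the other two variables, and the residuals are correlated. This is a generic textbook construction, not the authors' code; the function names, the 1-9 rating scale and the synthetic data (a U-shaped arousal-valence relationship and a signal driven only by valence) are assumptions made for illustration.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values given their average rank."""
    x = np.asarray(x, dtype=float).ravel()
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)               # highest rank within each tie group
    return (last - (counts - 1) / 2.0)[inverse]

def partial_spearman(x, y, z):
    """Spearman correlation of x and y after removing the rank-linear effect of z."""
    rx, ry, rz = (average_ranks(v) for v in (x, y, z))
    design = np.column_stack([np.ones_like(rz), rz])
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Synthetic ratings (assumed): arousal is a U-shaped function of valence,
# and the neural signal depends on valence only.
rng = np.random.default_rng(2)
valence = rng.uniform(1, 9, size=60)
arousal = (valence - 5) ** 2 + rng.normal(0, 1, size=60)
signal = -valence + rng.normal(0, 1, size=60)

r_valence = partial_spearman(signal, valence, arousal)   # controls for arousal
r_arousal = partial_spearman(signal, arousal, valence)   # controls for valence
```

      Note that, like the scatter-plot statistics, this treats all points as independent; it does not replace the mixed-effects model, which accounts for repeated measurements within a participant.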

      Figure S2. (A) Averaged valence and arousal ratings (mean ± SD) for figures of the three emotional conditions. (B) Scatter plots showing the relationship between arousal and valence scores for each emotional condition for each participant.

      Figure 7. Scatter plots showing how the early theta/alpha band power increase in the frontal cortex (A), theta/alpha band frontal cortex-habenula coherence (B) and theta band power increase in the habenula (C) changed with emotional valence (left column) and arousal (right column). Each dot shows the average of one participant in each categorical valence condition; these averages are also the source data of the multilevel modelling results presented in Table 2. The R and p values in the figure are the results of partial correlation considering all data points together.

      Page 8: "The time-varying coherence was calculated for each trial". This is confusing because coherence quantifies the stability of a phase difference over time, i.e. it is a temporal average, not defined for individual trials. It has also been used to describe the phase difference stability over trials rather than time, and I assume this is the method applied here. Typically, the greatest coherence values coincide with event-related power increases, which is why I am surprised to see maximum coherence at 1s rather than immediately post-stimulus.

      We thank the reviewer for pointing out this incorrect description. As the reviewer pointed out correctly, the method we used describes the phase difference stability over trials rather than time. We have now clarified how coherence was calculated and added more details in the methods:

      "The time-varying cross trial coherence between each MEG sensor and the habenula LFP was first calculated for each emotional valence condition. For this, time-frequency auto- and cross-spectral densities in the theta/alpha frequency band (5-10 Hz) between the habenula LFP and each MEG channel at sensor level were calculated using the wavelet transform-based approach from -2000 to 4000 ms for each trial with 1 Hz steps using the Morlet wavelet and cycle number of 6. Cross-trial coherence spectra for each LFP-MEG channel combination was calculated for each emotional valence condition for each habenula using the function ‘ft_connectivityanalysis’ in Fieldtrip (version 20170628). Stimulus-related changes in coherence were assessed by expressing the time-resolved coherence spectra as a percentage change compared to the average value in the -2000 to -200 ms (pre-stimulus) time window for each frequency."

      In the Morlet wavelet analysis we used here, the cycle number (C) determines the temporal resolution and frequency resolution for each frequency (F). The spectral bandwidth at a given frequency F is equal to 2F/C, while the wavelet duration is equal to C/F/pi. We used a cycle number of 6. For theta band activity around 5 Hz, this gives a spectral bandwidth of 2 × 5/6 ≈ 1.7 Hz and a wavelet duration of 6/5/pi ≈ 0.38 s = 380 ms.
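
      The arithmetic can be checked with a few lines of Python, using the two formulas quoted above (bandwidth 2F/C and duration C/F/pi). The function name `morlet_resolution` is only illustrative.

```python
import math

def morlet_resolution(freq_hz, n_cycles):
    """Spectral bandwidth (Hz) and wavelet duration (s) for a Morlet wavelet,
    using bandwidth = 2F/C and duration = C/F/pi as stated in the text."""
    bandwidth_hz = 2.0 * freq_hz / n_cycles
    duration_s = n_cycles / freq_hz / math.pi
    return bandwidth_hz, duration_s

bw, dur = morlet_resolution(5.0, 6)
print(f"5 Hz, 6 cycles: bandwidth = {bw:.2f} Hz, duration = {dur * 1000:.0f} ms")

# Resolution across the 5-10 Hz theta/alpha band analysed in the study
for f in range(5, 11):
    b, d = morlet_resolution(float(f), 6)
    print(f"{f:2d} Hz: bandwidth = {b:.2f} Hz, duration = {d * 1000:.0f} ms")
```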

      As the reviewer noticed, we observed increased activities across a wide frequency band in both habenula and the prefrontal cortex within 500 ms after stimuli onset. But the increase of cross-trial coherence starts at around 300 ms. The increase of coherence in a time window without increase of power in either of the two structures indicates a phase difference stability across trials in the oscillatory activities from the two regions, and this phase difference stability across trials was not secondary to power increase.
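
      To make the notion of cross-trial coherence concrete, the sketch below computes magnitude coherence across trials from complex time-frequency coefficients and expresses it as a percentage change from the -2000 to -200 ms baseline. It follows the standard definition of coherence and is not the FieldTrip implementation; the function names and the synthetic signals (unit-amplitude phasors whose phase difference is fixed only between 0.5 and 1.5 s) are assumptions.

```python
import numpy as np

def cross_trial_coherence(x, y):
    """x, y: complex time-frequency coefficients, shape (n_trials, n_times).
    Returns magnitude coherence (0..1) at every time point."""
    cross = np.mean(x * np.conj(y), axis=0)          # cross-spectral density
    power_x = np.mean(np.abs(x) ** 2, axis=0)        # auto-spectral densities
    power_y = np.mean(np.abs(y) ** 2, axis=0)
    return np.abs(cross) / np.sqrt(power_x * power_y)

def percent_change(coh, times, baseline=(-2.0, -0.2)):
    """Percentage change relative to the mean coherence in the baseline window."""
    mask = (times >= baseline[0]) & (times <= baseline[1])
    base = coh[mask].mean()
    return 100.0 * (coh - base) / base

# Synthetic example: constant power throughout, consistent phase lag
# across trials only in the 0.5-1.5 s window.
rng = np.random.default_rng(4)
n_trials = 30
times = np.linspace(-2.0, 4.0, 601)
locked = (times >= 0.5) & (times <= 1.5)
phase_x = rng.uniform(0, 2 * np.pi, size=(n_trials, times.size))
phase_lag = rng.uniform(0, 2 * np.pi, size=(n_trials, times.size))
phase_lag[:, locked] = 0.3
x = np.exp(1j * phase_x)
y = np.exp(1j * (phase_x - phase_lag))

coh = cross_trial_coherence(x, y)
change = percent_change(coh, times)
```

      In this example the amplitude of both signals is constant, so the coherence increase in the locked window arises purely from phase-difference stability across trials, which is the situation argued for above.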

      Reviewer #3 (Public Review):

      This paper describes the oscillatory activity of the habenula using local field potentials, both within the region and, through the use of MEG, in connection to the prefrontal cortex. The characteristics of this activity were found to vary with emotional valence but not with arousal. Shedding light on this is relevant, because the habenula is a promising target for deep brain stimulation.

      In general, because I am not much on top of the literature on the habenula, I find it difficult to judge the novelty and the impact of this study. What I can say is that I do find the paper well-written and very clear; and the methods, although quite basic (which is not bad), are sound and rigorous.

      We thank the reviewer for the positive comments about the potential implication of our study and on the methods we used.

      On the less positive side, even though I am aware that in this type of study it is difficult to have a high N, the very low N in this case makes me worry about the robustness and replicability of the results. I'm sure I have missed it and it's specified somewhere, but why is N different for the different figures? Is it because only 8 people had MEG? The number of trials also seems somewhat low. Therefore, I feel the authors perhaps need to make an effort to make up for the small number of subjects in order to add confidence to the results. I would strongly recommend bootstrapping the statistical analysis and extracting non-parametric confidence intervals instead of showing parametric standard errors whenever appropriate. When doing that, it must be taken into account that each pair of habenulae belongs to the same person; i.e. one bootstraps the subjects, not the habenulae.
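
      A subject-level bootstrap of the kind suggested here could look like the following sketch, in which whole subjects (with both of their habenulae) are resampled with replacement and a percentile confidence interval is taken for the mean. The function name, the number of resamples and the synthetic data are illustrative assumptions, not part of the study.

```python
import numpy as np

def subject_bootstrap_ci(values, subject_ids, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the mean, resampling subjects
    rather than individual habenulae."""
    values = np.asarray(values, dtype=float)
    subject_ids = np.asarray(subject_ids)
    groups = [values[subject_ids == s] for s in np.unique(subject_ids)]
    rng = np.random.default_rng(seed)
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        picked = rng.integers(0, len(groups), size=len(groups))
        boot_means[b] = np.concatenate([groups[i] for i in picked]).mean()
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Synthetic example: 9 subjects with 2 habenulae each; measurements from the
# same subject share a subject-specific offset.
rng = np.random.default_rng(3)
ids = np.repeat(np.arange(9), 2)
subject_effect = rng.normal(10.0, 5.0, size=9)
values = subject_effect[ids] + rng.normal(0.0, 1.0, size=18)
ci_low, ci_high = subject_bootstrap_ci(values, ids)
```

      Resampling at the subject level keeps the two measurements from one person together, so the interval reflects between-subject variability rather than treating the 18 habenulae as independent.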

      We do understand and appreciate the concern of the reviewer on the low sample numbers due to the strict recruitment criteria for this very early stage clinical trial: 9 patients for bilateral habenula LFPs, and 8 patients with good quality MEGs. Some information to justify the number of trials per condition for each participant has been provided in the reply to the Detailed Comments 1 from Reviewer 2. The sample number used in each analysis was included in the figures and in the main text.

      We have used a non-parametric cluster-based permutation approach (Maris and Oostenveld, 2007) for all the main results shown in Fig. 3-5. Once the clusters (time window and frequency band) with significant differences between emotional valence conditions had been identified, a parametric statistical test was applied to the average values of the clusters to show the direction of the difference. These parametric statistics are secondary to the main non-parametric permutation test.

      In addition, the DICS beamforming method was applied to localize cortical sources exhibiting stimuli-related power changes and cortical sources coherent with deep brain LFPs for each subject for positive and negative emotional valence conditions respectively. After source analysis, source statistics over subjects was performed. Non-parametric permutation testing with or without cluster-based correction for multiple comparisons was applied to statistically quantify the differences in cortical power source or coherence source between negative and positive emotional stimuli.

      References:

      Maris E, Oostenveld R. Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods 2007; 164(1): 177-90.

      Related to this point, the results in Figure 6 seem quite noisy, because interactions (i.e. coherence) are harder to estimate and N is low. For example, I have to make an effort of optimism to believe that Fig 6A is not just noise, and the result in Fig 6C is also a bit weak and perhaps driven by the blue point at the bottom. My read is that the authors didn't do permutation testing here, and just a parametric linear-mixed effect testing. I believe the authors should embed this into permutation testing to make sure that the extremes are not driving the current p-value.

      We have now quantified the coherence between frontal cortex-habenula and occipital cortex-habenula separately (please see more details in the reply to Reviewer 2, Recommendations for the authors 6). The new analysis showed that the increase in the theta/alpha band coherence around 1 s after the negative stimuli was only observed between prefrontal cortex-habenula and not between occipital cortex-habenula. This supports the argument that Fig. 6A is not just noise.

    1. Author Response:

      Reviewer #1:

      The authors demonstrate in this study that it is possible to train mice to perform a challenging tactile discrimination task, in a highly controlled manner, in a fully automated setup in which the animals learn to head-fix voluntarily. A number of well-described tricks are used to prolong the self-fixation time and thereby obtain enough training time to reach good performance when the perceptual decision is difficult. In addition, the study establishes that this experimental design allows targeted silencing of relatively deep brain areas through a clear skull preparation.

      It has already been demonstrated that mice can perform voluntary head-fixation and can do behavioral tasks in this context. However, this is the first time this methodology is applied, first, to a tactile task and, second, to a task that mice learn in thousands of trials. Another advantage of the present technique is that it is fully automated and allows training with virtually no human intervention.

      The demonstration that optogenetic silencing can be performed in this context is nice but not very surprising as already done in other contexts. Nevertheless it is an interesting application of self head-fixation. The authors should make sure that a maximum of information is available relative to the efficiency of the silencing (fraction of cells silenced) and about its impact on the behavior (does it result or not in a complete impairment?).

      We have improved presentation in various places of the paper to provide more information about the optogenetic manipulation. We added new analysis of the fraction of neurons affected by photostimulation (Figure 8E). We also analyzed the impact on behavioral performance relative to chance performance (Figure S4A and S6). We compared the effect size to prior studies (Figure S4) and we discuss the interpretation of effect size (Discussion, page 22).

      In the power range tested in this study, photostimulation did not reduce performance to chance level (Figure S6). One limitation of the optogenetic workflow is the interpretation of the effect size of a behavioral deficit. We examined this issue in ALM, the brain region from which we have the most extensive data. In previous studies, we have shown that bilateral photoinhibition of ALM results in chance level performance (Li et al 2016, Fig 2b; Gao et al, 2018, Extended Data Fig 6b). Here, mouse performance was above chance during photoinhibition of ALM (Figure S4). This difference in effect size likely resulted from incomplete silencing of ALM. The photostimulus intensity used here was much lower than that used in previous studies (0.3 vs. 11.9 mW/mm²). In addition, a single virus injection was not sufficient to cover the entire ALM. Thus a partial behavioral effect could be due to incomplete silencing of a brain region, or to partial involvement of the brain region in the task. Given this limitation, we caution that the function of a brain region can only be fully deduced through more detailed analysis and together with neurophysiology. The workflow presented here can be used as a discovery platform to quickly identify regions of interest for more detailed neurophysiology analysis. We now better highlight these points in the Discussion.

      Reviewer #2:

      Hao and colleagues developed an automatic system for high-throughput behavioral and optogenetic experiments for mice in home cage settings. The system includes a voluntary head-fixation apparatus and integrated fiber-free optogenetic capabilities. The authors describe in detail the design of the system and the stages for successful automatic training. They perform proof-of-concept experiments to validate their system. The experiments are technically solid and I am convinced that their system will be of interest to some laboratories that perform similar experiments. Despite the large variety of similar automated systems out there, this one may prove to become a popular design.

      The weak side of the work is that it is not particularly novel scientifically. The system is complex but it is not an innovative technology. The body of the study has too many technical details, as if it were the Methods section of a regular manuscript. There are bits of interesting information scattered around the paper (like the insights about the strategy mice use, which stem from the regression analysis), but these are not developed into any coherent direction that answers outstanding questions. The potential advantages of this system compared to other systems are marginal. In my eyes, the fact that manual training is so similar to the automatic one is not only a positive point. Rather, it signifies that the differences are mainly quantitative (e.g. # of mice a lab can train per day, etc). Thus, even as a methods paper, the lack of qualitative difference between this and other methods weakens it as a potential substrate for novel findings.

      The automated workflow presented here significantly boosts the yield and duration of training to rival and slightly surpass that of manual training for the first time (new Supplemental Table 1). We think this degree of automation is an important technical advance. We show that the workflow can significantly scale up the throughput of optogenetic experiments probing behaviors that require thousands of trials to learn. This enables efficient and systematic mapping of large subcortical structures, which was previously difficult to achieve. We better highlight comparisons to previous methods in several key areas in Supplemental Table 1. We have also strengthened the Discussion (page 20).

      We highlight one line of inquiry enabled by our workflow, a systematic mapping of the cortico-basal-ganglia loops during perceptual decision-making. The striatum is topographically organized. Previous studies examined different subregions of the striatum in different perceptual decision behaviors, making comparisons across studies difficult. The striatum in the mouse brain is ~21.5 mm³ in size (Allen reference brain, (Wang, et al, Cell 2020)). Optogenetic experiments using optical fibers manipulate activity near the fiber tip (approximately 1 mm³). A systematic survey of the involvement of different striatal domains in specific behaviors is currently difficult. In our workflow, individual striatal subregions (~1 mm³, Figure 8) could be rapidly screened through parallel testing. At moderate throughput (15 mice / 2 months), a screen that tiles the entire striatum could be completed in under 12 months with little human effort. To illustrate its feasibility, we tested 3 subregions in the striatum previously implicated in different types of perceptual decision behaviors (Yartsev et al, eLife 2018; Sippy et al, Neuron 2015; Znamenskiy & Zador, Nature 2013), including an additional region in the posterior striatum that does not receive ALM and S1 inputs. The results revealed a hotspot in the dorsolateral striatum that biased tactile-guided decision-making (Figure 8). Our approach thus opens the door to rapid screening of the striatal domains during complex operant behaviors.

      Moreover, by eliminating human intervention, automated training allows quantitative assaying of task learning (Figure 4). Home-cage testing also exposes behavioral signatures of motivation in self-initiated behavior (Figure 6). These observations suggest additional opportunities for inquiries into goal-directed behaviors in the context of home-cage testing.

      Reviewer #3:

      In this study, Hao et al. developed an automatized operant box to perform decision-making tasks and optogenetic perturbations without requiring the experimenter's manipulation. For this aim, mice learn to head-fix and to perform a task by themselves. The optogenetic experiment using red-shifted opsins allows manipulation of circuits without the need of an implanted optical fiber. The automation of behavioral tasks in home cages (isolated rodents or in groups) is an intense area of research in neuroscience. The possibility of coupling home cage behavioral analysis with optogenetic manipulation and with complex tasks that require precise positioning of the animal for controlled stimulations (vibrating stimulation, visual …..) is thus of great interest and I commend the authors for their comprehensive dissection of the automated behavioral training setup. Some clarification, reporting of additional behavioral measures and refinement of analyses could improve the impact of this work.

      1) The first part of the paper nicely describes the experimental procedure to automate such a complex task. The procedure is very well described, the important points (e.g. the possibility for the animal to disengage…) are properly highlighted, and the online site allows one to download the plans and 3D descriptions of the tools and the procedures. The authors compare task learning in automated versus manual training and show that there are overall very few differences. Whisker trimming reduces performance, indicating that the animals used tactile information to make the choice. This part of the work is already impressive. Apart from that, the authors do not consider in their description what could be an essential aspect of experiments in a home-cage, i.e. the control of the motivation to perform the task. Mice perform the task (here, engage in the head fixation to obtain reward) when they wish and thus, compared with manual training, there is no explicit control of the animal's motivation. This could have consequences for i) the inter-fixation intervals, which become an element of the decision, and ii) whether the commitment to the task is always motivated by drinking, or whether there is also a commitment to explore, or to check… This could impact the success in the task (e.g. if the animal is not motivated by water, it can explore…). Adding data analyses (information about the daily water consumption, are the inter-fixation intervals correlated with the success or failure in the last trial …) and even a short discussion or introduction of these aspects (see for example Timberlake et al, JEAB 1987 or Rowland et al 2008, Physiol behavior for the distinction between closed and open economy paradigms) could strengthen the behavioral description.

      We thank the reviewer for these suggestions. We performed additional analyses to examine these issues which led us to include a new section of Results in the revised manuscript (page 13-14 and Figure 6).

      We have added a new Figure 6 showing water consumption and body weight information in home-cage testing. At steady state, a mouse typically consumed ~1 mL of water daily (~400 rewarded trials) while maintaining stable body weight. This amount of water consumption was similar to mice engaged in daily manual experiments (Guo et al, PLoS ONE 2014). The number of head-fixations per day was correlated with body weight (Figure 6). Since body weight reflects prior water consumption, this indicates different levels of motivation due to thirst, which drives engagement in the task.

      We also examined the inter-fixation-interval. Interestingly, the inter-fixation-interval after an error (which led to no reward) was significantly longer than following a correct trial (Figure 6E). This is inconsistent with error from exploration. Rather it likely reflects a loss of motivation after an error, perhaps due to the loss of an expected reward. We suspect that error trials violated the mice’s expectation of reward, and therefore discouraged the mice, leading to a loss in motivation. Consistent with this interpretation, we also found a significant increase in inter-fixation-intervals shortly after a sensorimotor contingency reversal (Figure 6F), coinciding with an increase in error rate due to the rule change.

      Despite these changes in motivation to engage in the task, the choice behavior in the task was similar. In highly trained mice, task performance was stable despite the body weight change (Figure 6D). Logistic regression analysis of the choice behavior shows that mice maintained the same strategy in their choice behavior (Figure 6G).

      2) In the second part of the work, the authors focus on the description of choice behavior. To characterize it, the authors used a logistic model to predict choices. They suggest that at the beginning of the task the animals biased their current choice by their last choice (parameter A1) and that once the task is learned they alternate according to the current stimulation (parameter S0). The model was a logistic function of the weighted sum of several behavioral and task variables and has 19 parameters (the ß parameters). If the animal only used these two sources of information, can a model that only takes into account A1 and S0 reproduce the data? If not, this certainly indicates that other information (even distributed) is necessary; and also indicates individual strategies. Finally, analyses are made by considering trials as a discrete chain (trial n, n+1…). However, the self-head-fixed methodology causes the trials to be organized with more or less time between successive trials depending on motivation (see above). Again, do the authors note differences in performance according to the timing between trials? Could it be a variable in the model?

      We thank the reviewer for these great suggestions. We tested a model that included only choice history A1, tactile stimulus S0, and a constant bias term (β0). This 3-parameter model performed as well as the full model in predicting choice. This indicates that other factors do not contribute significantly to the choice behavior. We have included this result in the revised Figure 4C.
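
      The reduced model described above can be sketched as a three-weight logistic regression. This is only an illustration under stated assumptions, not the authors' implementation: the ±1 coding of S0 and A1, the plain gradient-ascent fit, the simulated trials and all function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a weighted sum to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, s0, a1):
    """P(choose right) from a bias, the current stimulus S0 and the previous choice A1.

    Assumption: S0 and A1 are both coded as -1 (left) or +1 (right).
    """
    b0, b_s, b_a = weights
    return sigmoid(b0 + b_s * s0 + b_a * a1)


def log_likelihood(weights, trials):
    """Sum of log-probabilities of the observed choices (1 = right, 0 = left)."""
    total = 0.0
    for s0, a1, choice in trials:
        p = predict(weights, s0, a1)
        total += math.log(p) if choice == 1 else math.log(1.0 - p)
    return total


def fit(trials, steps=500, lr=0.5):
    """Fit the three weights by gradient ascent on the mean log-likelihood."""
    w = [0.0, 0.0, 0.0]
    n = len(trials)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for s0, a1, choice in trials:
            err = choice - predict(w, s0, a1)
            grad[0] += err
            grad[1] += err * s0
            grad[2] += err * a1
        w = [wi + lr * gi / n for wi, gi in zip(w, grad)]
    return w


def simulate(true_weights, n_trials, seed=0):
    """Generate synthetic (S0, A1, choice) trials; purely illustrative data."""
    rng = random.Random(seed)
    trials, prev = [], 1
    for _ in range(n_trials):
        s0 = rng.choice([-1, 1])
        choice = 1 if rng.random() < predict(true_weights, s0, prev) else 0
        trials.append((s0, prev, choice))
        prev = 1 if choice == 1 else -1
    return trials


trials = simulate(true_weights=(0.0, 2.0, 0.5), n_trials=400)
fitted = fit(trials)
```

      Comparing the held-out prediction accuracy of such a reduced model with that of the full 19-parameter model is one way to check whether the extra regressors add predictive power.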

      We next examined whether inter-fixation-interval (i.e. the time elapsed between head-fixations and presumably the motivation to engage in the task) could impact mice’s choice behavior. There are multiple ways inter-fixation-interval could be incorporated into the logistic regression model. For example, it could be modeled as an explicit variable that biases left/right choice, or as a modulation of existing regressors (i.e. a gain variable that modulates the contribution of specific regressors). Each approach requires assumptions about how motivation affects the behavioral strategy of the mice. Instead, as a first-order analysis, we examined whether the logistic regression model could predict choice equally well in trials following short vs. long inter-fixation-intervals. Our logic is that if mice adopted different strategies in different motivational states (reflected in short vs. long inter-fixation-intervals), the predictive power of the model would differ between these conditions. We fit the logistic regression model using trials in their natural sequential order (regardless of the inter-fixation-intervals). The model was then used to predict choice on independent trials. Trials were then sorted by the preceding inter-fixation-intervals. Prediction performance was calculated separately for trials following short vs. long inter-fixation-intervals. We did not find a significant difference in the model prediction performance. The result was similar in early and late stages of task learning (Figure 6G), even though mice used distinct strategies during these periods (Figure 4). These results suggest consistent strategies in the choice behavior. We have included this analysis in the new Figure 6.
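
      The comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the response does not state how "short" and "long" intervals were separated, so the median split, the function names and the example values are all assumptions.

```python
from statistics import median


def accuracy(pairs):
    """Fraction of (predicted, observed) pairs that match; assumes a non-empty list."""
    return sum(1 for p, o in pairs if p == o) / len(pairs)


def accuracy_by_interval(predicted, observed, intervals):
    """Return (accuracy after short intervals, accuracy after long intervals).

    Assumption: trials are split at the median of the preceding
    inter-fixation-interval, and both groups end up non-empty.
    """
    cutoff = median(intervals)
    rows = list(zip(predicted, observed, intervals))
    short = [(p, o) for p, o, t in rows if t <= cutoff]
    long_ = [(p, o) for p, o, t in rows if t > cutoff]
    return accuracy(short), accuracy(long_)


# Hypothetical held-out trials: model predictions, observed choices,
# and the inter-fixation-interval (in seconds) preceding each trial.
predicted = [1, 0, 1, 1, 0, 1, 0, 0]
observed = [1, 0, 0, 1, 0, 1, 1, 0]
intervals = [5, 8, 12, 20, 90, 150, 300, 600]

short_acc, long_acc = accuracy_by_interval(predicted, observed, intervals)
```

      Similar accuracies in the two groups would be in line with the conclusion that the same choice strategy holds across motivational states; a formal comparison would still need a statistical test across mice.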

      3) The third part described optogenetic manipulations. It is clear that group sizes are small. Nevertheless, if the objective was to show that the method works, the results are convincing. Some experimental details and in particular the choice of the statistical procedure need clarification.

      We have improved the presentation and clarified experimental details of the task, hypotheses for targeting specific brain regions, and statistical procedures.

    1. Author Response:

      Reviewer #1:

      The authors note how previous studies on myocardial infarction have usually studied individual tissues and not examined the cross talk between tissues and their dysregulation. To address this challenge they have therefore performed, in a mouse model of MI, an integrated analysis of heart, liver, skeletal muscle and adipose tissue responses at 6 and 24 hours. They have then validated their findings at 24 hours in two independent mouse model data sets.

      A major strength is their comprehensive approach. They have used high throughput RNA seq and applied integrative network analysis. They show for multiple genes whether they are up regulated or down regulated in these four tissues at the 6 and 24 hour time points and whether the regulation directions are concordant or opposite and note in particular that for the liver both concordant and opposite effects occur. They identify key tissue specific clusters in each tissue and identify the key genes in each cluster. Finally they use whole body modelling to identify cross talk between tissues.

      A further strength of this paper is the integration of transcriptomic data (differential expression, functional analysis and reporter metabolite analysis). The final strength is the very clear presentation of the findings and their implications such that the reader gets a very clear message and at the same time can go in to more detail if this is their area of research interest.

      There are no major weaknesses. The authors have achieved their aims and the data supports their conclusions.

      This work represents a major advance in both methodology and understanding of a multi tissues approach to the study of the metabolic impact of MI and the underlying up and down regulation of relevant genes.

      The relevance of these findings in human MI will need to be tested and may ultimately have therapeutic implications.

      First and foremost, we would like to thank the reviewer for the positive and encouraging comments. We agree that further research, especially rigorous validation of the findings from this work in humans, is needed, and we hope the findings can be translated into clinical settings. Moreover, we would like to thank the reviewer for highlighting our comprehensive approach, which we hope can serve as a framework for future multi-tissue research in disease settings.

      Reviewer #2:

      The authors collected post-myocardial infarction (MI) transcriptome data from a mouse model as well as sham-operated control mice to identify systemic molecular changes in multiple tissues at pathway level. The data were collected at two time points (6 hours and 24 hours post-MI), and several computational systems biology tools were applied to the dataset to identify altered molecular processes. The applied tools vary from very standard tools (eg. enrichment analysis) to advanced methods based on mapping data on biological networks. A specific focus was put on the altered signaling pathways as well as metabolic pathways and metabolites. Identified up-/down-regulated pathways were in agreement with the literature.

      Strengths:

      • One unique aspect of the work is the fact that the transcriptomic data were collected from not only heart, the source tissue for MI, but also from three more tissues (liver, skeletal muscle, adipose). Therefore, molecular alterations in the related tissues were also able to be monitored and discussed comparatively. The introduced transcriptomic dataset has a high re-use potential by other researchers in the field since coverage of responses by four tissues at two different time points makes it unique.

      • Correlation-based coexpression networks were created for all four tissues, and some of the clusters in these networks were shown to be tissue-specific clusters, which nicely validates both the experimental and computational approach in the paper.

      • The results were validated by using independent transcriptomic datasets available in the literature. The authors showed that there is a high overlap between their dataset and the literature datasets in terms of identified differentially expressed genes and enriched pathways. This additional validation strengthens the results reported in the manuscript.

      • Use of a variety of computational approaches and showing that they point to similar or complementary molecular mechanisms increase the impact of the paper. The employed computational tools include not only information-extraction methods such as enrichment, coexpression networks, reporter metabolites, but also predictive methods based on modelling. The authors construct a multi-tissue genome-scale metabolic network covering all four tissues of interest in the study, and they show that this model can correctly predict some major post-MI changes in the metabolism. It is interesting to see that two completely different computational approaches (constraint-based metabolic modeling versus information-extraction based approaches) point to same/similar molecular mechanisms.

      We would like to thank the reviewer for providing positive comments and a comprehensive summary of our work. We also really appreciate the constructive comments from the reviewer to improve our work.

      Weaknesses:

      • Regarding predictions made by multi-tissue metabolic network modeling, the control case fluxes were predicted by maximizing the rate of lipid droplet accumulation in the adipose tissue. Although there is an agreement between the model predictions and the results obtained by other bioinformatics tools used in the study as well as literature information, it looks rather like an oversimplification to assume that all other three tissues are programmed to serve maximum fat production in adipose tissue. This should be further elaborated by the authors.

      We would like to thank the reviewer for the comment and we agree that there is a simplification of the situation in the modeling. However, we would like also to emphasize that the model has been carefully constrained with the dietary composition as well as the tissue specific resting energy expenditure. In our opinion, these constraints have already included a great part of the metabolic activity and satisfied the basic metabolic needs of the mice. The rest of the energy in the diet could be either used as physical activity (energy production in muscle) or stored as fat (lipid droplet accumulation in adipose tissue), and in our analysis, we assumed the latter as we think it is more realistic in this study as mice in the cage might have very little physical activities. We added a clarification for this in the revised manuscript as follows,

      ‘To simulate the metabolic flux distribution in the sham-operated mice, we set the lipid droplet accumulation reaction in adipose tissue (m3_Adipose_LD_pool) as the objective function as we assume the energy additional to the resting energy expenditure will be mostly stored as fat rather than used by the muscle for physical activities because mice raised in the cages might have very little exercise. Then, we used parsimonious FBA to calculate the flux distribution.’
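
      For readers unfamiliar with parsimonious FBA, its usual two-step formulation is written out below. This is the textbook form, given here as an assumption about what the quoted passage refers to; the symbols are not taken from the manuscript. S denotes the stoichiometric matrix, v the flux vector, l and u the flux bounds (which would carry the dietary and resting-energy-expenditure constraints), and c a vector selecting the objective reaction (here m3_Adipose_LD_pool).

```latex
% Step 1: ordinary FBA, maximizing the objective flux
Z^{*} = \max_{v} \; c^{\top} v
\quad \text{s.t.} \quad S v = 0, \qquad l \le v \le u

% Step 2: parsimonious step, minimizing total flux at the optimal objective value
\min_{v} \; \sum_{i} |v_i|
\quad \text{s.t.} \quad S v = 0, \qquad c^{\top} v = Z^{*}, \qquad l \le v \le u
```

      The second step picks, among all flux distributions that reach the optimal objective value, one with the smallest total flux.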

      Reviewer #3:

      In the manuscript, "Integrative transcriptomic analysis of tissue-specific metabolic crosstalk after myocardial infarction" by Arif et al., the authors describe analyses of transcriptomes of +/- myocardial infarction (MI) mice. The study is useful and reports interesting results. These results could be of interest to further develop cellular insight in effects and treatments for MI. However, I do not find any methodological advances here. The manuscript appears to be a repository of transcriptomics analyses. All the techniques used have been tried and applied to other scientific problems. The authors have presented differential expression analysis, followed by GSEA, and then they perform different network analyses - co-expression networks, reporter analyses, multi-tissue model, etc.

      My main issues are that the authors do too many different analyses but none of them gets sufficient attention in the paper. Also, no other independent quantitative evidence is shown in support of the results of their analyses. Further, validation was done the same way the pipeline was built. This makes their results come across as circular. For example, when validating metabolic models of cells built using transcriptomic data, CRISPR-Cas9 essentiality screens are used. Here, it appears they basically repeated the same analyses on the same transcriptome from a different experiment.

      First of all, we would like to thank the reviewer for the positive summary of our research. We agree that this study can be usefully explored further, especially by validating it in humans. We would also like to thank you for the constructive comments. We agree that we presented multiple transcriptomics analyses that have been used before. Apart from understanding the metabolic effect of MI in multiple tissues (which is unique as of now), our secondary goal is to propose a novel integrative framework for analyzing multi-tissue transcriptomics data based on the available techniques. We would like to emphasize that, even though the individual analyses were not novel, the integrative analysis in a multi-tissue and disease setting, at both the transcriptomic and the metabolic crosstalk level, is a strong novelty of this study. This required employing not only state-of-the-art network analyses but also the reconstruction of multi-tissue models through new methods that enable joint modeling of the metabolic interactions within and between tissues.

      As this study is unique (as of now), we tried our best to validate it with other data from similar settings (we found only single-tissue transcriptomics data) and ran our pipeline on them to validate and strengthen our findings. Moreover, we also recognized the limitation that all the results presented in this study are purely based on transcriptomics data (as stated in the “Discussion” section of the manuscript). More experiments, such as with metabolomics and proteomics data, are in our pipeline to complement the results from the current study. In summary, we recognize the reviewer’s concerns and would like to address them in our future studies.

    1. First, we can control for whether the Pradhan is new or not. It would not be legitimate to compare investments in all unreserved GP where Pradhan are new to those in reserved GP: the fact that the Pradhan is new may reflect unobserved characteristics of the GP, and this non-random sample selection would bias the results. There is, however, a random subset of unreserved GP where the Pradhan is always new in office. Individuals may run for a council seat only in the village in which she or he resides. Once elected, the councilors choose one of them to be Pradhan. As part of the reservation scheme, one third of council seats (identified by village) were reserved for women: thus, if the previous councilor was a man, and his seat was reserved for a woman in the 1998 election, we can be sure that the Pradhan for that GP will be new to office. We can therefore compare women Pradhan to this subgroup of new Pradhan, to control for the fact that the Pradhan is new in office. Clearly, this does not fully control for the Pradhan’s experience: even new Pradhan could be experienced politicians. Second, we can control for whether the Pradhan is likely to be re-elected in 2003. Every third GP starting with the second in the list will be reserved for a female Pradhan for the 2003 election. Pradhan in those GP should realize that they will not be able to stand for re-election as Pradhan (if their particular seat is not reserved, they may still be able to run for a position of member of the GP council). We therefore restrict the sample to GP reserved in 1998 and those which will be reserved in 2003, to examine whether and to what extent the differences we observe are due to the fact that women may not think they have a chance to be reelected. Again, men could still have a longer horizon in office than women, if they plan to be elected on another position or be elected again in 10 years. Finally, we take advantage of the reservation of about 44% of the seats to SC and ST.
These reservation were also selected randomly, and within each list, one third of positions were reserved for women. Irrespective of their gender, all the leaders elected under this reservation policy tend to be new leaders and to be elected in large part due to the quota system. They also tend to

      hello

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript by Schrieber et al., explores whether inbreeding affects floral attractiveness to pollinators with additional factors of sex and origin in play, in male and female plants of Silene latifolia. The authors use a combination of spatial sampling, floral volatiles, flower color, and floral rewards coupled with the response of a specialized pollinator to these traits. Their results show that females are more affected by inbreeding and in general inbreeding negatively impacts the "composite nature" of floral traits. The manuscript is well written, the experiments are detailed and quite elaborate. For example., the methodology for flower color estimation is the most detailed effort in this area that I can remember. All the experiments in the manuscript show meticulous planning, with extensive data collection addressing minute details, including the statistics used. However, I do have some concerns that need to be addressed.

      Core strengths: Detailed experimental design, elaborate data collection methods, well-defined methodology that is easy to follow. There is a logical flow for the experiments, and no details are missing in most of the experiments.

      Weaknesses: A recent study has addressed some of the questions detailed in the manuscript. So, introduction needs to be tweaked to reflect this.

      Thank you very much for bringing this excellent article to our attention! We adjusted the writing in the introduction and the discussion accordingly. Please consider that this article was first published on the 15th of January 2021, while our manuscript was submitted on the 9th of January. Hence, we were not able to account for this study in the first submission. Introduction pp 4-5, ll 48-54: “Although in a few cases inbreeding has been shown to alter single components of flower attractiveness (Ivey and Carr, 2005; Ferrari et al., 2006; Haber et al., 2019), insight into syndrome-wide effects is restricted to a single study. Kariyat et al. (2021) demonstrated that inbred Solanum carolinense L. display reduced flower size, pollen and scent production and receive fewer visits from diurnal generalists. It is necessary to broaden such integrated methodological approaches to other plant-pollinator systems (e.g., nocturnal specialist pollinators) and further floral traits (i.e., flower colour).” Discussion p 19, ll 535-542: “In summary, our research on S. latifolia suggests that in addition to inbreeding disrupting interactions with herbivores by changing plant leaf chemistry (Schrieber et al., 2018) it affects plant interactions with pollinators by altering flower chemistry. Our observations are in line with studies on other plant species (Ivey and Carr, 2005; Kariyat et al., 2012, 2021) and highlight that inbreeding has the potential to reset the equilibrium of species interactions by altering functional traits that have developed in a long history of co-evolution. These threats to antagonistic and symbiotic plant-insect interactions may mutually magnify in reducing plant individual fitness and altering the dynamics of natural plant populations under global change.”

      Some details and controls are missing in floral scent estimation. Flower age, a pesticide treatment of plants that could affect chemistry..needs to be better refined.

      We clarified this issue at different occasions in the methods section. Previous studies (and our study) on S. latifolia have shown no clear differences in the quality of floral scent between sexes. However, one study found higher total emission of VOC in males, while others found no differences. Hence, females produce no specific VOC that are used as oviposition cues but may be differentiated from males by the total amount of emitted VOC and pronounced differences in spatial flower traits. We highlight this at p 6, ll 111-116: “Silene latifolia exhibits various sexual dimorphisms with male plants producing more and smaller flowers that excrete lower volumes of nectar with higher sugar concentrations as compared to females (Gehring et al., 2004; Delph et al., 2010). The quality of floral scent exhibits no clear sex-specific patterns, while male plants have been shown to emit higher or equal total amounts of VOC as compared to females in different studies (Dötterl & Jürgens 2005, Waelti et al. 2009)”.

      Both male and female moths show pronounced behavioural responses to lilac aldehyde isomers and other VOC in the floral scent of S. latifolia (Dötterl et al., 2006). We therefore treated these VOC as typical floral scent compounds. We clarified this at p 7, ll 125-126: “A substantial fraction of floral VOC produced by S. latifolia triggers antennal and behavioural responses in male and female H. bicruris moths (Dötterl et al., 2006).” and p 9, ll 210-218: “For targeted statistical analyses, we focused on those VOC that evidently mediate communication with H. bicruris according to Dötterl et al. (2006). We analysed the Shannon diversity per plant (calculated with R-package: vegan v.2.5-5, Oksanen et al. 2019) for 20 floral VOC in our data set that were shown to elicit electrophysiological responses in the antennae of H. bicruris (Supplementary File 1). Moreover, we analysed the intensities of three lilac aldehyde isomers, which trigger oriented flight and landing behaviour in both male and female H. bicruris most efficiently when compared to other VOC in the floral scent of S. latifolia. Furthermore, H. bicruris is able to detect the slightest differences in the concentration of these three compounds at very low dosages (Dötterl et al. 2006).”
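
      The Shannon diversity mentioned in the quoted passage can be illustrated with a short sketch. The authors computed it with the R package vegan; the Python function below is only an illustrative equivalent using the natural logarithm (the default base in vegan's `diversity`), and the function name and intensity values are hypothetical.

```python
import math


def shannon_diversity(intensities):
    """Shannon index H = -sum(p_i * ln p_i) over compounds with non-zero intensity."""
    total = sum(intensities)
    if total <= 0:
        return 0.0
    h = 0.0
    for x in intensities:
        if x > 0:
            p = x / total
            h -= p * math.log(p)
    return h


# Hypothetical VOC intensities for two plants (four compounds each).
plant_even = [10.0, 10.0, 10.0, 10.0]   # compounds emitted in equal amounts
plant_skewed = [37.0, 1.0, 1.0, 1.0]    # one compound dominates the blend

h_even = shannon_diversity(plant_even)
h_skewed = shannon_diversity(plant_skewed)
```

      An even blend gives the maximum value for its number of compounds (ln 4 for four compounds), and a blend dominated by one compound gives a lower value, so the index reflects both the number of VOC and how evenly they are emitted.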

      We used biological pest control agents in a preventive manner because S. latifolia is often infested by thrips and aphids under greenhouse conditions. The writing in the previous manuscript version was not clear in this regard and we changed the text at p 8, ll 157-161: “Plants received water and fertilisation (UniversolGelb 12-30-12, Everris-Headquarters, NL) when necessary for the entire experimental period and were prophylactically treated with biological pest control agents under greenhouse conditions to prevent thrips (agent Amblyseius barkeri and Amblyseius cucumeris) and aphid (agent Chrysoperla carnea) infestation (Katz Biotech GmbH, GE).”

      Indeed, flower size and scent emission can be correlated. Although the question whether differences in scent emission were based on a difference in flower size is an interesting one, it seemed less relevant to us because it is unlikely that our pollinators correct their perception of a scent for the size of a flower (see also p 19, ll 520-526). We were rather interested in whether scent emission differs between the plant treatments and thus pollinators may chemically perceive such differences. Moreover, we found it problematic to correct our models for flower size by including it as a covariate, which is the reason why we have not assessed this trait during scent collection. In this case, we would have corrected our scent responses for the effects of inbreeding, sex and population origin (i.e., the predictors we are interested in) because all of them determine the size of a flower (Figure 2 c,d). Hence, the inbreeding, sex and origin effects on flower scent would likely vanish. However, it is highly unlikely that the set of genes contributing to sex-, breeding treatment- and origin-based variation in flower size is exactly the same one that determines variation in scent emission per flower, which is basically the assumption underlying the model that includes flower size as a covariate. We critically mentioned the trade-off relationships and our reasoning to not correct for flower size at p 9, ll 208-210: “The intensities of VOC were not corrected for flower size because we wanted to capture all variation in scent emission that is relevant for the receiver i.e., the pollinator.”

      While the study is laser-focused on floral traits, as the authors are aware inbreeding affects the total phenotype of the plants including fitness and defense traits. For example, there are quite a few studies that have shown how inbreeding affects the plant defense phenotype. This could be addressed in the introduction and discussion.

      We agree that this aspect is important and therefore addressed it in further detail in the introduction at p 4 ll 34-38: “While it is well established that inbreeding can increase a plant’s susceptibility to herbivores by diminishing morphological and chemical defences (Campbell et al., 2013; Kariyat et al., 2012; Kalske et al., 2014), its effects on plant-pollinator interactions are less well understood. Inbreeding may reduce a plant’s attractiveness to pollinating insects by compromising the complex set of floral traits involved in interspecific communication.” Since other referees suggested toning down rather than expanding the discussion based on floral scent results, we kept to the general feedback relationship between herbivory and pollination, rather than relating it specifically to volatiles, in the discussion at p 19, ll 535-544: “In summary, our research on S. latifolia suggests that in addition to inbreeding disrupting interactions with herbivores by changing plant leaf chemistry (Schrieber et al., 2018) it affects plant interactions with pollinators by altering flower chemistry. Our observations are in line with studies on other plant species (Ivey and Carr, 2005; Kariyat et al., 2012, 2021) and highlight that inbreeding has the potential to reset the equilibrium of species interactions by altering functional traits that have developed in a long history of co-evolution. These threats to antagonistic and symbiotic plant-insect interactions may mutually magnify in reducing plant individual fitness and altering the dynamics of natural plant populations under global change. As such, our study adds to a growing body of literature supporting the need to maintain or restore sufficient genetic diversity in plant populations during conservation programs.”

      Reviewer #2 (Public Review):

      A summary of what the authors were trying to achieve. This interesting and data-rich paper reports the results of several detailed experiments on the pollination biology of the dioecious plant Silene latifolia. The authors use multiple accessions from several European (native range) and North American (introduced range) populations of S. latifolia to generate an experimental common garden. After one generation of within-population crosses, in which each cross included either two (half-)siblings or two unrelated individuals, they compared the effects of one generation of inbreeding on multiple plant traits (height, floral size, floral scent, floral color), controlling for population origin. Thereby, they set out to test the hypothesis that inbreeding reduces plant attractiveness. Furthermore, they ask whether the effect is more pronounced in female than in male plants, which may be predicted from sexual selection and sex-chromosome-specific expression, and whether the effect of inbreeding is larger in native European populations than in North American populations, which may have already undergone genetic purging during a bottleneck. Finally, the authors evaluate to what extent the inbreeding-related trait changes affect floral attractiveness (measured as visitation rates) in field-based bioassays.

      An account of the major strengths and weaknesses of the methods and results. The major strength of this paper is the ambitious and meticulous experimental setup and implementation that allows comparisons of the effect of multiple predictors (i.e. inbreeding treatment, plant origin, plant sex) on the intraspecific variation of floral traits. Previous work has shown direct effects of plant inbreeding on floral traits, but no previous study has taken this wholesale approach in a system where the pollination ecology is well known. In particular, very few studies, if any, have tested the effects of inbreeding on floral scent or color traits. Moreover, I particularly appreciate that the authors go the extra mile and evaluate the biological importance of the inbreeding-induced trait variation in a field bioassay. I also very much appreciate that the authors have taken into account the biological context by using a relevant vision model in the color analyses and by focusing on EAD-active compounds in the floral scent analyses.

      The results are very interesting and show that the effects of inbreeding on trait variation are both origin- and sex-dependent, but that the strongest effects were not always consistent with the hypothesis that North American plants would have undergone genetic purging during a bottleneck that would make these plants less susceptible to inbreeding effects. The authors made a large collection effort, securing seeds from eight populations from each continent, but then only used population origin and seed family origin as random factors in the models when testing the overall effect of inbreeding on floral traits. It would have been very interesting to see an analysis that partitions the variance both in the actual traits under study and in the response to inbreeding, to determine to what extent there is variation among populations within continents. This is not least because it is increasingly clear that the ecological outcome of species interactions (mutualistic/antagonistic) in nursery pollination systems often varies among populations (cf. Thompson 2005, The geographic mosaic of coevolution), and some results suggest that this is the case also in Hadena-Silene interactions (e.g. Kephardt et al. 2006, New Phytologist). Furthermore, some plants involved in nursery pollination systems show evidence of distinct canalization across populations of floral traits of importance for the interaction (e.g. Svensson et al. 2005), whereas others show unexpected and fine-grained variation in floral traits among populations (e.g. Suinyuy et al. 2015, Proceedings B, Thompson et al. 2017 Am. Nat., Friberg et al. 2019, PNAS). Hence, it is possible that the local population history and local variation in the interactions between the plants and their pollinators may be more important predictors for explaining variation in floral trait responses to inbreeding than the larger-scale continental analyses. This is not least because North American S. latifolia probably has multiple origins, with subsequent opportunity for admixture in secondary contact.

      Yes, it is necessary to put populations from the same continent into one category, since native and invasive plant populations differ significantly in their evolutionary history (p 5, ll 74-81, http://onlinelibrary.wiley.com/doi/10.1111/j.1365-294X.2012.05751.x). Origin explained sufficient amounts of variation in several traits including flower number, corolla expansion, VOC diversity, lilac aldehyde A intensity, and pollinator visitation rates (see Figures 2-3; and Table 2) and some variation in the magnitude of inbreeding effects (Figure 2e, f; Figure 3). Even if we were not interested in differences among native and invasive populations, we would have to include origin as a fixed effect in our models because:

      i) populations within a distribution range are not independent samples,

      ii) origin explains sufficient variation in many responses,

      iii) origin cannot be fitted as a random factor, since it has only two levels (the minimum number of levels for a random effect is 4). We agree that it would be very interesting to specifically assess differences in the magnitude of breeding and sex effects among populations within origins. We now discuss this as an important future research direction at p 18, ll 500-507: “As such, the precise mechanisms underlying variation in inbreeding effects on different scent traits across population origins of S. latifolia can only be explored based on comprehensive genomic resources, which are currently not available. Future studies should also incorporate field-data on the abundance of specialist pollinators and extend the focus from variation in the magnitude of inbreeding effects among geographic origins to variation among populations within geographic origins and individuals within populations. This would allow a detailed quantification of geographic variation in inbreeding effects and elaborating on the causes and ecological consequences of such variation (Thompson, 2005; Schrieber and Lachmuth, 2017; Thompson et al., 2017)”.

      To empirically address within-origin variation of inbreeding effects with our data, we would have to i) fit correlated random intercepts and slopes for the interaction breeding-sex on the population random factor (models consume min. 22 DF); or ii) include population as a fixed effect in our models (models consume min. 67 DF). We have tried both of these approaches when preparing the revision, but unfortunately it turned out that our study is not designed to address this question. The models for both variants only partially converge (see R-script ll. 1568-1580), and even if they do, this does not imply that one can draw solid inference from them. Approach i often results in multiple singular convergence warning messages, implying that no variance is explained by population-specific reaction norms to the fixed effects specified in the random effects structure. Approach ii results in odd rank-deficient models (I was seriously worried about type I errors). We simply have too few replicates (5) per population-breeding treatment-sex combination for both approaches. For solid inference we would need 10 (approach i) to 40 (approach ii) replicates per combination, i.e. 640-2600 individuals. However, our experimental design is sufficient to address the hypothesis we have raised in the introduction as well as general differences in response variables among populations. We now provide information on variance partitioning for all models that include population as a random effect in S9. As you will see, population explains lower amounts of variation in our responses than the fixed effects in 9 out of 12 models. The random effects maternal and paternal genotype (mother&father) explain more variation than the random effect population in 6 of 12 cases. Thus, these data do not make a strong case for an extensive discussion of population-based differences in floral traits, and this was also not a question or hypothesis we wanted to address with our study.
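      The sample sizes quoted above can be checked with a short calculation. This is only a sketch under stated assumptions: it takes the eight populations per continent mentioned in the review, and assumes two breeding treatments (inbred, outbred) and two sexes, so that one "combination" is a population × breeding treatment × sex cell. The names `cells` and `individuals_needed` are illustrative and do not come from the authors' R script.

```python
# Assumed design: 8 populations on each of 2 continents,
# 2 breeding treatments (inbred/outbred) and 2 sexes.
populations = 8 * 2
breeding_levels = 2
sexes = 2

# Number of population x breeding treatment x sex combinations.
cells = populations * breeding_levels * sexes  # 64


def individuals_needed(replicates_per_cell):
    """Total plants required for a given number of replicates per combination."""
    return cells * replicates_per_cell


print(individuals_needed(5))   # 320 plants at 5 replicates per combination
print(individuals_needed(10))  # 640, the lower bound quoted for approach i
print(individuals_needed(40))  # 2560, roughly the 2600 quoted for approach ii
```

      Under these assumptions the lower bound matches the quoted 640 exactly, and the upper bound of 2560 is consistent with the rounded figure of 2600.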

      I see no major weaknesses in the study, but in my detailed response I have raised a few questions and suggestions about the floral scent analyses. In short, the authors have used a technique that is not the standard method for making quantitative floral scent analyses, and I am curious about how it was ensured that the results obtained from the static headspace sampling using PDMS adsorbents could be used as a quantitative measure. I suggest that the authors validate the use of this method more thoroughly in the manuscript, and have detailed this comment in my response to the authors.

      Also, and this may seem like a nit-picky comment, I am not convinced that the best way to describe the traits under study is "plant attractiveness", because in the experimental bioassays, most of the traits under study that are affected by the inbreeding treatment, did not result in a reduced pollinator visitation. Most (or all) of these traits may also be involved in other plant functions and important for other interactions, so I suggest potentially using a term like "floral traits" or "(putative) signalling traits".

      We now avoid the term floral attractiveness throughout the manuscript and instead refer to “floral traits”.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions: By and large, the authors achieved the aims of this study, and drew conclusions based on these results. One interesting aspect of this work that I think could be discussed a bit deeper is the lack of congruence between the effects of inbreeding on floral traits and the variation in visitation pattern in the bioassay. In fact, the only large effect of inbreeding on a floral trait that may play a role as an explanatory factor is the reduction of emission of lilac aldehyde A in inbred female S. latifolia from North America, which corresponds to a reduced visitation rate in this group in the pollinator visitation bioassay. I have made some specific suggestions in my comments to the authors.

      We agree that this aspect required deeper discussion and revised the section at p 19, ll 520-526 accordingly. We believe that the limited spatial vision of H. bicruris in combination with our experimental setup for pollinator observations increased the relative importance of floral scent for pollinator visitation rates (suggested by referee #3).

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community: I think that one important aspect of this work that may broaden the impact of this study further is the link between these experiments and our expectations from the evolution of selfing. Selfing plant species most often conform to the selfing syndrome, presenting smaller, less scented flowers than outcrossing relatives. Traditionally, the selfing syndrome is explained by natural selection against individuals that invest energy into floral signalling, when attracting pollinators is no longer crucial for reproduction. Some studies (for example Andersson, 2012, Am. J. Bot), however, have shown that only one, or a few, generations of inbreeding may reduce floral size as much as quite strong selection for reduced signalling. Here, at least for some populations and sexes, similar results are obtained in this paper regarding several traits (including floral scent), and one way to put this paper in context is by discussing the results in the light of these previous papers.

      We now address this issue at p 16, ll 417-420: “However, our findings highlight that even weak degrees of biparental inbreeding (i.e., one generation sib-mating) can result in a severe reduction of spatial flower trait and scent trait values that is detectable against the background of natural variation among multiple plant populations from a broad geographic region. This observation indirectly supports that the selfing syndrome (i.e., smaller, less scented flowers observed in selfing relative to outcrossing populations of hermaphroditic plant species) may not merely be a result of natural selection against resource investment into floral traits, but also a direct negative consequence of inbreeding (Andersson, 2012).”

      Reviewer #3 (Public Review):

      Schrieber et al. studied the effects of biparental inbreeding in the dioecious plant Silene latifolia, focusing specifically on traits important for floral attractiveness and pollinator attraction. These traits are especially important for dioecious species with separate sexes as they are obligate outcrossers. The authors find that inbreeding mostly decreases floral attractiveness, but that this effect tended to be stronger in the female flowers, which the authors suspect to result from the trade-off with larger investment in the sexual functions in the female plants. The authors then go on to couple the changes in visual and olfactory floral traits to pollinator attraction, which allows them to conclude, or at least speculate, that differences in pollinator behavior are mostly driven by the changes in olfactory traits. The study is robust in its broad and well-balanced sampling of populations, rigorous and in large part meticulously documented experimental designs and linking of the effects on mechanisms to ecological function. The hypotheses are clearly stated and the study is able to address them mostly convincingly. However, some aspects of the decisions the authors made and possible caveats need to be addressed and elaborated on.

      A major caveat, in my opinion, is that while the authors find stronger effects of inbreeding on pollinator visitation rates in the plants from the North American (Na) origin, these plants were tested in an environment that was foreign to them, which could have important consequences for the results of this study. This is specifically because the main pollinator, the moth Hadena bicruris, is completely absent from the populations in Na, and yet was the main pollinator observed in the pollinator attraction experiment. As this pollinator is also a seed predator, the Na populations are released from the selection pressure to avoid attracting the females of this species and thus risking the loss of seeds and fitness. In fact, some of the results suggest that the release from the specialist pollinator and seed predator in Na has led to an increase in the attractiveness of the female flowers, based on the higher number of flowers visited in the outcrossed females compared to outcrossed males in the plants from the Na origin and the similar, though not statistically significant, pattern in the olfactory cue. While ideally this pollinator attraction experiment should be repeated within the local range of the Na plants, this is of course not feasible. Instead I suggest the problem should be addressed in the discussion explicitly and its consequences for the interpretation of the results should be considered.

      Indeed, North American populations are tested in their “away”-habitat only, and the observed plant performance and pollinator visitation rates can thus provide no direct implications for their “home”-habitat. We state this now more clearly at pp 11-12, ll 283-285. However, our design is appropriate for investigating inbreeding effects on plant-pollinator interactions in multiple plant populations in a common environment. Given the close taxonomic relationship of H. bicruris (main pollinator in Europe) and H. ectypa (main pollinator in North America), the behavioural responses of the former species to variation in the quality of its host plant were considered to overlap sufficiently with responses of the latter species, as outlined at pp 11-12, ll 285-291.

      The hypothesis that North American (NA) S. latifolia evolved higher attractiveness to female Hadena moths because H. ectypa is not able to oviposit on female plants, in contrast to H. bicruris, is indeed a highly interesting one. However, as you have outlined correctly, our study is not designed to elaborate on questions related to adaptive evolutionary differentiation among North American and European plants. Instead of addressing this hypothesis based on our data, we thus refer to previous studies in the discussion (p 17, ll 482-487): “As discussed in detail in previous studies, higher flower numbers in North American S. latifolia plants (Figure 1b) may result from changes in the selective regimes for numerous abiotic factors (Keller et al., 2009) or from the release of seed predation. As opposed to H. bicruris, H. ectypa pollinates North American S. latifolia without incurring costs for seed predation, which may result in the evolution of higher flower numbers, specifically in female plants (Elzinga and Bernasconi, 2009).”

      The incorporation of the VOC data in the actual manuscript was quite limited and I found the reasoning for picking only the three lilac aldehydes (in addition to the Shannon diversity index) for the univariate statistical tests insufficient. How much more efficient was the effect of the lilac aldehydes compared to the other 17 compounds deemed important in the previous study? While the data on this one aldehyde matches the pollinator attraction results, having one compound out of 70 (or out of 20 if only considering the ones identified important for the main pollinator) seems, perhaps, fortuitous lest there is a good reason for focusing on these particular compounds.

      We adapted the text to increase clarity but kept our previous choice for the analyses of VOC data.

      i) We now explain our choice of analysing lilac aldehydes in more detail at p 9, ll 210-218: “For targeted statistical analyses, we focused on those VOC that evidently mediate communication with H. bicruris according to Dötterl et al. (2006). We analysed the Shannon diversity per plant (calculated with R-package: vegan v.2.5-5, Oksanen et al. 2019) for 20 floral VOC in our data set that were shown to elicit electrophysiological responses in the antennae of H. bicruris (Supplementary File 1). Moreover, we analysed the intensities of three lilac aldehyde isomers, which trigger oriented flight and landing behaviour in both male and female H. bicruris most efficiently when compared to other VOC in the floral scent of S. latifolia. Furthermore, H. bicruris is able to detect the slightest differences in the concentration of these three compounds at very low dosages (Dötterl et al. 2006).”

      ii) If one analyses 20 compounds with zero-inflation models (actually two models in one) + 8 floral trait models + 2 pollinator visitation models (zi-models with two component models), one ends up with 52 models investigating complex fixed and random effect structures. To keep type-1 errors as low as possible (see also comment 2.12.b from Referee#2), we approached the more comprehensive VOC data sets with multivariate analyses or Shannon diversity.

      iii) We tested the effect of sex × origin × breeding treatment on the Shannon diversity of 20 active VOC as well as in the random forest analyses with the 20 VOC and 70 VOC datasets, and transparently reported the results from all of these analyses in the manuscript. Hence, the incorporation of VOC data was not limited. However, we agree that we made too little reference to these results and have now changed the text accordingly. Results section p 13, ll 351-354: “Multivariate statistical analyses of 20 H. bicruris active VOC and all 70 VOC detected in S. latifolia revealed no clear separation of floral headspace VOC patterns for any of the treatments (Figure 2-figure supplement 2). In summary, the combined effects of breeding treatment, sex and range on floral scent were rather weak.”
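      For readers unfamiliar with the per-plant Shannon diversity mentioned in point i), the index can be sketched as follows. The response states that the authors computed it with the R package vegan; the Python function below is only an illustrative re-implementation of the standard formula H = -Σ p_i ln p_i with natural logarithms, and the function name and example intensities are hypothetical.

```python
import math


def shannon_diversity(intensities):
    """Shannon index H = -sum(p_i * ln(p_i)) over compounds with non-zero intensity.

    `intensities` holds one non-negative value per compound for a single plant.
    """
    total = sum(intensities)
    if total <= 0:
        return 0.0
    h = 0.0
    for value in intensities:
        if value > 0:
            p = value / total  # relative contribution of this compound
            h -= p * math.log(p)
    return h


# Hypothetical plants: an even blend of four compounds versus a single-compound scent.
even_blend = [1.0, 1.0, 1.0, 1.0]
single_compound = [5.0, 0.0, 0.0, 0.0]
print(shannon_diversity(even_blend))       # equals ln(4) for four equally abundant compounds
print(shannon_diversity(single_compound))  # 0.0, no diversity
```

      Collapsing the 20 compounds into one index per plant is what lets a single model replace 20 compound-specific models, which is the rationale given in point ii).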

      Sampling time of VOCs is reported ambiguously. Was it from 21:00 to 17:00 the next day or in fact from 9 pm to 5 am (instead of 5 pm as reported)? Please be more specific in the text as this is quite important. If sampling tubes were left in place during the daytime, some of the compounds could have evaporated due to heating of the tubes in the summer. It would also be important to mention whether all of the headspace VOCs were sampled on the same day and whether there could be variation in, e.g., temperature.

      Thank you very much for identifying this typo! It is from 9 pm to 5 am (p 9, l 186).

      Considering the experimental setup for the pollinator attraction observations and the pooling of the data at the block level (which I think is the right choice), it seems possible the authors were more likely to get a result where pollinator behavior matches the long-distance cue, the VOCs. Short-distance cues such as a subtle difference in flower size would perhaps not be distinguished with the current setup. I would be interested to know if the authors agree, and if so, mention this in the discussion.

      Thank you very much for this excellent suggestion! We agree and discuss this aspect in detail at p 19, ll 520-526. Indeed, one would need two different experimental setups to assess the contributions of long- and short-distance cues. Our setup (large distances among plots) is optimal for long-distance cues, while a setup for short-distance cues should have all plants in close spatial proximity. However, the latter approach does not allow one to address long-distance cues or to exclude competition/facilitation for pollinators among plants from different treatment groups.

    1. We are not going to save each other, ourselves, America, or the world. But we certainly can leave it a little bit better. As my grandmother used to say, “If the Kingdom of God is within you, then everywhere you go, you ought to leave a little Heaven behind.”

      Yes we may think things are in turmoil in this world but it is up to us to do something and do our part to make the future better.

    1. Rooney tells us Marianne is isolated, lonely and also very smart but she never actually shows us Marianne’s supposedly exceptional intelligence or feelings of remoteness. Marianne may have been intended to appear deeply flawed, or difficult, but we only ever get a sense of her isolation through her persistent sexual degradation at the hands of men. As a character, Marianne is a cipher, inherently desirable to men and envied by women.

      I found it to be interesting that Rooney differentiated how Connell and Marianne were both shown. It is clearly shown through specific situations and examples that Connell is this popular, well-loved individual but when it comes to Marianne, I tend to agree with this statement. Rooney tells us all of these poor characteristics about Marianne, such as the fact that she's a loner and oddly smart but when do we ever see this actually play out in the story? We don't. I find this difference to be a little disturbing to be honest. I don't see why the men are viewed as so transparent and women are seen as this overly complex individual that we think we know but in reality, never actually get to know on a deeper level.

    1. Author Response:

      Reviewer #2:

      The current work makes the case that local neural measurements of selectivity to stimulus features and categories can, under certain circumstances, be misleading. The authors illustrate this point first through simulations within an artificial, deep, neural network model that is trained to map high-level visual representations of animals, plants, and objects to verbal labels, as well as to map the verbal labels back to their corresponding visual representations. As activity cycles forward and backward through the model, activity in the intermediate hidden layer (referred to as the "Hub") behaves in an interesting and non-linear fashion, with some units appearing first to respond more to animals than objects (or vice-versa) and then reversing category preference later in processing. This occurs despite the network progressively settling to a stable state (often referred to as a "point attractor"). Nevertheless, when the units are viewed at the population level, they are able to distinguish animals and objects (using logistic regression classifiers with L1-norm regularization) across the time points when the individual unit preferences appear to change. During the evolution of the network's states, classifiers trained at one time point do not apply well to data from earlier or later periods of time, with a gradual expansion of generalization to later time points as the network states become more stable. The authors then ask whether these same data properties (constant decodability, local temporal generalization, widening generalization window, change in code direction) are also present in electrophysiological recordings (ECoG) of anterior ventral temporal cortex during picture naming in 8 human epilepsy patients. Indeed, they find support for all four data properties, with more stable animal/object classification direction in posterior aspects of the fusiform gyrus and more dynamic changes in classification in the anterior fusiform gyrus (calculated in the average classifier weights across all patients).
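      The train-at-one-time, test-at-another logic summarized above can be illustrated with a toy simulation. This is a minimal sketch and not the authors' analysis: it swaps the L1-regularized logistic regression for a simple difference-of-means linear classifier, uses two synthetic "units" rather than model or ECoG data, and all names and gain values (`GAINS`, `simulate_trials`, and so on) are hypothetical.

```python
import random


def simulate_trials(n_per_class, gains, rng, noise_sd=0.1):
    """Return (patterns, labels); patterns[i][t] is the unit activity of trial i at time t."""
    patterns, labels = [], []
    for label in (+1, -1):
        for _ in range(n_per_class):
            trial = [[label * g + rng.gauss(0.0, noise_sd) for g in gains_t]
                     for gains_t in gains]
            patterns.append(trial)
            labels.append(label)
    return patterns, labels


def train_mean_difference(patterns, labels, t):
    """Weights proportional to (class +1 mean) minus (class -1 mean) at time t.

    With balanced classes, summing label * activity gives that difference up to scale.
    """
    n_units = len(patterns[0][t])
    weights = [0.0] * n_units
    for trial, label in zip(patterns, labels):
        for u in range(n_units):
            weights[u] += label * trial[t][u]
    return weights


def accuracy(weights, patterns, labels, t):
    """Fraction of trials whose projection onto `weights` at time t has the correct sign."""
    correct = 0
    for trial, label in zip(patterns, labels):
        score = sum(w * x for w, x in zip(weights, trial[t]))
        if (score > 0) == (label > 0):
            correct += 1
    return correct / len(labels)


# Hypothetical category gains per time point: unit 0 reverses its preference, unit 1 ramps up.
GAINS = [(1.0, 0.2), (0.5, 0.6), (0.0, 1.0), (-0.5, 1.0), (-1.0, 1.0)]

rng = random.Random(0)
train_x, train_y = simulate_trials(50, GAINS, rng)
test_x, test_y = simulate_trials(50, GAINS, rng)

n_times = len(GAINS)
# generalization[t_train][t_test]: accuracy of a decoder trained at t_train, tested at t_test.
generalization = [
    [accuracy(train_mean_difference(train_x, train_y, t_train), test_x, test_y, t_test)
     for t_test in range(n_times)]
    for t_train in range(n_times)
]
for row in generalization:
    print(" ".join(f"{a:.2f}" for a in row))
```

      In this toy setting the diagonal (same train and test time) stays accurate throughout, while the decoder trained at the first time point falls below chance at the last one because unit 0 has reversed its category preference. That is the qualitative signature of a dynamic code with constant decodability.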

      Strengths:

      Rogers et al. clearly expose the potential drawbacks to massive univariate analyses of stimulus feature and task selectivity in neuroimaging and physiological methods of all types -- which is a really important point given that this is the predominant approach to such analyses in cognitive neuroscience. fMRI, while having high spatial resolution, will almost certainly average over the kinds of temporal changes seen in this study. Even methods with high temporal and moderate spatial resolution (e.g. MEG, EEG) will often fail to find selectivity that is detectable only through multivariate methods. While some readers may be skeptical about the relevance of artificial neural networks to real human brain function, I found the simulations to be extremely useful. For me, what the simulations show is that a relatively typical multi-layer, recurrent backpropagation network (similar to ones used in numerous previous papers) does not require anything unusual to produce these kinds of counterintuitive effects. They simply need to exhibit strong attractor dynamics, which are naturally present in deep networks with multiple hidden layers, especially if the recurrent network interactions aid the model during training. This kind of recurrent processing should not be thought of as a stretch for the real brain. If anything, it should be the default expectation given our current knowledge of neuroanatomy. The authors also do a good job relating properties detected in their simulations to the ECoG data measured in human patients.

      We thank the reviewer for these positive comments.

      Weaknesses:

      While the ECoG data generally show the properties articulated by the authors, I found myself wanting to know more about the individual patients. Averaging across patients with different electrode locations -- and potentially different latencies of classification on different electrodes -- might be misleading. For example, how do we know that the shifts from negative to positive classification weights seen in the anterior temporal electrode sites are not really reflecting different dynamics of classification in separate patients? The authors partially examine this issue in the Supplementary Information (SI-3 and Figure SI-4) by analyzing classification shifts on individual patient electrodes. However, we don't know the locations of these electrodes (anterior versus posterior fusiform gyrus locations). The use of raw-ish LFPs averaged across the four repetitions of each stimulus (making an ERP) was also not an obvious choice, particularly if one desires to maximize the spatial precision of ECoG measures (compare unfiltered LFPs, which contain prominent low frequency fluctuations that can be shared across a larger spatial extent, to high frequency broadband power, 80-200 Hz).

      In the new statistical tests described above, we compute each metric separately for each patient, then conduct cross-subject statistical tests against a null hypothesis to assess whether the global pattern observed in the mean data is reliable across patients. We hope this addresses the reviewer's general concern that the mean pattern obscures heterogeneity across patients. With regard to the question of greater variability in anterior electrodes, the new analysis showing a remarkably strong correlation between variability of coefficient change and electrode location along the anterior-posterior axis provides a formal statistical test of this observation. We view variability of decoder coefficients as more informative than the independent correlations between electrode activity and category label shown in the supplementary materials, because the coefficients indicate the influence of electrode activity on classification when all other electrode states are taken into account (akin in some ways to a partial correlation coefficient). This distinction is noted in SI-3, p 48.

      The authors are well-known for arguing that conceptual processing is critically mediated by a single hub region located in the anterior temporal lobe, through which all sensory and motor modalities interact. I think that it's worth pointing out that the current data, while compatible with this theory, are also compatible with a conceptual system with multiple hubs. Deep recurrent dynamics from high-level visual processing, for which visual properties may be separated for animals and objects in the posterior aspects of the fusiform gyrus, through to phonological processing of object names may operate exactly as the authors suggest. However, other aspects of conceptual processing relating to object function (such as tool use) may not pass through the anterior fusiform gyrus, but instead through more posterior ventral stream (and dorsal stream) regions for which the high-level visual features are more segregated for animals versus tools. Social processing may similarly have its own distinct networks that tie in to visual<->verbal networks at a distinct point. So while the authors are persuasive with regard to the need for deep, recurrent interactions, the status of one versus multiple conceptual hubs, and the exact locations of those hubs, remains open for debate.

      We agree that the current data does not speak to hypotheses about other components of the cortical semantic network outside the field-of-view of our dataset. We have added an explicit statement of this in the General Discussion (page 22).

      The concepts that the authors introduce are important, and they should lead researchers to examine the potential utility of multivariate classification methods for their own work. To the extent that fMRI is blind to the dynamics highlighted here, supplementing fMRI with other approaches with high temporal resolution will be required (e.g. MEG and simultaneous fMRI-EEG). For those interested in applying deep neural networks to neuroscientific data, the current demonstration should also be a cautionary tale for the use of feed-forward-only networks. Finally, the authors make an important contribution to our thinking about conceptual processing, providing novel arguments and evidence in support of point-attractor models.

      Thanks to the reviewer for highlighting these points, which we take to be central contributions of this work!

      Reviewer #3:

      The authors compared how semantic information is encoded as a function of time between a recurrent neural network trained to link visual and verbal representations of objects and in the ventral anterior temporal lobe of humans (ECOG recordings). The strategy is to decode between 'living' and 'nonliving' objects and test/train at different timepoints to examine how dynamic the underlying code is. The observation is that coding is dynamic in both the neural network as well as the neural data as shown by decoders not generalizing to all other timepoints and by some units contributing with different sign to decoders trained at different timepoints. These findings are well in line with extensive evidence for a dynamic neural code as seen in numerous experiments (Stokes et al. 2013, King&Dehaene 2014).

      Strengths of this paper include a direct model-to-data comparison with the same analysis strategy, a model capable of generating a dynamic code, and the usage of rare intracranial recordings from humans. Weaknesses: While the model-driven examination of recordings is a major strength, the data analysis provides only limited support for the major claim of a 'distributed and dynamic semantic code' - it isn't clear that the code is semantic, and the claims of dynamics and anatomical distribution are not quantitative.

      Major issues:

      1) Claims re a 'semantic code'. The ECOG analysis shows that decoding 'living' from 'nonliving' during viewing of images exhibits a dynamic code, with some electrodes contributing to early decodability and some to later, and with some contributing with different signs. It is a far stretch to conclude from this that this shows evidence for a 'dynamic semantic code'. No work is done to show that this representation is semantic; in fact, this kind of single categorical distinction could probably also be made based on purely visual signals (such as in higher levels of a network such as VGG or higher visual cortex recordings). In contrast, the model has rich structure across numerous semantic distinctions.

      We have added a new analysis showing that the animate/inanimate distinction cannot be decoded for these stimuli from purely visual information as captured by a well-known unsupervised method for computing visual similarity structure amongst bitmap line drawings (Chamfer matching). We did not consider deep layers of the VGG-19 model as that model is explicitly trained to assign photographs to human-labeled semantic categories, so the representations do not reflect purely visual structure. The new analysis appears as part of the description of the stimulus set on page 31.

      The proposal that ventral anterior temporal cortex encodes semantic information is not new to this paper but is based on an extensive prior literature that includes studies of semantic impairments in patients with pathology in this area (e.g. refs 7, 13, 29-32), studies of semantic disruption by TMS applied to this region (refs. 37-38), functional brain imaging of semantic processing with PET (33), distortion-corrected MRI (34-36), MEG (e.g. Mollos et al., 2017, PLOS ONE), and ECOG (ref. 46), and neurally-constrained computational models of developing, mature, and disordered semantic processing (refs. 7, 31, 40, 53). A great deal of this literature uses the same animate/inanimate distinction employed here as a paradigmatic example of a semantic distinction. It is especially useful in the current case because the animate/inanimate distinction is unrelated to the response elicited by the stimuli (the basic-level name).

      2) Missing quantification of model-data comparison. These conclusions aren't supported by quantitative analysis. This importantly includes statements regarding anatomical location (Fig 4E), resemblances in dynamic coding patterns ('overlapping waves' Fig 4C-D), and presence of electrodes that 'switch sign'. These key conclusions seem to be derived purely by graphical inspection, which is not appropriate.

      We have added new statistical analyses of each core claim as explained above.

      3) ECOG recordings analysis. Raw LFP voltage was used as the feature (if I interpreted the methods correctly, see below). This does not seem like an appropriate way to decode from ECOG signals given the claims that are made due to sensitivity to large deflections (evoked potentials). Analysis of different frequency bands, power, phase etc would be necessary to substantiate these claims. As it stands, a simpler interpretation of the findings is that the early onset evoked activity (ERPs) gives rise to clusters 1-4, and more sustained deflections to the other clusters. This could also give rise to sign changes as ERPs change sign.

      The reviewer's comment suggests that information about the category should be reflected in spectral properties of the time-varying signals but not the direction/magnitude of the LFP itself. While we recognize that this is a common hypothesis in the literature, an alternative hypothesis more consistent with neural-network models of cognition suggests that such information can be encoded in magnitude and direction of the LFP itself—the closest brain analog to unit activity in a neural network model. The fact that semantic information can be accurately decoded from the LFPs, following a pattern closely resembling that arising in the model, is consistent with this hypothesis. We agree that, in future, it would be interesting to look at decoding of spectral properties of the signal. We note these points on revised manuscript page 22.

      With regard to this comment:

      a simpler interpretation of the findings is that the early onset evoked activity (ERPs) gives rise to clusters 1-4, and more sustained deflections to the other clusters. This could also give rise to sign changes as ERPs change sign

      We are not sure how this constitutes a simpler or even a different explanation of our data. ERPs at an intracranial electrode reflect local neural responses to the stimulus, which change over stimulus processing. The data show that semantic information about the stimulus can be decoded from these signals at the initial evoked response and all subsequent timepoints, but the relationship between the neural response and the semantic category (ie how the semantic information is encoded in the measured response) changes as the stimulus is processed. The changing sign of an ERP reflects changing activity of nearby neural populations. "More sustained deflections" indicates that changes to the code are slowing over time. These are essentially the conclusions that we draw about the dynamic code from our data.

      Maybe the reviewer is concerned that the results are an artifact of just the temporal structure of the LFPs themselves: that these change rapidly with stimulus onset and then slow down, so that the “expanding window” pattern arises from, for instance, temporal autocorrelation in the raw data. Testing this possibility was the goal of the analysis in SI-5, where we show that autocorrelation of the raw LFP signal does not grow broader over time, so the widening-window pattern observed in the generalization of classifiers is not attributable to the temporal autocorrelation structure of the raw data.
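      The kind of check described here can be pictured with a minimal sketch. It is not the SI-5 procedure, whose details are not given in this response. The function names, the 0.5 threshold, and the non-overlapping windows are all assumptions chosen for illustration: the sketch measures how many lags it takes for the sample autocorrelation to fall below a threshold, window by window, so that one can see whether this "width" grows over time.

```python
def autocorr(x, lag):
    """Sample autocorrelation of sequence x at a non-negative integer lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0  # constant signal: autocorrelation undefined, report 0
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def autocorr_width(x, threshold=0.5):
    """Smallest lag at which the autocorrelation drops below threshold.

    The threshold is an illustrative assumption, not a value from the source.
    """
    for lag in range(1, len(x)):
        if autocorr(x, lag) < threshold:
            return lag
    return len(x)


def widths_by_window(x, window):
    """Autocorrelation width in consecutive non-overlapping windows."""
    return [autocorr_width(x[s:s + window])
            for s in range(0, len(x) - window + 1, window)]
```

      If the widths returned by `widths_by_window` stay roughly flat from early to late windows, a widening generalization window in the classifiers cannot be explained by broadening autocorrelation alone.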

    2. Reviewer #2 (Public Review):

      The current work makes the case that local neural measurements of selectivity to stimulus features and categories can, under certain circumstances, be misleading. The authors illustrate this point first through simulations within an artificial, deep, neural network model that is trained to map high-level visual representations of animals, plants, and objects to verbal labels, as well as to map the verbal labels back to their corresponding visual representations. As activity cycles forward and backward through the model, activity in the intermediate hidden layer (referred to as the "Hub") behaves in an interesting and non-linear fashion, with some units appearing first to respond more to animals than objects (or vice-versa) and then reversing category preference later in processing. This occurs despite the network progressively settling to a stable state (often referred to as a "point attractor"). Nevertheless, when the units are viewed at the population level, they are able to distinguish animals and objects (using logistic regression classifiers with L1-norm regularization) across the time points when the individual unit preferences appear to change. During the evolution of the network's states, classifiers trained at one time point do not apply well to data from earlier or later periods of time, with a gradual expansion of generalization to later time points as the network states become more stable. The authors then ask whether these same data properties (constant decodability, local temporal generalization, widening generalization window, change in code direction) are also present in electrophysiological recordings (ECoG) of anterior ventral temporal cortex during picture naming in 8 human epilepsy patients. Indeed, they find support for all four data properties, with more stable animal/object classification direction in posterior aspects of the fusiform gyrus and more dynamic changes in classification in the anterior fusiform gyrus (calculated in the average classifier weights across all patients).

      Strengths:

      Rogers et al. clearly expose the potential drawbacks to massive univariate analyses of stimulus feature and task selectivity in neuroimaging and physiological methods of all types -- which is a really important point given that this is the predominant approach to such analyses in cognitive neuroscience. fMRI, while having high spatial resolution, will almost certainly average over the kinds of temporal changes seen in this study. Even methods with high temporal and moderate spatial resolution (e.g. MEG, EEG) will often fail to find selectivity that is detectable only through multivariate methods. While some readers may be skeptical about the relevance of artificial neural networks to real human brain function, I found the simulations to be extremely useful. For me, what the simulations show is that a relatively typical multi-layer, recurrent backpropagation network (similar to ones used in numerous previous papers) does not require anything unusual to produce these kinds of counterintuitive effects. They simply need to exhibit strong attractor dynamics, which are naturally present in deep networks with multiple hidden layers, especially if the recurrent network interactions aid the model during training. This kind of recurrent processing should not be thought of as a stretch for the real brain. If anything, it should be the default expectation given our current knowledge of neuroanatomy. The authors also do a good job relating properties detected in their simulations to the ECoG data measured in human patients.

      Weaknesses:

      While the ECoG data generally show the properties articulated by the authors, I found myself wanting to know more about the individual patients. Averaging across patients with different electrode locations -- and potentially different latencies of classification on different electrodes -- might be misleading. For example, how do we know that the shifts from negative to positive classification weights seen in the anterior temporal electrode sites are not really reflecting different dynamics of classification in separate patients? The authors partially examine this issue in the Supplementary Information (SI-3 and Figure SI-4) by analyzing classification shifts on individual patient electrodes. However, we don't know the locations of these electrodes (anterior versus posterior fusiform gyrus locations). The use of raw-ish LFPs averaged across the four repetitions of each stimulus (making an ERP) was also not an obvious choice, particularly if one desires to maximize the spatial precision of ECoG measures (compare unfiltered LFPs, which contain prominent low frequency fluctuations that can be shared across a larger spatial extent, to high frequency broadband power, 80-200 Hz).

      The authors are well-known for arguing that conceptual processing is critically mediated by a single hub region located in the anterior temporal lobe, through which all sensory and motor modalities interact. I think that it's worth pointing out that the current data, while compatible with this theory, are also compatible with a conceptual system with multiple hubs. Deep recurrent dynamics from high-level visual processing, for which visual properties may be separated for animals and objects in the posterior aspects of the fusiform gyrus, through to phonological processing of object names may operate exactly as the authors suggest. However, other aspects of conceptual processing relating to object function (such as tool use) may not pass through the anterior fusiform gyrus, but instead through more posterior ventral stream (and dorsal stream) regions for which the high-level visual features are more segregated for animals versus tools. Social processing may similarly have its own distinct networks that tie in to visual<->verbal networks at a distinct point. So while the authors are persuasive with regard to the need for deep, recurrent interactions, the status of one versus multiple conceptual hubs, and the exact locations of those hubs, remains open for debate.

      The concepts that the authors introduce are important, and they should lead researchers to examine the potential utility of multivariate classification methods for their own work. To the extent that fMRI is blind to the dynamics highlighted here, supplementing fMRI with other approaches with high temporal resolution will be required (e.g. MEG and simultaneous fMRI-EEG). For those interested in applying deep neural networks to neuroscientific data, the current demonstration should also be a cautionary tale for the use of feed-forward-only networks. Finally, the authors make an important contribution to our thinking about conceptual processing, providing novel arguments and evidence in support of point-attractor models.

    1. Yes some of the conditions I have described here work to systematically overempower certain groups. Such privilege simply confers dominance because of one’s race or sex.

      I find these points quite interesting, as I think the semantics and the specific words we use when talking about racism are very important and worth discussing. Some terms we use when talking about racism do not capture the full extent of the experience they are trying to describe, and may inadvertently downplay the gravity of what they are trying to address. Similarly, there are now black people asking non-black people to refer to "the n word" as "the n slur", since using the word "slur" emphasizes that it absolutely should not be used by those who cannot use it.

    1. This manuscript is in revision at eLife

      The decision letter after peer review, sent to the authors on August 20 2020, follows.

      Summary:

      In this manuscript, Mughrabi et al reported a technical advance of long term vagus nerve stimulation (VNS) in mice. VNS has been used in clinics for treating certain patients with epilepsy and depression and pioneered in clinical trials for a number of disorders including inflammation. Yet, VNS has not been widely used in mice for mechanistic studies largely due to technical challenges dealing with the small size. Here, the authors developed a method for chronic implantation of VNS stimulator in mice, and tested the effectiveness of the method using measurements of heart rate changes and effects on inflammation. This method is potentially useful to investigate the therapeutic potential of long-term VNS in chronic disease models in mice. While reviewers were positive about the work performed in this study including that it was carried out by multiple labs, there are major concerns about certain points and additional essential experiments are needed. These include the need for robust data related to the LPS inflammation studies and histological analysis. There were also missing details of methodologies that decrease the enthusiasm for this study.

      Essential Revisions:

      1) At least two papers (PMID: 28628030, 32521521) have reported implants usable for the same application (long-term VNS in mice) although more extensive validation and characterization were performed in this manuscript. A comparison between those implants and the one in this manuscript needs to be discussed. As the authors stated, one technical challenge is that the vagus nerve in mice is very small and fragile. However, it is unclear how the approach presented here is different from previous designs, and in particular, how mechanical damage is reduced using the reported apparatus.

      2) If the paper is going to be a resource, the authors should provide detailed descriptions of the materials and construction of the electrode. Currently the details are sparse and the photos are of poor resolution. It is unclear how the custom cuff was built (no details provided in the method section), what materials were used, and whether these materials are bio-compatible. Also, it is not clear whether and how the cuff electrode is appropriately insulated to prevent stimulation of surrounding muscles/nerves. In addition, the contact point between the nerve and the cuff is easily damaged. In the description of the implantation procedure, it should also be made clearer when the cuff electrode is placed on the nerve. A clear description could prevent torsion or other injury to the nerve.

      3) LPS experiments: All reviewers thought the LPS experiment needed improvement. This study is under-powered and lacks a control group (saline + Sham stim). The LPS study is inconclusive due to a small number of animals. Increasing N to get conclusive data is important because this implant will be very useful to investigate the anti-inflammatory effect of long-term VNS in chronic disease models in mice. Related to this point, out of the 4 animals with bradycardia, 2 animals did not show a decrease in serum TNF. This raises a concern that using heart rate threshold may not be appropriate to deliver a consistent stimulation dose within/across animals if the goal is to get a consistent anti-inflammatory effect. It is likely that vagus efferent fibers responsible for HR decrease (innervating the sinoatrial and atrioventricular nodes) and those responsible for an anti-inflammatory effect are different populations. Those two populations might be differently affected by the implantation surgery and repetitive stimulation. In addition, performing VNS in awake animals is closer to the human situation.

      4) Please confirm that 0.1mg/kg is the correct dose, this seems low to induce this amount of TNFa.

      5) The histology of the vagus nerve raised questions and needs to be addressed. Here were relevant comments by reviewers.

      • In fig 4b, the vagus nerve in the cuff is quite clear, as is the carotid artery. But there are other nerve fragments and/or auto-fluorescent tissue immediately adjacent. What are these? Leads one to wonder if they only stimulated the vagus? The cervical sympathetic travels with the cervical vagus and care is needed to separate them from the carotid sheath. On the right side of fig 4b, the "control" side, they highlight a nerve nowhere near the carotid artery. This is intact tissue, so the vagus has to be next to the carotid artery. There is a big nerve next to the right carotid that I would bet is the vagus. I think they've got it wrong. It is not clear at what level these photos are taken, is it the cervical vagus? The authors should indicate the left and right carotid in these figures.

      • Figure 4. I do not see how fibrosis is determined. Is this actually collagen? Can the sections in B be stained with Masson's trichrome? In "B" I am not sure that I see that the indicated regions are in fact the vagus nerve. It is hard to tell what other nerves would be present as there are few indications of the anatomical area these sections are from other than neck. Thus it is hard to discern if this really is the vagus or not. I would have thought that the carotid artery should be visible in close proximity to the nerve bundle; this seems not to be the case and leads to uncertainty that this is the correct nerve.

      • Was there any difference in histology between mice with functioning and non-functioning cuffs? As stated in Discussion, left VN without surgery in different animals would be a better control than right VN in the same animals.

      6) In the data presented in fig 2 or any of the studies where the Kent Scientific pulse/ox was used, did O2 saturation decrease with the change in breathing?

      7) Why didn't animals receiving awake VNS show visible changes in BR, which is in contrast to remarkable changes in BR in anesthetized animals?

      8) In video 1, it is unclear when the stimulation starts or stops. As a result, it is uncertain if the mouse scratching is due to stimulation. Is this a pain/nociceptive response?

      9) Fig 3 is presented in a confusing manner. In "A", I'm not sure why two mice are presented for different days post implantation and what this is showing. There is a clear effect of VNS on the heart rate and breathing (rate, and air flow). Is this the minimum current for each day that was found to induce the heart rate threshold change? While I appreciate that the longer pulse widths are less susceptible to the effect of bio-encapsulation of the electrode over time, I'm not sure how one compares 100 uA at 100 us to 400 uA at 600 us. In B how is the HRT achieved without damaging the electrode as the ICIC is exceeded, or are we not understanding this graph correctly? In C there are days that seem to be missing given the legend. The supplementary figure also appears to have data points missing or obscured.

      10) Success rate tops out at 75% with a skilled surgeon, and ranges between 40-60% for your average player. I'd say this is not too good.

      11) It would be nice to show that the implant does not cause chronic inflammation as this would impact its usefulness as a method. The authors should measure TNFa 14 days post implantation in cuff-implanted and sham-implanted mice.

      12) What behavioral experiments were done, and what were the results? These are mentioned in several places (line 172, line 279 etc) but not reported.

      13) The vagus nerve is critically involved in many essential body functions. Chronic implantation of the VNS stimulator may cause severe inflammation, nerve damage, and neuronal dysfunction. Therefore, it is critical to demonstrate that the chronic implantation does not alter nerve function. The chronic effect of the VNS stimulator implantation needs to be carefully monitored. For example, whether there is any change in body weight, food intake, as well as the sensitivity of diverse physiological reflexes such as the baroreflex, the Hering-Breuer reflex, and the stomach accommodation reflex.

    1. Author Response:

      Reviewer #1:

      The largest concern with the manuscript is its use of resting-state recordings in Parkinson's Disease patients on and off levodopa, which the authors interpret as indicative of changes in dopamine levels in the brain but not indicative of altered movement and other neural functions. For example, when patients are off medication, their UPDRS scores are elevated, indicating they likely have spontaneous movements or motor abnormalities that will likely produce changed activations in MEG and LFP during "rest". Authors must address whether it is possible to study a true "resting state" in unmedicated patients with severe PD. At minimum this concern must be discussed in the manuscript.

      We agree that Parkinson’s disease can lead to unwanted movements such as tremor as well as hyperkinesias. This would of course be a deviation from a resting state in healthy subjects. However, such movements are part of the disease and occur unwillingly. The main tremor in Parkinson’s disease is a rest tremor and - as the name already suggests – it occurs while not doing anything. Therefore, such movements can arguably be considered part of the resting state of Parkinson’s disease. Resting state activity with and without medication is therefore still representative for changes in brain activity in Parkinson’s patients and indicative of alterations due to medication.

      To further investigate the effect of movement in our patients, we subdivided the UPDRS part 3 score into tremor and non-tremor subscores. For the tremor subscore we took the mean of item 15 and 17 of the UPDRS, whereas for the non-tremor subscore items 1, 2, 3, 9, 10, 12, 13, and 14 were averaged. Following Spiegel et al., 2007, we classified patients as akinetic-rigid (non-tremor score at least twice the tremor score), tremor-dominant (tremor score at least twice as large as the non-tremor score), and mixed type (for the remaining scores). Of the 17 patients, 1 was tremor dominant and 1 was classified as mixed type (his/her non-tremor score was greater than tremor score). None of our patients exhibited hyperkinesias during the recording. To exclude that our results are driven by tremor-related movement, we re-ran the HMM without the tremor-dominant and the mixed-type patient (see Figure R1 response letter).
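      The subtype rule described above can be written out as a short sketch. The item numbers and the factor-of-two criterion come from the text; the function name, the dictionary input format, and the handling of the boundary case where both subscores are zero (here it falls into the akinetic-rigid branch because that test is applied first) are assumptions made for illustration.

```python
from statistics import mean

# UPDRS part 3 item numbers as listed in the response above.
TREMOR_ITEMS = (15, 17)
NON_TREMOR_ITEMS = (1, 2, 3, 9, 10, 12, 13, 14)


def classify_subtype(items):
    """Classify a patient from a dict mapping item number -> item score.

    Returns (subtype, tremor_subscore, non_tremor_subscore).
    Checking akinetic-rigid first is an assumption; the text does not
    say how ties (e.g. both subscores zero) are resolved.
    """
    tremor = mean(items[i] for i in TREMOR_ITEMS)
    non_tremor = mean(items[i] for i in NON_TREMOR_ITEMS)
    if non_tremor >= 2 * tremor:
        subtype = "akinetic-rigid"
    elif tremor >= 2 * non_tremor:
        subtype = "tremor-dominant"
    else:
        subtype = "mixed"
    return subtype, tremor, non_tremor
```

      For example, a patient scoring 2 on every non-tremor item and 1 on both tremor items has subscores of 2 and 1 and is labeled akinetic-rigid, while subscores of 3 (non-tremor) and 2 (tremor) give the mixed type.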

      ON medication results for all HMM states remained the same. OFF medication results for the Ctx-Ctx and STN-STN states remained the same as well. The Ctx-STN state OFF medication was split into two states: Sensorimotor-STN connectivity was captured in one state and all other types of Ctx-STN connections were captured in another state (see Figure R1 response letter). The important point is that the biological conclusions stand across these solutions. Regardless, both with and without the two subjects a stable covariance matrix entailing sensorimotor-STN connectivity was determined, which is the main finding for the Ctx-STN state OFF medication.

      We therefore discuss this issue now within the limitation section (page 20):

      “Both motor impairment and motor improvement can cause movement during the resting state in PD. While such movement is a deviation from a resting state in healthy subjects, such movements are part of the disease and occur unwillingly. Therefore, such movements can arguably be considered part of the resting state of Parkinson’s disease. None of the patients in our cohort experienced hyperkinesia during the recording. All patients except for two were of the akinetic-rigid subtype. We verified that tremor movement is not driving our results. Recalculating the HMM states without these 2 subjects, even though it slightly changed some particular aspects of the HMM solution, did not materially affect the conclusions.”

      Figure R1: States obtained after removing one tremor dominant and one mixed type patient from analysis. Panel C shows the split OFF medication cortico-STN state. Most of the cortico-STN connectivity is captured by the state shown in the top row (Figure 1 C OFF). Only the motor-STN connectivity in the alpha and beta band (along with a medial frontal-STN connection in the alpha band) is captured separately by the states labeled “OFF Split” (Figure 1 C OFF SPLIT).

      This reviewer was unclear on why increased "communication" in the medial OFC in delta and theta was interpreted as a pathological state indicating deteriorated frontal executive function. Given that the authors provide no evidence of poor executive function in the patients studied, the authors must at least provide evidence from other studies linking this feature with impaired executive function.

      If we understand the comment correctly it refers to the statement in the abstract “Dopaminergic medication led to communication within the medial and orbitofrontal cortex in the delta/theta frequency range. This is in line with deteriorated frontal executive functioning as a side effect of dopamine treatment in Parkinson’s disease”

      This statement is based on the dopamine overdose hypothesis reported in the Parkinson’s disease (PD) literature (Cools 2001; Kelly et al. 2009; MacDonald and Monchi 2011; Vaillancourt et al. 2013). We have elaborated upon the dopamine overdose hypothesis in the discussion on page 16. In short, dopaminergic neurons are primarily lost from the substantia nigra in PD, which causes a higher dopamine depletion in the dorsal striatal circuitry than within the ventral striatal circuits (Kelly et al. 2009; MacDonald and Monchi 2011). Thus, dopaminergic medication to treat the PD motor symptoms leads to increased dopamine levels in the ventral striatal circuits including frontal cortical activity, which can potentially explain the cognitive deficits observed in PD (Shohamy et al. 2005; George et al. 2013). We adjusted the abstract to read:

      “Dopaminergic medication led to coherence within the medial and orbitofrontal cortex in the delta/theta frequency range. This is in line with known side effects of dopamine treatment such as deteriorated executive functions in Parkinson’s disease.”

      In this article, authors repeatedly state their method allows them to delineate between pathological and physiological connectivity, but they don't explain how dynamical systems and discrete-state stochasticity support that goal.

      To recapitulate, the HMM divides a continuous time series into discrete states. Each state is a time-delay embedded covariance matrix reflecting the underlying connectivity between brain regions as well as the specific temporal dynamics in the data when such state is active. See Packard et al., (1980) for details about how a time-delay embedding characterises a linear dynamical system.

      Please note that the HMM was used as a data-driven, descriptive approach without explicitly assuming any a-priori relationship with pathological or physiological states. The relation between biology and the HMM states, thus, purely emerged from the data; i.e. is empirical. What we claim in this work is simply that the features captured by the HMM hold some relation with the physiology even though the estimation of the HMM was completely unsupervised (i.e. blind to the studied conditions). We have added this point also to the limitations of the study on page 19 and the following to the introduction to guide the reader more intuitively (page 4):

      “To allow the system to dynamically evolve, we use time delay embedding. Theoretically, delay embedding can reveal the state space of the underlying dynamical system (Packard et al., 1980). Thus, by delay-embedding PD time series OFF and ON medication we uncover the differential effects of a neurotransmitter such as dopamine on underlying whole brain connectivity.”
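      To make the idea of a time-delay embedded covariance matrix concrete, here is a minimal single-channel sketch. It does not implement the HMM and is not the pipeline used in the study: the function names, the choice of forward lags, and the use of one channel are assumptions for illustration only. In a multichannel setting the lagged copies of all channels would be stacked side by side before the covariance is taken.

```python
def delay_embed(x, n_lags):
    """Return rows [x[t], x[t+1], ..., x[t+n_lags-1]] for every valid t."""
    return [list(x[t:t + n_lags]) for t in range(len(x) - n_lags + 1)]


def covariance(rows):
    """Unbiased sample covariance matrix of the columns of rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)]
            for i in range(d)]


# Usage: embed a short series with 3 lags and summarize it by its covariance.
embedded = delay_embed([1, 2, 3, 4, 5], 3)
cov = covariance(embedded)
```

      The off-diagonal entries of `cov` are lagged autocovariances, which is why a covariance matrix over the embedded signal carries information about temporal (and hence spectral) structure in addition to the zero-lag relationships between signals.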

      Reviewer #2:

      Sharma et al. investigated the effect of dopaminergic medication on brain networks in patients with Parkinson's disease, combining local field potential recordings from the subthalamic nucleus and magnetoencephalography during rest. They aim to characterize both physiological and pathological spectral connectivity.

      They identified three networks, or brain states, that are differentially affected by medication. Under medication, the first state (termed hyperdopaminergic state) is characterized by increased connectivity of frontal areas, supposedly responsible for deteriorated frontal executive function as a side effect of medical treatment. In the second state (communication state), dopaminergic treatment largely disrupts cortico-STN connectivity, leaving only selected pathways communicating. This is in line with current models that propose that alleviation of motor symptoms relates to the disruption of pathological pathways. The local state, characterized by STN-STN oscillatory activities, is less affected by dopaminergic treatment.

      The authors utilize sophisticated methods with the potential to uncover the dynamics of activities within different brain networks, which opens the avenue to investigate how the brain switches between different states, and how these states are characterized in terms of spectral, local, and temporal properties. The conclusions of this paper are mostly well supported by data, but some issues, mainly about the presentation of the results, remain:

      We would like to thank the reviewer for his succinct and clear understanding of our work.

      1) The presentation of the results is suboptimal and needs improvement to increase readers' comprehension. At some points this section seems rather unstructured, some results are presented multiple times, and some passages already include points rather suitable for the discussion, which adds too much information for the results section.

      We have removed repetitions in the results sections and removed the rather lengthy introductory parts of each subsection. Moreover, we have now moved all parts, which were already an interpretation of our findings to the discussion.

      2) It is intriguing that the hyperdopaminergic state is identified not only under medication but also in the off-state, especially given the results on the temporal properties of states showing that the time of the hyperdopaminergic state is unaffected by medication. When such a state can be identified even in the absence of levodopa, is it really optimal to call it "hyperdopaminergic"? Do the results not rather suggest that the identified network is active both off and on medication, while during the latter state its activities are modulated in a way that could relate to side effects?

      The reviewer’s interpretations of the results pertaining to the hyper-dopaminergic state are correct. The states had been named post-hoc as explained in the results section. The hyper-dopaminergic state’s name derived from it showing the overdosing effects of dopamine. Of course, these results are only visible on medication. But off medication, this state also exists without exhibiting the effects of excess dopamine. To avoid confusion or misinterpretation of the findings and also following the relevant comment by reviewer 1, we renamed all states to be more descriptive:

      Hyperdopaminergic > Cortico-cortical state

      Communication > Cortico-STN state

      Local > STN-STN state.

      3) Some conclusions need to be improved/more elaborated. For example, the coherence of bilateral STN-STN did not change between medication off and on the state. Yet it is argued that a) "Since synchrony limits information transfer (Cruz et al. 2009; Cagnan, Duff, and Brown 2015; Holt et al. 2019) , local oscillations are a potential mechanism to prevent excessive communication with the cortex" (line 436) and b) "Another possibility is that a loss of cortical afferents causes local basal ganglia oscillations to become more pronounced" (line 438). Can these conclusions really be drawn if the local oscillations did not change in the first place?

      We apologize for the unclear description. Our conclusion was based on the following results:

      a) We state that STN-STN connectivity as measured by the magnitude of STN-STN coherence does not change OFF vs ON medication in the Cortico-STN state. This result is obtained using inter-medication analysis.

      b) But ON medication, STN-STN coherence in the Cortico-STN state was significantly different from mean coherence within the ON condition. These results are obtained using intra-medication analysis.

      Based on this, we conclude that in the Cortico-STN state, although OFF vs ON medication the magnitude of STN-STN coherence was unchanged, the STN-STN coherence was significantly different from mean coherence in the ON medication condition. The emergence of synchronous STN-STN activity may limit information exchange between STN and cortex ON medication.

      An alternative explanation for these findings might be a mechanism preventing connectivity between cortex and the STN ON medication. This missing interaction between STN and cortex might cause STN-STN oscillations to increase compared to the mean coherence within the ON state. Unfortunately, we cannot test such causal influences with our analysis.

      We have added the following discussion to the manuscript on page 17 in order to improve the exposition:

      “Bilateral STN–STN coherence in the alpha and beta band did not change in the cortico-STN state ON versus OFF medication (InterMed analysis). However, STN-STN coherence was significantly higher than the mean level ON medication (IntraMed analysis). Since synchrony limits information transfer (Cruz et al. 2009; Cagnan, Duff, and Brown 2015; Holt et al. 2019), the high coherence within the STN ON medication could prevent communication with the cortex. A different explanation would be that a loss of cortical afferents leads to increased local STN coherence. The causal nature of the cortico-basal ganglia interaction is an endeavour for future research.”

      Reviewer #3:

      In PD, pathological neuronal activity along the cortico-basal ganglia network notably consists in the emergence of abnormal synchronized oscillatory activity. Nevertheless, synchronous oscillatory activity is not necessarily pathological and also serves crucial cognitive functions in the brain. Moreover, the effects of dopaminergic medication on oscillatory network connectivity in PD are still poorly understood. To clarify these issues, Sharma and colleagues simultaneously recorded MEG-STN LFP signals in PD patients and characterized the effect of dopamine (ON and OFF dopaminergic medication) on oscillatory whole-brain networks (including the STN) in a time-resolved manner. Here, they identified three physiologically interpretable spectral connectivity patterns and found that cortico-cortical, cortico-STN, and STN-STN networks were differentially modulated by dopaminergic medication.

      Strengths:

      1) Both the methodological and experimental approaches used are thoughtful and rigorous.

      a) The use of an innovative data-driven machine learning approach (by employing a hidden Markov model), rather than hand-crafted analyses, to identify physiologically interpretable spectral connectivity patterns (i.e., distinct networks/states) is undeniably an added value. In doing so, the results are not biased by human expertise and subjectivity, which makes them even more solid.

      b) So far, the recurrent oscillatory patterns of transient network connectivity within and between the cortex and the STN reported in PD have been evaluated only for specific cortico-STN spectral connectivity. Conversely, whole-brain MEG studies in PD patients did not account for cortico-STN and STN-STN connectivity. Here, the authors studied, for the first time, the whole-brain connectivity including the STN (whole brain-STN approach) and therefore provide new evidence of the brain connectivity reported in PD, as well as new information regarding the effect of dopaminergic medication on the recurrent oscillatory patterns of transient network connectivity within and between the cortex and the STN reported in PD.

      2) Studying the temporal properties of the recurrent oscillatory patterns of transient network connectivity both ON and OFF medication is extremely important and provides interesting and crucial information in order to delineate pathological versus physiologically relevant spectral brain connectivity in PD.

      We would like to thank the reviewer for their valuable feedback and correct interpretation of our manuscript.

      Weaknesses:

      1) In this study, the authors implied that the ON dopaminergic medication state corresponds to a physiological state. However, as correctly mentioned in the limitations of the study, they did not have (for obvious reasons) a control/healthy group. Moreover, no one can exclude the emergence of compensatory and/or plasticity mechanisms in the brain of the PD patients related to the duration of the disease and/or the history of the chronic dopamine-replacement therapy (DRT). Duration of the disease and DRT history should therefore be considered when characterizing the recurrent oscillatory patterns of transient network connectivity within and between the cortex and the STN reported in PD, as well as when examining the effect of the dopaminergic medication on the functioning of these specific networks.

      We would like to thank the reviewer for pointing this out. We regressed duration of disease (year of measurement – year of onset) on the temporal properties of the HMM states. We found no relationship between any of the temporal properties and disease duration. Similarly, we regressed levodopa equivalent dosage for each subject on the temporal properties and found no relationship. We now discuss this point in the manuscript (page 20):

      “A further potential influencing factor might be the disease duration and the amount of dopamine patients are receiving. Both factors were not significantly related to the temporal properties of the states.”
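
      As a minimal sketch of this kind of check, an ordinary least-squares fit of one temporal property on disease duration could look as follows. We assume here that disease duration is the predictor and the temporal property is the outcome; the function name and the numbers are hypothetical and do not come from the study.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical per-subject values, for illustration only.
disease_duration_years = [4, 7, 9, 12, 15]      # year of measurement - year of onset
state_lifetime_ms = [110, 95, 120, 105, 112]    # one temporal property of one HMM state

slope, intercept = ols_slope_intercept(disease_duration_years, state_lifetime_ms)
```

      A slope close to zero relative to its uncertainty would correspond to the "no relationship" result reported above; the significance test itself is not shown here.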

      2) Here, the authors recorded LFPs in the STN activity. LFP represents sub-threshold (e.g., synaptic input) activity at best (Buzsaki et al., 2012; Logothetis, 2003). Recent studies demonstrated that mono-polar, but also bi-polar, BG LFPs are largely contaminated by volume conductance of cortical electroencephalogram (EEG) activity even when re-referenced (Lalla et al., 2017; Marmor et al., 2017). Therefore, it is likely that STN LFPs do not accurately reflect local cellular activity. In this study, the authors examined and measured coherence between cortical areas and STN. However, they cannot guarantee that STN signals were not contaminated by volume conducted signals from the cortex.

      We appreciate this concern and thank the reviewer for bringing it up. Marmor et al. (2017) investigated this in humans, and their study is therefore most closely related to our research. They found that re-referenced STN recordings are not contaminated by cortical signals. Furthermore, the data in Lalla et al. (2017) are based on recordings in rats, making a direct transfer to human STN recordings problematic due to the different brain sizes. Since we re-referenced our LFP signals as recommended in the Marmor paper, we think that contamination due to cortical signals is relatively minor; see Litvak et al. (2011), Hirschmann et al. (2013), and Neumann et al. (2016) for additional references supporting this. That being said, we now discuss this potential issue in the paper on page 20.

      “Lastly, we recorded LFPs from within the STN, an established recording procedure during the implantation of DBS electrodes in various neurological and psychiatric diseases. Although for Parkinson patients results on beta and tremor activity within the STN have been reproduced by different groups (Reck et al. 2010, Litvak et al. 2011, Florin et al. 2013, Hirschmann et al. 2013, Neumann et al. 2016), it is still not fully clear whether these LFP signals are contaminated by volume-conducted cortical activity. However, while volume conduction seems to be a larger problem in rodents even after re-referencing the LFP signal (Lalla et al. 2017), the same was not found in humans (Marmor et al. 2017).”

      3) The methods and data processing are rigorous but also very sophisticated, which makes the perception of the results in terms of oscillatory activity and neural synchronization difficult.

      To aid intuition on how to interpret the result in light of the methods used, one can compare the analysis pipeline to a windowing approach. In a more standard approach, windows of different time length can be defined for different epochs within the time series and for each window coherence and connectivity can be determined. The difference in our approach is that we used an unsupervised learning algorithm to select windows of varying length based on recurring patterns of whole brain network activity. Within those defined windows we then determine the oscillatory properties via coherence and power – which is the same as one would do in a classical analysis. We have added an explanation of the concept of “oscillatory activity” within our framework to the introduction (page 2 footnote):

      “For the purpose of our paper, we refer to oscillatory activity or oscillations as recurrent, but transient frequency–specific patterns of network activity, even though the underlying patterns can be composed of either sustained rhythmic activity, neural bursting, or both (Quinn et al. 2019).”

      Moreover, we provide a more intuitive explanation of the analysis within the first section of the results (page 4):

      “Using an HMM, we identified recurrent patterns of transient network connectivity between the cortex and the STN, which we henceforth refer to as an ‘HMM state’. In comparison to classic sliding-window analysis, an HMM solution can be thought of as a data-driven estimation of time windows of variable length (within which a particular HMM state was active): once we know the time windows when a particular state is active, we compute coherence between different pairs of regions for each of these recurrent states.”
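
      The idea of data-driven windows of variable length can be sketched as follows. The sketch assumes that the HMM has already been fitted and decoded into one state label per time sample; the example path, the function names, and the two temporal properties shown (fractional occupancy and mean lifetime) are illustrative assumptions, not the authors' implementation.

```python
def state_windows(path):
    """Return (state, start, stop) runs of a state sequence; stop is exclusive."""
    windows = []
    start = 0
    for i in range(1, len(path) + 1):
        # Close the current run at the end of the path or when the state changes.
        if i == len(path) or path[i] != path[start]:
            windows.append((path[start], start, i))
            start = i
    return windows


def fractional_occupancy(path, state):
    """Fraction of all samples spent in the given state."""
    return sum(1 for s in path if s == state) / len(path)


def mean_lifetime(path, state):
    """Average length, in samples, of the windows in which the state is active."""
    lengths = [stop - start for s, start, stop in state_windows(path) if s == state]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hypothetical decoded state path: one state label per time sample.
path = [0, 0, 1, 1, 1, 0, 2, 2]
windows = state_windows(path)
```

      Coherence and power would then be computed only on the samples that fall inside the windows of a given state, which is the step that corresponds to a classical windowed analysis.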

      4) Previous studies have shown that abnormal oscillations within the STN of PD patients are limited to its dorsolateral/motor region, thus dividing the STN into a dorsolateral oscillatory/motor region and ventromedial non-oscillatory/non-motor region (Kuhn et al. 2005; Moran et al. 2008; Zaidel et al. 2009, 2010; Seifreid et al. 2012; Lourens et al. 2013, Deffains et al., 2014). However, the authors do not provide clear information about the location of the LFP recordings within the STN.

      We selected the electrode contacts based on intraoperative microelectrode recordings (for details, see page 23). The first directional recording height after the entry into the STN was selected to obtain the three directional LFP recordings from the respective hemisphere. This practice has been shown to improve target location (Kochanski et al., 2019; Krauss et al., 2021). The common target area for DBS surgery is the dorsolateral STN. To confirm that the electrodes were actually located within this part of the STN, we reconstructed the DBS location with Lead-DBS (Horn et al. 2019). All electrodes except one were located within the dorsolateral STN (see figure 7 of the manuscript). To exclude the possibility that our results were driven by this outlier, we reanalysed our data without this patient. No change in the overall connectivity pattern was observed (see figure R3 of the response letter).

      Figure R2: Lead DBS reconstruction of the location of electrodes in the STN for different subjects. The red electrodes have not been placed properly in the STN. The contacts marked in red represent the directional contacts from which the data was used for analysis.

      Figure R3: HMM states obtained after running the analysis without the subject with the electrode outside the STN.

      References:

      Buzsáki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents-EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012; 13: 407–20.

      Cagnan H, Duff EP, Brown P. The relative phases of basal ganglia activities dynamically shape effective connectivity in Parkinson’s disease. Brain 2015; 138: 1667–78.

      Cools R. Enhanced or impaired cognitive function in Parkinson’s disease as a function of dopaminergic medication and task demands. Cereb Cortex 2001; 11: 1136–43.

      Cruz AV, Mallet N, Magill PJ, Brown P, Averbeck BB. Effects of dopamine depletion on network entropy in the external globus pallidus. J Neurophysiol 2009; 102: 1092–102.

      Florin E, Erasmi R, Reck C, Maarouf M, Schnitzler A, Fink GR, et al. Does increased gamma activity in patients suffering from Parkinson’s disease counteract the movement inhibiting beta activity? Neuroscience 2013; 237: 42–50.

      George JS, Strunk J, Mak-Mccully R, Houser M, Poizner H, Aron AR. Dopaminergic therapy in Parkinson’s disease decreases cortical beta band coherence in the resting state and increases cortical beta band power during executive control. NeuroImage Clin 2013; 3: 261–70.

      Hirschmann J, Özkurt TE, Butz M, Homburger M, Elben S, Hartmann CJ, et al. Differential modulation of STN-cortical and cortico-muscular coherence by movement and levodopa in Parkinson’s disease. Neuroimage 2013; 68: 203–13.

      Holt AB, Kormann E, Gulberti A, Pötter-Nerger M, McNamara CG, Cagnan H, et al. Phase-dependent suppression of beta oscillations in Parkinson’s disease patients. J Neurosci 2019; 39: 1119–34.

      Horn A, Li N, Dembek TA, Kappel A, Boulay C, Ewert S, et al. Lead-DBS v2: Towards a comprehensive pipeline for deep brain stimulation imaging. Neuroimage 2019; 184: 293–316.

      Kelly C, De Zubicaray G, Di Martino A, Copland DA, Reiss PT, Klein DF, et al. L-dopa modulates functional connectivity in striatal cognitive and motor networks: A double-blind placebo-controlled study. J Neurosci 2009; 29: 7364–78.

      Kochanski RB, Bus S, Brahimaj B, Borghei A, Kraimer KL, Keppetipola KM, et al. The impact of microelectrode recording on lead location in deep brain stimulation for the treatment of movement disorders. World Neurosurg 2019; 132: e487–95.

      Krauss P, Oertel MF, Baumann-Vogel H, Imbach L, Baumann CR, Sarnthein J, et al. Intraoperative neurophysiologic assessment in deep brain stimulation surgery and its impact on lead placement. J Neurol Surgery, Part A Cent Eur Neurosurg 2021; 82: 18–26.

      Lalla L, Rueda Orozco PE, Jurado-Parras MT, Brovelli A, Robbe D. Local or not local: Investigating the nature of striatal theta oscillations in behaving rats. eNeuro 2017; 4: 128–45.

      Litvak V, Jha A, Eusebio A, Oostenveld R, Foltynie T, Limousin P, et al. Resting oscillatory cortico-subthalamic connectivity in patients with Parkinson’s disease. Brain 2011; 134: 359–74.

      MacDonald PA, MacDonald AA, Seergobin KN, Tamjeedi R, Ganjavi H, Provost JS, et al. The effect of dopamine therapy on ventral and dorsal striatum-mediated cognition in Parkinson’s disease: Support from functional MRI. Brain 2011; 134: 1447–63.

      MacDonald PA, Monchi O. Differential effects of dopaminergic therapies on dorsal and ventral striatum in Parkinson’s disease: Implications for cognitive function. Parkinsons Dis 2011; 2011: 1–18.

      Marmor O, Valsky D, Joshua M, Bick AS, Arkadir D, Tamir I, et al. Local vs. volume conductance activity of field potentials in the human subthalamic nucleus. J Neurophysiol 2017; 117: 2140–51.

      Neumann WJ, Degen K, Schneider GH, Brücke C, Huebl J, Brown P, et al. Subthalamic synchronized oscillatory activity correlates with motor impairment in patients with Parkinson’s disease. Mov Disord 2016; 31: 1748–51.

      Packard NH, Crutchfield JP, Farmer JD, Shaw RS. Geometry from a time series. Phys Rev Lett 1980; 45: 712–6.

      Quinn AJ, van Ede F, Brookes MJ, Heideman SG, Nowak M, Seedat ZA, et al. Unpacking Transient Event Dynamics in Electrophysiological Power Spectra. Brain Topogr 2019; 32: 1020–34.

      Reck C, Himmel M, Florin E, Maarouf M, Sturm V, Wojtecki L, et al. Coherence analysis of local field potentials in the subthalamic nucleus: Differences in parkinsonian rest and postural tremor. Eur J Neurosci 2010; 32: 1202–14.

      Shohamy D, Myers CE, Grossman S, Sage J, Gluck MA. The role of dopamine in cognitive sequence learning: Evidence from Parkinson’s disease. Behav Brain Res 2005; 156: 191–9.

      Spiegel J, Hellwig D, Samnick S, Jost W, Möllers MO, Fassbender K, et al. Striatal FP-CIT uptake differs in the subtypes of early Parkinson’s disease. J Neural Transm 2007; 114: 331–5.

      Vaillancourt DE, Schonfeld D, Kwak Y, Bohnen NI, Seidler R. Dopamine overdose hypothesis: Evidence and clinical implications. Mov Disord 2013; 28: 1920–9.

    1. Author Response:

      We would like to thank the reviewers for their thoughtful and thorough critique of our manuscript. In our revised preprint, we added important additional data and restructured our manuscript to reflect as many of the recommendations as possible. Additionally, we have added experiments to define the cellular mechanisms underlying observed damage following mechanical injury. The most significant additions of new data include:

      • Further experiments demonstrating block of glutamate clearance exacerbates stimulus-induced hair-cell synapse loss.
      • Analysis of neuromast disruption in lhfpl5b mutant null larvae showing mechanical displacement. Lhfpl5b mediates mechanosensitivity in lateral-line hair cells, allowing us to determine whether mechanotransduction is required for mechanical disruption of neuromasts.
      • Testing the vibratory stimulus at various frequencies to confirm the optimal frequency to induce acute, generally sub-lethal damage to lateral-line hair cells is 60 Hz.
      • Assessment of neuromast supporting cell and hair cell proliferation following mechanical overstimulation.
      • Quantitative analysis of kinocilia SEM and confocal images of hair bundles in control and stimulus exposed fish.

      Individual comments are addressed as outlined below.

      Reviewer #1:

      1) The authors use a vertically-oriented Brüel+Kjær LDS Vibrator to deliver a 60 Hz vibratory stimulus to damage lateral line hair cells. It is not made clear on why this frequency was selected. Did the authors choose this frequency because they screened a number of frequencies and this is the one that did the most damage to hair cells or was it chosen for another reason? Or, do all frequencies do the same amount of damage? The authors should screen a number of frequencies and choose the stimulus that does the most damage to hair cells. This would set the field in the best direction, should members of the community attempt this new technique. It is not necessary to repeat all of the experiments, but the authors should show which frequencies are best for inducing damage.

      The frequency selected for mechanical overexposure of lateral-line organs was based on previous studies showing 60 Hz to be within the optimal upper frequency range of mechanical sensitivity of superficial posterior lateral-line neuromasts, with maximal response between 10-60 Hz, but a suboptimal frequency for hair cells of the anterior macula in the ear (Weeg and Bass 2002, Trapani et al, 2009, Levi et al, 2015). To confirm that 60 Hz was the optimal frequency to induce damage, we tested 45, 60, and 75 Hz at comparable intensities. We observed no apparent damage to lateral-line neuromasts at 75 Hz, while 45 Hz at a comparable intensity proved lethal to the fish. We have updated the Results and Method Details to include our rationale for choosing 60 Hz.

      2) The SEM images of the hair bundle are beautiful and do show damage to the hair bundle, but historically speaking older studies in mammals have shown that the actin core of the stereocilia is damaged. It would be critical to know if this was the case. Showing damage to the kinocilium and stereocilia splaying is a start, but readers would need to know if the actin cores are damaged. So, TEM should be used to find damage to the actin cores of stereocilia.

      Our main goal in this initial manuscript was to survey morphological and functional changes in mechanically injured lateral line organs with an emphasis on inflammation and synapse loss. We agree TEM studies showing damage to the actin core of the stereocilia will be important to determine whether mechanical damage to neuromast hair bundles fully mimics mammalian stereocilia damage, but these experiments will require significant time to perform and optimize. We have expanded our analysis of hair-bundle morphology in this study and intend to pursue deeper analysis of hair bundle damage, i.e. examination of the stereocilia actin core, in future follow-up studies.

      3) I think the use of "Noise-exposed lateral line" as a term for mechanically overstimulated lateral line hair cells is not correct and could be misleading. The lateral line senses water motion not sound as the word noise would imply. Calling the stimulus "noise" should be removed throughout.

      We have removed the term “noise” throughout the manuscript and replaced it with either “strong water current stimulus” or “mechanical overstimulation” where appropriate.

      4) Decreases in mechanotransduction are shown by dye entry. These results should be strengthened using microphonic potentials to determine the extent of damage. This experiment is not necessary but would improve the quality of the document.

      While we agree that microphonic recordings would provide further support for reduced mechanotransduction, quantitative FM1-43 uptake in zebrafish lateral line hair cells is a well-established proxy for microphonic measurements. In a previous study using the same protocol utilized in our manuscript, FM1-43 labeling intensity was shown to directly correspond with microphonic amplitude (Toro et al, 2015). Moreover, the fixable analogue of FM1-43 (FM1-43FX) gave us comparable relative measurements of uptake as live FM1-43 and provided the additional advantage of high temporal resolution and the ability to simultaneously assay entire cohorts of control and overstimulated fish (which is not possible with microphonic measurements or live FM1-43 imaging), as we could expose groups of fish briefly to the dye at determined time intervals following overstimulation, then immediately place in fixative.

      5) In figure 2, PSD labeling is not clear.

      We assume the reviewer meant PSD labeling in Figure 4 and we agree it is difficult to discern. We have changed the hair-cell label from gray to blue in the images so that the green PSD labeling is clear.

      Reviewer #2:

      1) While the findings are carefully measured and described, the effects of insult on hair cells are relatively minor, with a change in hair cell number, extent of innervation or synapses per hair cell (Figs 3 and 4) in the range of 10% reduction compared to control. One potential value of the model would be to use it to discover underlying pathways of damage or screen for potential therapeutics. However with these modest changes it is not clear that there will be enough power to determine effects of potential interventions.

      One advantage of the zebrafish model is the ability to overstimulate large cohorts of larvae, thereby providing enough power to uncover modest but significant changes resulting from moderate damage to hair cells. While not as well suited for unbiased large-scale screens of therapeutics, our overexposure protocol provides the opportunity to determine the role of specific cellular pathways (e.g. metabolic stress, inflammation, and glutamate excitotoxicity) in hair-cell damage and synapse loss following mechanically-induced damage via genetic or pharmacological manipulation of these pathways. Additionally, as the hair cell synapses fully repair following stimulus-induced loss, the zebrafish model has the potential for identifying novel pathways for repair through transcriptomic profiling (for an example, see Mattern et al, Front. Cell Dev. Biol., 2018). Cumulatively, these future experimental directions will provide important mechanistic information that could be used toward the development of targeted therapeutic interventions.

      2) The most dramatic phenotype after shaking is a physical displacement of hair cells, described as disrupted morphology. However, it is not clear what the underlying cause of this change is. Are only posterior neuromasts damaged in this way? Is it a wounding response as animals are exposed to an air interface during shaking? It is also not clear to what extent this displacement reveals more general principles of the effects of noise on hair cells. Additional discussion of underlying causes would be welcome.

      We agree that the underlying causes of the physical displacement of posterior lateral-line neuromasts warranted further investigation, and we have expanded the appropriate sections of the Results. To determine whether excessive hair-cell activity plays a role in the displacement of neuromasts, we exposed lhfpl5b mutants (fish that have intact hair-cell function in the ear but no mechanotransduction in hair cells of the lateral line) to mechanical overstimulation. We observed comparable disruption of neuromasts lacking mechanotransduction, indicating that displacement of lateral-line hair cells is due to mechanical damage and does not require intact mechanotransduction. Further, when examining the adjacent supporting cells in disrupted neuromasts, we observed that they are similarly displaced and elongated. We conclude that the observed disruption of hair cells is a consequence of mechanical displacement of the entire neuromast organ. We have added additional discussion of this phenomenon to the Results and Discussion sections of the manuscript.

      3) Because afferent neurons innervate more than one neuromast and more than one hair cell per neuromast, measurements of innervation of neuromasts (Figure 3) or synapses per hair cell (Fig 4) cannot be assumed to be independent events. That is, changes in a single postsynaptic neuron may be reflected across multiple synapses, hair cells, and even neuromasts. This needs to be accounted for in experimental design for statistical analysis.

      We agree that changes in single postsynaptic neurons, which innervate groups of hair cells of the same polarity within a neuromast, could be reflected across multiple synapses. Additionally, it is plausible that excitotoxic events at the postsynapse, while not contributing to apparent neurite retraction, could be contributing to synapse loss across multiple innervated hair cells. We have updated the manuscript to reflect the potential contribution of postsynaptic signaling to synapse loss and added experiments pharmacologically blocking glutamate uptake.

      4) The SEM analysis provides compelling snapshots of apical damage, but could be supplemented by quantitative analysis with antibody staining or transgenic lines where kinocilia are labeled. The amount of reduced FM1-43 labeling is one of the more dramatic effects of the shaking insult, suggesting widespread disruption to mechanotransduction that could be related to this apical damage. Further examination of the recovery of mechanotransduction would be interesting.

      To supplement the SEM snapshots of severe apical damage, we have expanded the SEM image analysis with quantitative data on kinocilia morphology. We have also added confocal images of hair bundles using antibody labeling of acetylated tubulin in a transgenic line expressing β-actin-GFP in hair cells. We agree that correlative studies of mechanotransduction recovery relative to hair-bundle morphology would be interesting, and we intend to examine this question in a future follow-up study.

      5) A previous publication by Uribe et al.2018 describes a somewhat similar shaking protocol with somewhat different results - more long-lasting changes in hair cell number, presynaptic changes in synapses, etc. It would be worth discussing potential differences across the two studies.

      We agree that we did not adequately address the considerable differences between our mechanical damage protocol for the zebrafish lateral line and the damage protocol described by Uribe et al, 2018. We have provided a more direct comparison in the Results section and addressed the differences between the protocols in depth in the Discussion section.

      Our damage protocol uses a stimulus within the known frequency range of lateral-line hair cells (60 Hz) that is applied to free-swimming larvae and evokes a behaviorally relevant response (fast start response). The damage is observable immediately following noise exposure, is specific to posterior lateral-line neuromasts, and appears to be rapidly repaired. Some features of the damage we observe—reduced mechanotransduction and hair-cell synapse loss—may correspond to mechanically induced damage of hair cell organs in other species. Notably, hair cell synapse loss in seemingly intact neuromasts is exacerbated by pharmacologically blocking synaptic glutamate clearance, supporting that the 60 Hz frequency stimulus is overstimulating neuromast hair cells directly and suggesting that the mechanism of synapse loss may be similar to inner hair cell synapse loss reported in mice following moderate noise exposures.

      By contrast, the damage protocol published by Uribe et al used ultrasonic transducers (40-kHz) to generate small, localized shock waves rather than directly stimulate neuromast hair cells. The damage they reported (delayed hair-cell death and modest synapse loss with no effect on hair-cell mechanotransduction) was not apparent until 48 hours following exposure and was not specific to the lateral-line organ. Some of the features of the damage they observed (delayed-onset apoptosis and hair-cell death) may correspond to damage reported in mice following blast injuries.

      Reviewer #3:

      1) As the authors point out, zebrafish hair cells can be regenerated. With that in mind, and to make the relevance for mammalian hair cell repair clear, a clear distinction between mechanisms mediated by "repair" or "regeneration" needs to be made. The authors discuss that proliferative hair cell generation can be excluded based on the short time period, but suggest that transdifferentiation might be involved. Recovery of NM hair cell number occurs within the same 2 hour period in which NM morphology and hair cell function improved, making it difficult to determine the extent to which "regeneration" contributed to the recovery. The amount of transdifferentiation has to be shown experimentally (lineage tracing?).

      We agree that the distinction between "repair" and "regeneration" needs to be made when discussing this model of mechanical damage to zebrafish hair cell organs. We have tried to clarify that most of what we observe regarding recovery (restoration of neuromast shape, mechanotransduction, afferent contacts, and synapse number) reflects mechanisms of repair following mechanical damage (and, in the case of synapse loss, overstimulation) rather than regeneration. However, one feature of damage that may reflect rapid regeneration is the restoration of hair-cell number following mechanical injury. To experimentally determine whether proliferation contributed to hair cell generation, we assessed the incorporation of the thymidine analog EdU during a 4-hour recovery following mechanical overexposure in a transgenic line expressing GFP in neuromast supporting cells. We observed a modest but not statistically significant increase in the number of proliferating supporting cells in neuromasts exposed to the strong current stimulus, suggesting that recovery of lost hair cells is not primarily due to renewed proliferation.

      The number of hair cells that are lost and recover within several hours is low, i.e., typically ~1 hair cell/neuromast. We observed this consistently in all of our experiments, but the mechanisms responsible are not clear. Based on previous studies of hair cell regeneration in the lateral line, the recovery time appears too rapid to be caused by renewed proliferation, a notion that is further supported by our EdU studies. On the other hand, it is possible that a few supporting cells may undergo the initial phases of phenotypic change into hair cells during this short time period, and we speculate that such transdifferentiation may be responsible for the observed recovery. We should emphasize that this is a new observation and, at present, we do not fully understand the underlying mechanism. However, the focus of the present study is on mechanical damage, synaptic loss, and subsequent repair. We believe that it is important to report our consistent findings of low-level hair cell loss and recovery, but a detailed characterization of the mechanism would require considerable effort and would best be the topic of a future study.

      2) The classification of "normal" vs "disrupted" is vague and not quantitative. The examples shown in the paper seem to be quite clear-cut, but this reviewer doubts that was the case throughout all analyzed samples. Formulate clear benchmarks and criteria for the disrupted phenotype (even when blind analysis is performed).

      We have defined measurable criteria for "normal" vs "disrupted" neuromasts that we have added to the Method Details section: “We defined exposed neuromast morphology as “normal” when hair cells appeared radially organized with a relatively uniform shape and size, with ≤7 μm difference observed when comparing the lengths from apex to base of an opposing pair of anterior/posterior hair cells. Length was measured from a fixed point at the center of the hair bundle to the basolateral end of each opposing hair cell. We defined neuromasts as “disrupted” when hair cells appeared elongated and displaced to one side, with >7 μm difference observed when comparing the lengths of an opposing pair of anterior/posterior hair cells. Generally, the apical ends of the hair cells were displaced posteriorly, with the basolateral ends oriented anteriorly.”
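
      The quantitative part of this criterion can be expressed as a simple rule. The sketch below encodes only the 7 μm length-difference threshold; the qualitative features (radial organization, displacement to one side) are not captured, and the function and parameter names are hypothetical.

```python
THRESHOLD_UM = 7.0  # maximum length difference (in micrometres) for a "normal" neuromast


def classify_neuromast(anterior_len_um, posterior_len_um, threshold_um=THRESHOLD_UM):
    """Classify a neuromast from the apex-to-base lengths of an opposing
    anterior/posterior hair-cell pair, using only the length-difference rule."""
    difference = abs(anterior_len_um - posterior_len_um)
    return "normal" if difference <= threshold_um else "disrupted"


# Hypothetical measurements, for illustration only.
labels = [classify_neuromast(20.0, 25.0), classify_neuromast(18.0, 30.0)]
```

      A pair differing by exactly 7 μm is classified as "normal" under this rule, in line with the ≤7 μm wording of the criterion.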

      3) Sustained and periodic exposure: These two exposure protocols not only differ with respect to sustained vs periodic, they also differ in total exposure time (Fig 2B). This complicates the interpretation, especially considering the authors own finding that a pre-exposure is protective.

      To clarify—pre-exposure was not protective to hair-cell survival. Rather, in preliminary experiments, pre-exposure appeared to reduce larval mortality, and we have clarified that observation in the text of the Results and the Methods Details sections. We agree with the reviewer that comparing the two protocols based on differences in time distribution is complicated in that they also differ in total exposure time. For the purpose of clarity, we now focus on the sustained exposure in the main figures and created supplemental figures for the reduced damage still observed using periodic exposure, specifying that reduced damage may be the result of periodic time distribution of stimulus and/or less cumulative time exposed to the stimulus.

      4) The data on the mitochondrial ROS aspect seems not well integrated into the overall story.

      We agree that the ROS story was not well integrated and incomplete. We have removed the data describing mpv17-/- mutants and mitochondrial dysfunction from this manuscript. A more comprehensive report of mpv17-/- mutant mitochondrial function and morphological analysis of neuromasts following noise exposure is now described in a follow-up manuscript (“Influence of Mpv17 on hair-cell mitochondrial homeostasis, synapse integrity, and vulnerability to damage in the zebrafish lateral line”).

      5) It is surprising that the hair bundle morphology was not assessed after recovery. This is crucial. Overall, it would be good to see some quantification of the SEM data, e.g. kinocilia length and number of splayed bundles.

      We have expanded the SEM image analysis to quantitatively assess kinocilia morphology following exposure. We agree that assessment of recovery using live imaging of hair bundles paired with subsequent SEM analysis will be informative, and we intend to perform those experiments in a future study.

      6) Behavioral recovery (measured as number of "fast start" responses) was also not assessed. This is essential for determining the functional relevance of the recovery.

      We attempted to measure behavioral recovery of lateral-line function by measuring “fast-start” responses immediately and several hours after recovery, and discovered that i) strong water current provided stimulation that was too intense to reveal subtle behavioral changes following lateral-line damage and recovery, and ii) when testing larvae immediately following sustained strong current exposures, it was difficult to discern if fewer “fast-start” responses were due to lateral-line organ damage or larval fatigue. We agree that behavioral recovery is important to assay, but acknowledge that assessing lateral-line-mediated behavior following mechanical damage will require a more sensitive testing paradigm that stimulates the lateral-line sensory organ with a relatively gentle, calibrated water flow stimulus. We are currently performing a follow-up study to this paper using a testing paradigm developed by a postdoctoral associate in our lab that analyzes subtle changes in larval orientation to water flow (rheotaxis) mediated by the lateral-line organ. Using this behavior paradigm, we will directly correlate morphological and functional recovery over time.

      7) This reviewer is not yet convinced that this damage model displays enough commonalities to mammalian noise damage to justify the ubiquitous use of the term "noise" throughout the manuscript. It would be more prudent to use a more careful term along the lines of "mechanical overstimulation-induced damage".

      We have removed the term “noise” throughout the manuscript and replaced it with either “strong water current stimulus” or “mechanical overstimulation” where appropriate.

      8) Overall, there was a lack of experimental and analysis detail in the results section. For example, how was afferent innervation quantified? Just counting GFP labeled contacts to hair cells?

      Innervation of neuromast hair cells was quantified during blinded analysis by scrolling through confocal z-stacks of each neuromast (step size 0.3 μm) containing hair cell and afferent labeling and identifying hair cells that were not directly contacted by an afferent neuron (direct contact being defined as no discernible space between the hair cell and the neurite). Hair cells that were identified as no longer innervated showed measurable neurite retraction; there was generally >0.5 μm distance between a retracted neurite and the hair cell. We have added this information to the Method Details section.

      There was also inconsistency in the use of two variations of the mechanical damage protocol, the time points at which repair was assessed, and whether the damage was quantified in all neuromasts or in normal vs. disrupted neuromasts separately, making the data difficult to interpret.

      We have revised our figure legends to clearly indicate when we are assessing damage in all exposed neuromasts (pooled) to control vs. comparative analysis of normal vs. disrupted neuromasts relative to control. In addition, we now focus on the sustained exposure in the main figures, which was the exposure protocol used for the time points in which repair and recovery were assessed.

    1. Author Response:

      Reviewer #1:

      In this manuscript, Ma, Hung and colleagues rewind the tape to explore the genetic landscape that precedes carbapenem resistance of Klebsiella pneumoniae strains. The importance of this work is underscored by the paucity of new drugs to treat CPO (carbapenemase producing organisms). 'Given the need for greater antibiotic stewardship, these findings argue that in addition to considering the current efficacy of an antibiotic for a clinical isolate in antibiotic selection, considerations of future efficacy are also important.' And so I would say the major weakness of the paper is the aspirational nature of how this work could be used by clinicians in antibiotic selection or treatment of the patient.

      We consider this study as a first step towards recognizing the need to develop more comprehensive diagnostics and more sophisticated antibiotic stewardship programs. This study suggests that factors besides MICs could inform clinical antibiotic selection, including that specific lineages have higher propensity to develop resistance (i.e., ST258), stepping-stone mutations that facilitate the evolution of resistance (i.e., mutations in rseA and ompK36), and antibiotics that have high level resistance barriers (i.e., meropenem). We have now added language to both the introduction and discussion to note that next steps are needed to extend these findings into the clinic, including more extensive whole genome sequencing of isolates and tracking of these strains in the clinic, associated patient outcome and strain evolution data, to understand the full impact of these mutational events in CREs.

      The strains selected for these experiments and the evolutionary in vitro models are both well considered. One idea that has stuck with me from the figures of a review article by Kishony (https://pubmed.ncbi.nlm.nih.gov/23419278/, figure 4) is the concept of constraining the evolutionary pathways or fitness landscape for antibiotic resistance. Are there any peaks that a microbial strain reaches that optimize resistance to one AbX but basically leave it inherently unable to evolve resistance to another AbX? This could have application for dual drug therapy or pulsed therapy.

      This is a good evolutionary question that might be suggested by Kishony’s work. In our particular study however, because the majority of isolates used that are carbapenem susceptible are already resistant to many other antibiotics, we cannot measure their resistance frequencies to other clinically relevant antibiotics. It does suggest that such a strategy would have to be implemented early enough before strains have already acquired significant resistance and cannot be used to manage currently existing resistance.

      When you sequence the isolates that have increased their MIC do you find 'unrelated' mutations in genes that would control protein synthesis or other functions that might be compensatory mutations. Developing a clearer understanding of the rewiring of the bacterium's basic processes might also elucidate both integrated functions and potential weaknesses. You mention mutations in wzc, ompA, resA, bamD.

      Yes. We found some strains had acquired multiple mutations in multiple genes. Please refer to supplementary file 12. In some cases, we found additional mutations of unclear significance; for example, we identified two mutations in Mut86. We tested these two mutations separately and found that only the mutation in ompA affects the susceptibility of the mutant. However, this does not exclude the possibility that the other mutation might have other compensatory functions versus just being a random passenger mutation; this will require further investigation.

      On the other hand, in some cases, we indeed found mutations that affect the fitness of the isolates when cultured in LB medium or M9, e.g., mutations in rseA. Some mutations affect fitness only in LB medium but not M9, e.g., mutations in ompK36. Some mutations do not significantly affect the fitness in either LB or M9, e.g., duplication of blaSHV-12. We are performing RNA sequencing on these mutants to further understand the “rewiring of the bacterium's basic processes.”

      Point of discussion. Classic ST258 carries blaKPC on pKpQIL plasmid. Your ST258 strain (UCI38) carries blaSHV-12 on pESBL. Am I to assume that pESBL is in lieu of pKpQIL?

      Indeed, pESBL encodes an ESBL in UCI38 and may obviate the need for another classical KPC-carrying plasmid such as pKpQIL. However, pESBL and pKpQIL are not incompatible and so it is not clear that anything is precluding UCI38 from picking up pKpQIL.

      Transformation of CPO have many variables and in vitro data does not always mirror what is observed in vivo. So the findings of Fig 2f might need to be considered under different laboratory conditions (substrate, temperature) [https://pubmed.ncbi.nlm.nih.gov/27270289/].

      We revised the statement in the revision and pointed out that the results in Fig. 2F were limited to our assay condition.

      Reviewer #2:

      In this manuscript Ma et al. sought to investigate the breadth of genetic mechanisms available across various lineages of clinical isolates of Klebsiella pneumoniae, with a specific focus on carbapenem resistance evolution. The authors systematically evaluated how different carbapenems and genetic backgrounds affect the rate of evolution by measuring mutation frequencies. The authors made three major observations: First, that a higher mutational frequency is dependent on genetic background and high-level transposon activity affecting porins associated with carbapenem resistance. Importantly, transposon activity was not only higher than SNP acquisition rates in distinct backgrounds, but was also reversible, thus emphasizing that resistance evolution via this mechanism might impart less of a cost than the accumulation of mutations in other genetic backgrounds. Second, that CRISPR-cas systems have the potential to restrict the horizontal acquisition of resistance elements. Importantly, determining the presence or absence of such systems alone is not enough to determine whether a strain is "resistant" to certain foreign elements, but specific sequences within the different spacers can be more informative of the exact range of plasmids or genetic elements to which the system is restrictive. Third, pre-selection with ertapenem increases the likelihood of resistance evolution against other carbapenems both via de novo mutation and HGT.

      Altogether, these results emphasize the importance of additional factors, other than MIC values, such as genetic background, plasmid/transposon activity, and drug identity and choice in determining the rate at which resistance can evolve in K. pneumoniae. I consider that the data generally support the authors' conclusions and provide relevant observations to the field. I do not have any major concerns and think the authors have done a very complete and systematic evaluation of the data necessary to answer their questions.

      My only minor concern is regarding the authors' emphasis in their introduction and discussion on how this kind of data is relevant for clinical decision making. It remains unclear to me exactly how. While I completely agree that genomic information and drug choice play a major role in the evolution of antibiotic resistance, it is unclear to me how to efficiently and promptly translate all of this information at the bedside. Genome sequencing, however economical it has become in recent years, is still not affordable to be implemented at the scales needed for diagnosis at the clinic. Perhaps the authors could expand on how they envision this could be implemented?

      We consider this study as a first step towards the development of more comprehensive diagnostics and more sophisticated antibiotic stewardship. Indeed, as current diagnostics exist, it would be difficult to implement. However, we hope that as studies such as these grow, it will usher in a new era of diagnostics that can indeed take such factors into account. We have now added such a discussion to the introduction and discussion in the revised manuscript.

    1. Author Response:

      Reviewer #1:

      This MS combines two-photon glutamate sensing (using the iGluSnFR fluorescent probe), two-photon glutamate uncaging, two-photon calcium imaging and electrophysiology to investigate whether synaptically released glutamate activates receptors outside the synapse of release, and at neighboring synapses. The data themselves are very impressive. The authors arrive at the revolutionary conclusion that synaptically released glutamate is able to activate both NMDA and even AMPA receptors at neighboring synapses, remarkably strongly. I say revolutionary, because previous modelling has yielded diametrically opposite conclusions. The reflex would be to prefer experiment over theory, yet the modelling was based upon quite strongly constrained physical parameters that would be quite incompatible with the interpretations reported here. However, I believe the authors have failed to take into account significant technical limitations inherent in the technologies they apply. These include spatial averaging of fluorescence, possible saturation of iGluSnFR and diffusive exchange of (caged) glutamate during uncaging. As a result, the conclusion is wholly unproven. Indeed, I believe it highly probable that all of the data in favor of distal activation will prove to be consistent with synapse specificity and the presence of technical artifacts related to spatial averaging of fluorescence signals and diffusive exchange of (caged) glutamate during uncaging.

      We agree that there are technical limitations and that the interpretation of signals recorded from near synapses is difficult. This concerns the length constants we describe and name SARGe. Our usage of those terms in the Results may have suggested that we propose that the value of lambda itself describes the action range of glutamate well. As the reviewer states, this is not the case, and we note this limitation at the beginning of the Discussion section.

      However, our interpretation that glutamate may regularly activate AMPA-R in neighboring synapses is not based on lambda values (see discussion).

      It is based on the facts that a) ~5% iGluSnFR responses are observed at more than 1.5 µm from a synapse and b) uncaging at 500 nm produces a current response of ~38% of the quantal synaptic amplitude. Here, the remarks of the reviewer are incorrect: a) is not affected by volume averaging or saturation of iGluSnFR, and previous models predict an activation of up to 1-2% only. We have shown this by simulation in an appeal letter, which unfortunately was not forwarded to the reviewer. b) is not increased by “diffusive exchange of glutamate during uncaging”. In fact, releasing the same amount of glutamate over a longer period reduces distant receptor activation, and current models predict a 2-4 fold lower activation of AMPA-R than we observe here. This was also shown by simulation in the appeal letter, but a further exchange with the reviewer on this was not permitted by the editors.

      Reviewer #2:

      Matthews, Sun, McMahon et al. addresses the extent of the spread of the neurotransmitter glutamate into the extracellular space. The authors use a combination of imaging techniques, 2-photon glutamate uncaging and electrophysiology to conclude that vesicular glutamate release reaches nearby, adjacent synapses. Although this is an interesting question, and one that has been addressed many times previously, I have several technical concerns about the strength of the conclusions that reduces my enthusiasm.

      Unfortunately, only this general part of reviewer 2's comments is published, so we cannot meaningfully rule out or comment on the reviewer’s concerns.

      Reviewer #3:

      This is an interesting paper combining several impressive techniques to argue that synaptically released glutamate is allowed to diffuse to and activate receptors at much greater distance than previously thought. iGluSnFR recordings show that glutamate released from single vesicles activates the indicator with a spatial spread (length constant) of 1.2 um, substantially farther than previous estimates based on the time course of glutamate clearance by glial transporters (PMC6725141). Similar parameters are observed with spontaneous and evoked events, large or small, or when glutamate is released via 2P uncaging. Further uncaging experiments show that both AMPARs and especially NMDARs are activated a substantial distance. AMPARs, previously thought to be recruited only within active synapses, are activated with a spatial length constant that compares quite closely with the average distance between synapses in the hippocampus. More heroic experiments and some geometric calculations show that this behavior enables neighboring synapses to interact supralinearly. The results suggest that "crosstalk" between neighboring synapses may be substantially more common than previously thought.

      The experiments in this paper appear carefully performed and are analyzed thoroughly. Despite all of the quantitative rigor and careful thought, however, the authors fail to reconcile convincingly their results with what we know about neuropil structure and the laws of diffusion. There are very good data in the literature regarding the extracellular volume fraction and geometric tortuosity of the neuropil, the diffusion characteristics of glutamate and the time course of glutamate uptake. These data more or less demand that synaptically released glutamate is diluted over a much smaller spatial range than that suggested here. In the Discussion, the authors suggest that this discrepancy might reflect a simplified view of the neuropil as an isotropic diffusion medium (PMC6763864, PMC6792642, PMC6725141), whereas a more realistic network of sheets and tunnels (PMC3540825) might prolong the extracellular lifetime of neurotransmitter. I like this idea in principle, but there is no quantitative support in the paper for the claim - in fact, it seems at odds with the authors' very nice demonstration that diffusion appears to be similar in all directions (Figure 3B). I don't necessarily think a solution is within the scope of this single paper, but I would suggest that the authors acknowledge the present lack of a compelling explanation.

      Our results are not predicted by the modelling studies cited; that is correct, and in our eyes this is what makes them important. It is important to note, however, that those modelling/simulation studies use a strong simplification and view the extracellular space/neuropil as a porous medium. This is a powerful approach, but it is only a valid description when considering diffusion distances of several micrometers; it is not applicable on the submicron scale of neighboring synapses (PMID: 15345540 p1608; PMID: 7338810 p227, and DOI: 10.1088/0034-4885/64/7/202). This drawback of the simulations has been overlooked, the reviewer seems not to be aware of it, and we point it out at the end of the Discussion section. We do not suggest anisotropy near a synapse, nor a particular perisynaptic geometry such that there would be specific channels from one synapse to the next. We also assume that the neuropil is random (as shown by PMID 9547224); instead, everywhere in the neuropil, the initial, submicron diffusion will not follow the “porous medium approach”.

      It is true that we do not offer a quantitative description of how this violation of the porous medium approach would lead to an underestimation of synaptic cross-talk; we provide experimental data. However, in our appeal letter we explicitly describe this discrepancy in detail to make the reviewer aware of it, but regrettably this information never reached the reviewer.